[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441684#comment-16441684 ]

ASF GitHub Bot commented on ARROW-1780:
---------------------------------------

atuldambalkar commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#issuecomment-382195795

Hi @laurentgo, please hold off until I push some additional code changes based on the test cases. You can continue the code review once that is done. Thanks.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> JDBC Adapter for Apache Arrow
> -----------------------------
>
>                 Key: ARROW-1780
>                 URL: https://issues.apache.org/jira/browse/ARROW-1780
>             Project: Apache Arrow
>          Issue Type: New Feature
>            Reporter: Atul Dambalkar
>            Assignee: Atul Dambalkar
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
> At a high level, the JDBC Adapter will allow upstream apps to query RDBMS data over JDBC and get the JDBC objects converted to Arrow objects/structures. The upstream utility can then work with Arrow objects/structures with the usual performance benefits. The utility will be very similar to the C++ implementation of "Convert a vector of row-wise data into an Arrow table" described here: https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> The utility will read data from an RDBMS and convert the data into Arrow objects/structures. From that perspective, this utility only reads data from the RDBMS; whether it can also push Arrow objects to the RDBMS still needs to be discussed and is out of scope for now.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436765#comment-16436765 ]

ASF GitHub Bot commented on ARROW-1780:
---------------------------------------

atuldambalkar commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#issuecomment-381014810

Hi @laurentgo, I have taken care of the following review comments:

1. Used BaseAllocator instead of RootAllocator.
2. Added a Calendar object as another argument to the API so that it gets used for Date, Time, and Timestamp values.
3. Handled NULL values for all the data types. I am basically using nullable DataHolder objects as much as possible. Test cases still need to be added, though.
4. For the BigDecimal data type, used the getLong() API instead of getInt().
5. Used StandardCharsets.UTF_8 as the charset wherever we convert strings to bytes.

Things that are still not done:

1. I am unable to use the streaming approach for Blob and Clob, as I couldn't figure out a way to populate the destination ArrowBuffer in a streaming manner.
2. I still need to take care of the precision of Timestamp values (nano/micro/milli).
3. The Array data type is not yet supported.
4. Control should never reach the "default" switch case, so I could throw an exception there if that makes sense.

Let me know your comments. Thanks.
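On the Timestamp precision point (item 2 of the "still not done" list): java.sql.Timestamp carries full nanosecond precision alongside its millisecond epoch value, so a conversion to Arrow has to decide which unit to keep. A JDK-only sketch of where each unit lives, independent of the Arrow API:

```java
import java.sql.Timestamp;

public class TimestampPrecisionDemo {
    public static void main(String[] args) {
        // java.sql.Timestamp keeps the fractional second separately as nanoseconds
        Timestamp ts = Timestamp.valueOf("2018-04-13 10:15:30.123456789");

        long millis = ts.getTime();  // epoch millis; sub-millisecond digits are truncated
        int nanos = ts.getNanos();   // full fractional-second precision, in nanoseconds

        System.out.println("nanos = " + nanos);                              // 123456789
        System.out.println("millis fraction = " + Math.floorMod(millis, 1000)); // 123
    }
}
```

So a MILLISECOND-unit Arrow vector silently drops the last six digits of the fractional second; preserving them requires a NANOSECOND (or MICROSECOND) TimeUnit.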
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434383#comment-16434383 ]

ASF GitHub Bot commented on ARROW-1780:
---------------------------------------

atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180855624

## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/AbstractJdbcToArrowTest.java

@@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import java.sql.Connection;
+import java.sql.Statement;
+
+/**
+ * Class to abstract out some common test functionality for testing JDBC to Arrow.
+ */
+public abstract class AbstractJdbcToArrowTest {
+
+    protected void createTestData(Connection conn, Table table) throws Exception {
+
+        Statement stmt = null;
+        try {
+            // create the table and insert the data and once done drop the table
+            stmt = conn.createStatement();
+            stmt.executeUpdate(table.getCreate());
+
+            for (String insert : table.getData()) {
+                stmt.executeUpdate(insert);
+            }
+
+        } catch (Exception e) {
+            e.printStackTrace();
+        } finally {

Review comment: Thanks @laurentgo for the comments. I should be able to revert soon with further changes. We are still getting some work done on the test-case-related changes from our India development team member. Let me ping you on Slack for any quick discussion.
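The finally-based cleanup in the helper above is the kind of code Java 7's try-with-resources replaces. A JDK-only demonstration of the closing semantics, using a hypothetical FakeStatement stand-in (invented for this example) rather than a real JDBC Statement:

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesDemo {
    // Hypothetical stand-in for a JDBC Statement; records when it is closed.
    static final class FakeStatement implements AutoCloseable {
        static final List<String> events = new ArrayList<>();
        void executeUpdate(String sql) { events.add("exec:" + sql); }
        @Override public void close() { events.add("closed"); }
    }

    public static void main(String[] args) {
        try (FakeStatement stmt = new FakeStatement()) {
            stmt.executeUpdate("CREATE TABLE t(x INT)");
            stmt.executeUpdate("INSERT INTO t VALUES (1)");
        } // close() runs here, even if executeUpdate had thrown
        System.out.println(FakeStatement.events);
    }
}
```

The resource is closed automatically in an implicit finally block, so the null check, the explicit close(), and the empty catch all disappear, and exceptions are no longer swallowed by printStackTrace().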
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434204#comment-16434204 ]

ASF GitHub Bot commented on ARROW-1780:
---------------------------------------

laurentgo commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#issuecomment-380521328

@atuldambalkar I can be reached on Slack if you need me.
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434167#comment-16434167 ]

ASF GitHub Bot commented on ARROW-1780:
---------------------------------------

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180815237

## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java

@@ -0,0 +1,431 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. (standard Apache license header, as above)
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import com.google.common.base.Preconditions;
+import org.apache.arrow.vector.BaseFixedWidthVector;
+import org.apache.arrow.vector.BigIntVector;
+import org.apache.arrow.vector.BitVector;
+import org.apache.arrow.vector.DateMilliVector;
+import org.apache.arrow.vector.DecimalVector;
+import org.apache.arrow.vector.FieldVector;
+import org.apache.arrow.vector.Float4Vector;
+import org.apache.arrow.vector.Float8Vector;
+import org.apache.arrow.vector.IntVector;
+import org.apache.arrow.vector.SmallIntVector;
+import org.apache.arrow.vector.TimeMilliVector;
+import org.apache.arrow.vector.TimeStampVector;
+import org.apache.arrow.vector.TinyIntVector;
+import org.apache.arrow.vector.VarBinaryVector;
+import org.apache.arrow.vector.VarCharVector;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.types.DateUnit;
+import org.apache.arrow.vector.types.TimeUnit;
+import org.apache.arrow.vector.types.pojo.ArrowType;
+import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Schema;
+
+import java.math.BigDecimal;
+import java.nio.charset.StandardCharsets;
+import java.sql.Blob;
+import java.sql.Clob;
+import java.sql.Date;
+import java.sql.ResultSet;
+import java.sql.ResultSetMetaData;
+import java.sql.SQLException;
+import java.sql.Time;
+import java.sql.Timestamp;
+import java.sql.Types;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
+import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;
+
+/**
+ * Class that does most of the work to convert JDBC ResultSet data into Arrow columnar format Vector objects.
+ *
+ * @since 0.10.0
+ */
+public class JdbcToArrowUtils {
+
+    private static final int DEFAULT_BUFFER_SIZE = 256;
+
+    /**
+     * Create Arrow {@link Schema} object for the given JDBC {@link ResultSetMetaData}.
+     *
+     * This method currently performs the following type mapping for JDBC SQL data types to corresponding Arrow data types:
+     *
+     *   CHAR          --> ArrowType.Utf8
+     *   NCHAR         --> ArrowType.Utf8
+     *   VARCHAR       --> ArrowType.Utf8
+     *   NVARCHAR      --> ArrowType.Utf8
+     *   LONGVARCHAR   --> ArrowType.Utf8
+     *   LONGNVARCHAR  --> ArrowType.Utf8
+     *   NUMERIC       --> ArrowType.Decimal(precision, scale)
+     *   DECIMAL       --> ArrowType.Decimal(precision, scale)
+     *   BIT           --> ArrowType.Bool
+     *   TINYINT       --> ArrowType.Int(8, signed)
+     *   SMALLINT      --> ArrowType.Int(16, signed)
+     *   INTEGER       --> ArrowType.Int(32, signed)
+     *   BIGINT        --> ArrowType.Int(64, signed)
+     *   REAL          --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+     *   FLOAT         --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+     *   DOUBLE        --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+     *   BINARY        --> ArrowType.Binary
+     *   VARBINARY     --> ArrowType.Binary
+     *   LONGVARBINARY --> ArrowType.Binary
+     *   DATE          --> ArrowType.Date(DateUnit.MILLISECOND)
+     *   TIME          --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+     *   TIMESTAMP     --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+     *   CLOB          --> ArrowType.Utf8
+     *   BLOB          --> ArrowType.Binary
+     *
+     * @param rsmd
+     * @return {@link Schema}
+     * @throws SQLException
+     */
+    public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws SQLException {
+
+        Preconditions.checkNotNull(rsmd, "JDBC ResultSetMetaData object can't
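The mapping table in the javadoc above reads naturally as a switch over java.sql.Types codes. The sketch below is illustrative only: it returns descriptive strings rather than building real ArrowType objects, and the method name arrowTypeNameFor is invented for this example (BOOLEAN is folded in with BIT, as the conversion code elsewhere in the PR does):

```java
import java.sql.Types;

public class JdbcTypeMappingSketch {
    // Illustrative-only mapping from java.sql.Types codes to the Arrow type
    // names listed in the javadoc; real code would construct ArrowType objects.
    static String arrowTypeNameFor(int jdbcType) {
        switch (jdbcType) {
            case Types.CHAR: case Types.NCHAR:
            case Types.VARCHAR: case Types.NVARCHAR:
            case Types.LONGVARCHAR: case Types.LONGNVARCHAR:
            case Types.CLOB:
                return "Utf8";
            case Types.NUMERIC: case Types.DECIMAL:
                return "Decimal(precision, scale)";
            case Types.BIT: case Types.BOOLEAN:
                return "Bool";
            case Types.TINYINT:  return "Int(8, signed)";
            case Types.SMALLINT: return "Int(16, signed)";
            case Types.INTEGER:  return "Int(32, signed)";
            case Types.BIGINT:   return "Int(64, signed)";
            case Types.REAL: case Types.FLOAT:
                return "FloatingPoint(SINGLE)";
            case Types.DOUBLE:
                return "FloatingPoint(DOUBLE)";
            case Types.BINARY: case Types.VARBINARY:
            case Types.LONGVARBINARY: case Types.BLOB:
                return "Binary";
            case Types.DATE:      return "Date(MILLISECOND)";
            case Types.TIME:      return "Time(MILLISECOND, 32)";
            case Types.TIMESTAMP: return "Timestamp(MILLISECOND, null)";
            default:
                // the "should never get here" case discussed in the thread
                throw new IllegalArgumentException("Unmapped JDBC type: " + jdbcType);
        }
    }

    public static void main(String[] args) {
        System.out.println(arrowTypeNameFor(Types.BIGINT));  // Int(64, signed)
        System.out.println(arrowTypeNameFor(Types.VARCHAR)); // Utf8
    }
}
```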
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434161#comment-16434161 ]

ASF GitHub Bot commented on ARROW-1780:
---------------------------------------

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180253135

## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java

@@ -200,144 +226,206 @@ public static void jdbcToArrowVectors(ResultSet rs, VectorSchemaRoot root) throw
             switch (rsmd.getColumnType(i)) {
                 case Types.BOOLEAN:
                 case Types.BIT:
-                    BitVector bitVector = (BitVector) root.getVector(columnName);
-                    bitVector.setSafe(rowCount, rs.getBoolean(i)? 1: 0);
-                    bitVector.setValueCount(rowCount + 1);
+                    updateVector((BitVector)root.getVector(columnName),
+                            rs.getBoolean(i), rowCount);
                     break;
                 case Types.TINYINT:
-                    TinyIntVector tinyIntVector = (TinyIntVector)root.getVector(columnName);
-                    tinyIntVector.setSafe(rowCount, rs.getInt(i));
-                    tinyIntVector.setValueCount(rowCount + 1);
+                    updateVector((TinyIntVector)root.getVector(columnName),
+                            rs.getInt(i), rowCount);
                     break;
                 case Types.SMALLINT:
-                    SmallIntVector smallIntVector = (SmallIntVector)root.getVector(columnName);
-                    smallIntVector.setSafe(rowCount, rs.getInt(i));
-                    smallIntVector.setValueCount(rowCount + 1);
+                    updateVector((SmallIntVector)root.getVector(columnName),
+                            rs.getInt(i), rowCount);
                    break;
                 case Types.INTEGER:
-                    IntVector intVector = (IntVector)root.getVector(columnName);
-                    intVector.setSafe(rowCount, rs.getInt(i));
-                    intVector.setValueCount(rowCount + 1);
+                    updateVector((IntVector)root.getVector(columnName),
+                            rs.getInt(i), rowCount);
                     break;
                 case Types.BIGINT:
-                    BigIntVector bigIntVector = (BigIntVector)root.getVector(columnName);
-                    bigIntVector.setSafe(rowCount, rs.getInt(i));
-                    bigIntVector.setValueCount(rowCount + 1);
+                    updateVector((BigIntVector)root.getVector(columnName),
+                            rs.getInt(i), rowCount);

Review comment: if bigint is a 64-bit integer, it should probably use rs.getLong() (maybe have unit tests with large values, both positive and negative?)
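The review comment above is about silent narrowing: reading a 64-bit BIGINT column through getInt() corrupts any value outside the 32-bit range before it ever reaches the BigIntVector. A JDK-only illustration of what that narrowing does to a large value:

```java
public class NarrowingDemo {
    public static void main(String[] args) {
        long big = 5_000_000_000L;   // larger than Integer.MAX_VALUE (2147483647)
        int narrowed = (int) big;    // what an int-based read path effectively does

        System.out.println(narrowed);                 // 705032704 -- silently corrupted
        System.out.println((long) narrowed == big);   // false
    }
}
```

This is why the fix is rs.getLong() plus unit tests with values beyond the int range, in both directions, as the reviewer suggests.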
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434159#comment-16434159 ]

ASF GitHub Bot commented on ARROW-1780:
---------------------------------------

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180811074

## File path: java/adapter/jdbc/pom.xml

@@ -0,0 +1,95 @@
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <groupId>org.apache.arrow</groupId>
+    <artifactId>arrow-java-root</artifactId>
+    <version>0.10.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>arrow-jdbc</artifactId>
+  <name>Arrow JDBC Adapter</name>
+  <url>http://maven.apache.org</url>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.arrow</groupId>
+      <artifactId>arrow-memory</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.arrow</groupId>
+      <artifactId>arrow-vector</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.google.guava</groupId>
+      <artifactId>guava</artifactId>
+      <version>18.0</version>
+    </dependency>
+
+    <dependency>
+      <groupId>junit</groupId>
+      <artifactId>junit</artifactId>
+      <version>4.11</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>com.h2database</groupId>
+      <artifactId>h2</artifactId>
+      <version>1.4.196</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>com.fasterxml.jackson.dataformat</groupId>
+      <artifactId>jackson-dataformat-yaml</artifactId>
+      <version>2.7.9</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>com.fasterxml.jackson.core</groupId>
+      <artifactId>jackson-databind</artifactId>
+      <version>2.7.9</version>
+      <scope>test</scope>
+    </dependency>
+
+    <dependency>
+      <groupId>com.google.collections</groupId>

Review comment: That seems like a legacy library, from before Guava was created...
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434157#comment-16434157 ]

ASF GitHub Bot commented on ARROW-1780:
---------------------------------------

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180249387

## File path: java/adapter/jdbc/pom.xml

@@ -62,10 +68,11 @@
       <version>2.7.9</version>
       <scope>test</scope>
     </dependency>
+
     <dependency>
-      <groupId>com.google.guava</groupId>
-      <artifactId>guava</artifactId>
-      <version>18.0</version>
+      <groupId>com.google.collections</groupId>

Review comment: isn't that deprecated in favor of guava? (last update is 2009...)
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434163#comment-16434163 ]

ASF GitHub Bot commented on ARROW-1780:
---------------------------------------

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180811190

## File path: java/adapter/jdbc/pom.xml

@@ -0,0 +1,95 @@
(The quoted pom.xml content is the same as in the previous comment, ending at the junit dependency:)

+    <dependency>
+      <groupId>junit</groupId>
+      <artifactId>junit</artifactId>
+      <version>4.11</version>

Review comment: replace with ${dep.junit.version}
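The suggestion above is to reference a shared version property instead of hard-coding 4.11. A sketch of what that looks like in the module pom, assuming the parent pom defines a `dep.junit.version` property (as the reviewer's wording implies; the property name and value shown here follow that assumption):

```xml
<!-- version resolved from the dep.junit.version property inherited from the parent pom -->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>${dep.junit.version}</version>
  <scope>test</scope>
</dependency>
```

Centralizing the version in the parent keeps all modules on the same JUnit release and makes upgrades a one-line change.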
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434169#comment-16434169 ]

ASF GitHub Bot commented on ARROW-1780:
---------------------------------------

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180818053

## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java

@@ -0,0 +1,431 @@
(The quoted file content is identical to the earlier comment on this file; the message is truncated before the review comment itself.)
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434155#comment-16434155 ]

ASF GitHub Bot commented on ARROW-1780:
---------------------------------------

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180249672

## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java

@@ -64,53 +68,48 @@
      * @param connection Database connection to be used. This method will not close the passed connection object. Since hte caller has passed
      *                   the connection object it's the responsibility of the caller to close or return the connection to the pool.
      * @param query      The DB Query to fetch the data.
-     * @return
-     * @throws SQLException Propagate any SQL Exceptions to the caller after closing any resources opened such as ResultSet and Statment objects.
+     * @return Arrow Data Objects {@link VectorSchemaRoot}
+     * @throws SQLException Propagate any SQL Exceptions to the caller after closing any resources opened such as ResultSet and Statement objects.
      */
-    public static VectorSchemaRoot sqlToArrow(Connection connection, String query) throws Exception {
-
-        assert connection != null: "JDBC conncetion object can not be null";
-        assert query != null && query.length() > 0: "SQL query can not be null or empty";
-
-        RootAllocator rootAllocator = new RootAllocator(Integer.MAX_VALUE);
+    public static VectorSchemaRoot sqlToArrow(Connection connection, String query, RootAllocator rootAllocator) throws SQLException {
+        Preconditions.checkNotNull(connection, "JDBC connection object can not be null");
+        Preconditions.checkArgument(query != null && query.length() > 0, "SQL query can not be null or empty");

-        Statement stmt = null;
-        ResultSet rs = null;
-        try {
-            stmt = connection.createStatement();
-            rs = stmt.executeQuery(query);
-            ResultSetMetaData rsmd = rs.getMetaData();
-            VectorSchemaRoot root = VectorSchemaRoot.create(
-                    JdbcToArrowUtils.jdbcToArrowSchema(rsmd), rootAllocator);
-            JdbcToArrowUtils.jdbcToArrowVectors(rs, root);
-            return root;
-        } catch (Exception exc) {
-            // just throw it out after logging
-            throw exc;
-        } finally {
-            if (rs != null) {
-                rs.close();
-            }
-            if (stmt != null) {
-                stmt.close(); // test
-            }
+        try (Statement stmt = connection.createStatement()) {
+            return sqlToArrow(stmt.executeQuery(query), rootAllocator);
         }
     }

     /**
-     * This method returns ArrowDataFetcher Object that can be used to fetch and iterate on the data in the given
-     * database table.
-     *
-     * @param connection - Database connection Object
-     * @param tableName - Table name from which records will be fetched
+     * For the given JDBC {@link ResultSet}, fetch the data from Relational DB and convert it to Arrow objects.
      *
-     * @return ArrowDataFetcher - Instance of ArrowDataFetcher which can be used to get Arrow Vector obejcts by calling its functionality
+     * @param resultSet
+     * @return Arrow Data Objects {@link VectorSchemaRoot}
+     * @throws Exception
      */
-    public static ArrowDataFetcher jdbcArrowDataFetcher(Connection connection, String tableName) {
-        assert connection != null: "JDBC conncetion object can not be null";
-        assert tableName != null && tableName.length() > 0: "Table name can not be null or empty";
+    public static VectorSchemaRoot sqlToArrow(ResultSet resultSet) throws SQLException {
+        Preconditions.checkNotNull(resultSet, "JDBC ResultSet object can not be null");

-        return new ArrowDataFetcher(connection, tableName);
+        RootAllocator rootAllocator = new RootAllocator(Integer.MAX_VALUE);
+        VectorSchemaRoot root = sqlToArrow(resultSet, rootAllocator);
+        rootAllocator.close();
+        return root;
     }

+    /**
+     * For the given JDBC {@link ResultSet}, fetch the data from Relational DB and convert it to Arrow objects.
+     *
+     * @param resultSet
+     * @return Arrow Data Objects {@link VectorSchemaRoot}
+     * @throws Exception
+     */
+    public static VectorSchemaRoot sqlToArrow(ResultSet resultSet, RootAllocator rootAllocator) throws SQLException {

Review comment: I know I mentioned RootAllocator, but I guess BufferAllocator (which is the base interface) would work as well?

This is an automated message from the Apache Git Service. To respond to the message, please log on
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434168#comment-16434168 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r180817457 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ## @@ -0,0 +1,431 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.arrow.adapter.jdbc; + + +import com.google.common.base.Preconditions; +import org.apache.arrow.vector.BaseFixedWidthVector; +import org.apache.arrow.vector.BigIntVector; +import org.apache.arrow.vector.BitVector; +import org.apache.arrow.vector.DateMilliVector; +import org.apache.arrow.vector.DecimalVector; +import org.apache.arrow.vector.FieldVector; +import org.apache.arrow.vector.Float4Vector; +import org.apache.arrow.vector.Float8Vector; +import org.apache.arrow.vector.IntVector; +import org.apache.arrow.vector.SmallIntVector; +import org.apache.arrow.vector.TimeMilliVector; +import org.apache.arrow.vector.TimeStampVector; +import org.apache.arrow.vector.TinyIntVector; +import org.apache.arrow.vector.VarBinaryVector; +import org.apache.arrow.vector.VarCharVector; +import org.apache.arrow.vector.VectorSchemaRoot; +import org.apache.arrow.vector.types.DateUnit; +import org.apache.arrow.vector.types.TimeUnit; +import org.apache.arrow.vector.types.pojo.ArrowType; +import org.apache.arrow.vector.types.pojo.Field; +import org.apache.arrow.vector.types.pojo.FieldType; +import org.apache.arrow.vector.types.pojo.Schema; + +import java.math.BigDecimal; + +import java.nio.charset.StandardCharsets; +import java.sql.Blob; +import java.sql.Clob; +import java.sql.Date; +import java.sql.ResultSet; +import java.sql.ResultSetMetaData; +import java.sql.SQLException; +import java.sql.Time; +import java.sql.Timestamp; +import java.sql.Types; +import java.util.ArrayList; +import java.util.List; + +import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE; +import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE; + + +/** + * Class that does most of the work to convert JDBC ResultSet data into Arrow columnar format Vector objects. 
+ * + * @since 0.10.0 + */ +public class JdbcToArrowUtils { + +private static final int DEFAULT_BUFFER_SIZE = 256; + +/** + * Create Arrow {@link Schema} object for the given JDBC {@link ResultSetMetaData}. + * + * This method currently performs following type mapping for JDBC SQL data types to corresponding Arrow data types. + * + * CHAR--> ArrowType.Utf8 + * NCHAR --> ArrowType.Utf8 + * VARCHAR --> ArrowType.Utf8 + * NVARCHAR --> ArrowType.Utf8 + * LONGVARCHAR --> ArrowType.Utf8 + * LONGNVARCHAR --> ArrowType.Utf8 + * NUMERIC --> ArrowType.Decimal(precision, scale) + * DECIMAL --> ArrowType.Decimal(precision, scale) + * BIT --> ArrowType.Bool + * TINYINT --> ArrowType.Int(8, signed) + * SMALLINT --> ArrowType.Int(16, signed) + * INTEGER --> ArrowType.Int(32, signed) + * BIGINT --> ArrowType.Int(64, signed) + * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE) + * BINARY --> ArrowType.Binary + * VARBINARY --> ArrowType.Binary + * LONGVARBINARY --> ArrowType.Binary + * DATE --> ArrowType.Date(DateUnit.MILLISECOND) + * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32) + * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null) + * CLOB --> ArrowType.Utf8 + * BLOB --> ArrowType.Binary + * + * @param rsmd + * @return {@link Schema} + * @throws SQLException + */ +public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws SQLException { + +Preconditions.checkNotNull(rsmd, "JDBC ResultSetMetaData object can't
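The diff above swaps the earlier `assert rsmd != null` checks for Guava's Preconditions.checkNotNull, which fails regardless of JVM flags, whereas assert statements only run when the JVM is started with -ea. A minimal sketch of the same contract using only the JDK's java.util.Objects — the describe method and its message are hypothetical stand-ins, not adapter code:

```java
import java.util.Objects;

public class NullCheckDemo {
    // Objects.requireNonNull gives the same always-on contract as Guava's
    // Preconditions.checkNotNull, without the extra dependency: it throws
    // NullPointerException with the given message even when -ea is not set.
    static String describe(Object rsmd) {
        Objects.requireNonNull(rsmd, "JDBC ResultSetMetaData object can't be null");
        return "metadata: " + rsmd;
    }

    public static void main(String[] args) {
        System.out.println(describe("3 columns")); // metadata: 3 columns
    }
}
```

The practical difference matters in a library: assertion failures silently disappear in production JVMs, while an explicit check surfaces the bad argument at the call site.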
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434164#comment-16434164 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r180818360 ## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/AbstractJdbcToArrowTest.java ## @@ -0,0 +1,66 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.adapter.jdbc; + +import java.sql.Connection; +import java.sql.Statement; + +/** + * Class to abstract out some common test functionality for testing JDBC to Arrow. 
+ */ +public abstract class AbstractJdbcToArrowTest { + +protected void createTestData(Connection conn, Table table) throws Exception { + +Statement stmt = null; +try { +//create the table and insert the data and once done drop the table +stmt = conn.createStatement(); +stmt.executeUpdate(table.getCreate()); + +for (String insert: table.getData()) { +stmt.executeUpdate(insert); +} + +} catch (Exception e) { +e.printStackTrace(); +} finally { Review comment: you should use `try(with-resources)` construct instead...
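The reviewer's suggestion above — try-with-resources instead of a manual try/finally — can be sketched without a real database. FakeStatement below is a hypothetical stand-in for java.sql.Statement (which would need a live connection); the point is the pattern: the resource's close() runs automatically, even when an update throws:

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesDemo {
    // Stub standing in for java.sql.Statement: records executed SQL and
    // when close() runs, so the closing order is observable.
    static class FakeStatement implements AutoCloseable {
        static final List<String> events = new ArrayList<>();
        void executeUpdate(String sql) {
            if (sql.startsWith("BAD")) throw new IllegalArgumentException(sql);
            events.add("exec:" + sql);
        }
        @Override public void close() { events.add("closed"); }
    }

    // The shape the reviewer asks for: no stmt = null, no finally block —
    // the compiler inserts the close() call, including on the exception path.
    static void runStatements(String... sqls) {
        try (FakeStatement stmt = new FakeStatement()) {
            for (String sql : sqls) {
                stmt.executeUpdate(sql);
            }
        }
    }

    public static void main(String[] args) {
        runStatements("CREATE TABLE t(x INT)", "INSERT INTO t VALUES (1)");
        System.out.println(FakeStatement.events);
    }
}
```

With the real JDBC types the same shape applies directly, since Statement and ResultSet both implement AutoCloseable.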
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434166#comment-16434166 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r180810834 ## File path: java/adapter/jdbc/pom.xml ## @@ -0,0 +1,95 @@ + + + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> +4.0.0 + +org.apache.arrow +arrow-java-root +0.10.0-SNAPSHOT + + +arrow-jdbc +Arrow JDBC Adapter +http://maven.apache.org + + + + +org.apache.arrow +arrow-memory +${project.version} + + + +org.apache.arrow +arrow-vector +${project.version} + + +com.google.guava +guava +18.0 + + + + + +junit +junit +4.11 +test + + + +com.h2database +h2 +1.4.196 +test + + +com.fasterxml.jackson.dataformat +jackson-dataformat-yaml +2.7.9 +test + + +com.fasterxml.jackson.core +jackson-databind +2.7.9 Review comment: replace with ${dep.jackson.version}
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434156#comment-16434156 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r180252798 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ## @@ -0,0 +1,343 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.arrow.adapter.jdbc; + +import org.apache.arrow.vector.*; +import org.apache.arrow.vector.types.DateUnit; +import org.apache.arrow.vector.types.TimeUnit; +import org.apache.arrow.vector.types.pojo.ArrowType; +import org.apache.arrow.vector.types.pojo.Field; +import org.apache.arrow.vector.types.pojo.FieldType; +import org.apache.arrow.vector.types.pojo.Schema; + +import java.nio.charset.Charset; +import java.sql.*; +import java.util.ArrayList; +import java.util.List; + +import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE; +import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE; + + +/** + * Class that does most of the work to convert JDBC ResultSet data into Arrow columnar format Vector objects. + * + * @since 0.10.0 + */ +public class JdbcToArrowUtils { + +private static final int DEFAULT_BUFFER_SIZE = 256; + +/** + * Create Arrow {@link Schema} object for the given JDBC {@link ResultSetMetaData}. + * + * This method currently performs following type mapping for JDBC SQL data types to corresponding Arrow data types. 
+ * + * CHAR--> ArrowType.Utf8 + * NCHAR --> ArrowType.Utf8 + * VARCHAR --> ArrowType.Utf8 + * NVARCHAR --> ArrowType.Utf8 + * LONGVARCHAR --> ArrowType.Utf8 + * LONGNVARCHAR --> ArrowType.Utf8 + * NUMERIC --> ArrowType.Decimal(precision, scale) + * DECIMAL --> ArrowType.Decimal(precision, scale) + * BIT --> ArrowType.Bool + * TINYINT --> ArrowType.Int(8, signed) + * SMALLINT --> ArrowType.Int(16, signed) + * INTEGER --> ArrowType.Int(32, signed) + * BIGINT --> ArrowType.Int(64, signed) + * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE) + * BINARY --> ArrowType.Binary + * VARBINARY --> ArrowType.Binary + * LONGVARBINARY --> ArrowType.Binary + * DATE --> ArrowType.Date(DateUnit.MILLISECOND) + * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32) + * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null) + * CLOB --> ArrowType.Utf8 + * BLOB --> ArrowType.Binary + * + * @param rsmd + * @return {@link Schema} + * @throws SQLException + */ +public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws SQLException { + +assert rsmd != null; + +//ImmutableList.Builder fields = ImmutableList.builder(); +List fields = new ArrayList<>(); +int columnCount = rsmd.getColumnCount(); +for (int i = 1; i <= columnCount; i++) { +String columnName = rsmd.getColumnName(i); +switch (rsmd.getColumnType(i)) { +case Types.BOOLEAN: +case Types.BIT: +fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Bool()), null)); +break; +case Types.TINYINT: +fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(8, true)), null)); +break; +case Types.SMALLINT: +fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(16, true)), null)); +break; +case Types.INTEGER: +fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(32, true)), null)); +break; +case
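The type-mapping table quoted above can be sketched as a plain switch over java.sql.Types codes. Since Arrow's classes are not assumed to be on the classpath here, the Arrow types are represented as descriptive strings; arrowTypeFor is a hypothetical helper mirroring the documented mapping, not part of the adapter:

```java
import java.sql.Types;

public class JdbcTypeMapping {
    // Maps a java.sql.Types code to the Arrow type name from the table in
    // the javadoc above. The real adapter returns ArrowType instances.
    static String arrowTypeFor(int jdbcType) {
        switch (jdbcType) {
            case Types.BOOLEAN:
            case Types.BIT:           return "Bool";
            case Types.TINYINT:       return "Int(8, signed)";
            case Types.SMALLINT:      return "Int(16, signed)";
            case Types.INTEGER:       return "Int(32, signed)";
            case Types.BIGINT:        return "Int(64, signed)";
            case Types.NUMERIC:
            case Types.DECIMAL:       return "Decimal(precision, scale)";
            case Types.REAL:
            case Types.FLOAT:         return "FloatingPoint(SINGLE)";
            case Types.DOUBLE:        return "FloatingPoint(DOUBLE)";
            case Types.CHAR:
            case Types.NCHAR:
            case Types.VARCHAR:
            case Types.NVARCHAR:
            case Types.LONGVARCHAR:
            case Types.LONGNVARCHAR:
            case Types.CLOB:          return "Utf8";
            case Types.BINARY:
            case Types.VARBINARY:
            case Types.LONGVARBINARY:
            case Types.BLOB:          return "Binary";
            case Types.DATE:          return "Date(MILLISECOND)";
            case Types.TIME:          return "Time(MILLISECOND, 32)";
            case Types.TIMESTAMP:     return "Timestamp(MILLISECOND, tz=null)";
            default:
                throw new UnsupportedOperationException("Unmapped JDBC type " + jdbcType);
        }
    }

    public static void main(String[] args) {
        System.out.println(arrowTypeFor(Types.VARCHAR)); // Utf8
    }
}
```

Note how several JDBC codes collapse onto one Arrow type (all the character types onto Utf8, all the binary types onto Binary), which is why the switch groups fall-through cases.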
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434160#comment-16434160 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r180810807 ## File path: java/adapter/jdbc/pom.xml ## @@ -0,0 +1,95 @@ + + + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> +4.0.0 + +org.apache.arrow +arrow-java-root +0.10.0-SNAPSHOT + + +arrow-jdbc +Arrow JDBC Adapter +http://maven.apache.org + + + + +org.apache.arrow +arrow-memory +${project.version} + + + +org.apache.arrow +arrow-vector +${project.version} + + +com.google.guava +guava +18.0 + + + + + +junit +junit +4.11 +test + + + +com.h2database +h2 +1.4.196 +test + + +com.fasterxml.jackson.dataformat
+jackson-dataformat-yaml +2.7.9 Review comment: replace with ${dep.jackson.version}
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434165#comment-16434165 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r180815032 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ## @@ -0,0 +1,431 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.arrow.adapter.jdbc; + + +import com.google.common.base.Preconditions; +import org.apache.arrow.vector.BaseFixedWidthVector; +import org.apache.arrow.vector.BigIntVector; +import org.apache.arrow.vector.BitVector; +import org.apache.arrow.vector.DateMilliVector; +import org.apache.arrow.vector.DecimalVector; +import org.apache.arrow.vector.FieldVector; +import org.apache.arrow.vector.Float4Vector; +import org.apache.arrow.vector.Float8Vector; +import org.apache.arrow.vector.IntVector; +import org.apache.arrow.vector.SmallIntVector; +import org.apache.arrow.vector.TimeMilliVector; +import org.apache.arrow.vector.TimeStampVector; +import org.apache.arrow.vector.TinyIntVector; +import org.apache.arrow.vector.VarBinaryVector; +import org.apache.arrow.vector.VarCharVector; +import org.apache.arrow.vector.VectorSchemaRoot; +import org.apache.arrow.vector.types.DateUnit; +import org.apache.arrow.vector.types.TimeUnit; +import org.apache.arrow.vector.types.pojo.ArrowType; +import org.apache.arrow.vector.types.pojo.Field; +import org.apache.arrow.vector.types.pojo.FieldType; +import org.apache.arrow.vector.types.pojo.Schema; + +import java.math.BigDecimal; + +import java.nio.charset.StandardCharsets; +import java.sql.Blob; +import java.sql.Clob; +import java.sql.Date; +import java.sql.ResultSet; +import java.sql.ResultSetMetaData; +import java.sql.SQLException; +import java.sql.Time; +import java.sql.Timestamp; +import java.sql.Types; +import java.util.ArrayList; +import java.util.List; + +import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE; +import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE; + + +/** + * Class that does most of the work to convert JDBC ResultSet data into Arrow columnar format Vector objects. 
+ * + * @since 0.10.0 + */ +public class JdbcToArrowUtils { + +private static final int DEFAULT_BUFFER_SIZE = 256; + +/** + * Create Arrow {@link Schema} object for the given JDBC {@link ResultSetMetaData}. + * + * This method currently performs following type mapping for JDBC SQL data types to corresponding Arrow data types. + * + * CHAR--> ArrowType.Utf8 + * NCHAR --> ArrowType.Utf8 + * VARCHAR --> ArrowType.Utf8 + * NVARCHAR --> ArrowType.Utf8 + * LONGVARCHAR --> ArrowType.Utf8 + * LONGNVARCHAR --> ArrowType.Utf8 + * NUMERIC --> ArrowType.Decimal(precision, scale) + * DECIMAL --> ArrowType.Decimal(precision, scale) + * BIT --> ArrowType.Bool + * TINYINT --> ArrowType.Int(8, signed) + * SMALLINT --> ArrowType.Int(16, signed) + * INTEGER --> ArrowType.Int(32, signed) + * BIGINT --> ArrowType.Int(64, signed) + * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE) + * BINARY --> ArrowType.Binary + * VARBINARY --> ArrowType.Binary + * LONGVARBINARY --> ArrowType.Binary + * DATE --> ArrowType.Date(DateUnit.MILLISECOND) + * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32) + * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null) + * CLOB --> ArrowType.Utf8 + * BLOB --> ArrowType.Binary + * + * @param rsmd + * @return {@link Schema} + * @throws SQLException + */ +public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws SQLException { + +Preconditions.checkNotNull(rsmd, "JDBC ResultSetMetaData object can't
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434158#comment-16434158 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r180810672 ## File path: java/adapter/jdbc/pom.xml ## @@ -0,0 +1,95 @@ + + + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> +4.0.0 + +org.apache.arrow +arrow-java-root +0.10.0-SNAPSHOT + + +arrow-jdbc +Arrow JDBC Adapter +http://maven.apache.org + + + + +org.apache.arrow +arrow-memory +${project.version} + + + +org.apache.arrow +arrow-vector +${project.version} + + +com.google.guava +guava +18.0 Review comment: replace with ${dep.guava.version}
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434162#comment-16434162 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r180253328 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ## @@ -200,144 +226,206 @@ public static void jdbcToArrowVectors(ResultSet rs, VectorSchemaRoot root) throw switch (rsmd.getColumnType(i)) { case Types.BOOLEAN: case Types.BIT: -BitVector bitVector = (BitVector) root.getVector(columnName); -bitVector.setSafe(rowCount, rs.getBoolean(i)? 1: 0); -bitVector.setValueCount(rowCount + 1); +updateVector((BitVector)root.getVector(columnName), +rs.getBoolean(i), rowCount); break; case Types.TINYINT: -TinyIntVector tinyIntVector = (TinyIntVector)root.getVector(columnName); -tinyIntVector.setSafe(rowCount, rs.getInt(i)); -tinyIntVector.setValueCount(rowCount + 1); +updateVector((TinyIntVector)root.getVector(columnName), +rs.getInt(i), rowCount); break; case Types.SMALLINT: -SmallIntVector smallIntVector = (SmallIntVector)root.getVector(columnName); -smallIntVector.setSafe(rowCount, rs.getInt(i)); -smallIntVector.setValueCount(rowCount + 1); + updateVector((SmallIntVector)root.getVector(columnName), +rs.getInt(i), rowCount); break; case Types.INTEGER: -IntVector intVector = (IntVector)root.getVector(columnName); -intVector.setSafe(rowCount, rs.getInt(i)); -intVector.setValueCount(rowCount + 1); +updateVector((IntVector)root.getVector(columnName), +rs.getInt(i), rowCount); break; case Types.BIGINT: -BigIntVector bigIntVector = (BigIntVector)root.getVector(columnName); -bigIntVector.setSafe(rowCount, rs.getInt(i)); -bigIntVector.setValueCount(rowCount + 1); +updateVector((BigIntVector)root.getVector(columnName), 
+rs.getInt(i), rowCount); break; case Types.NUMERIC: case Types.DECIMAL: -DecimalVector decimalVector = (DecimalVector)root.getVector(columnName); -decimalVector.setSafe(rowCount, rs.getBigDecimal(i)); -decimalVector.setValueCount(rowCount + 1); +updateVector((DecimalVector)root.getVector(columnName), +rs.getBigDecimal(i), rowCount); break; case Types.REAL: case Types.FLOAT: -Float4Vector float4Vector = (Float4Vector)root.getVector(columnName); -float4Vector.setSafe(rowCount, rs.getFloat(i)); -float4Vector.setValueCount(rowCount + 1); +updateVector((Float4Vector)root.getVector(columnName), +rs.getFloat(i), rowCount); break; case Types.DOUBLE: -Float8Vector float8Vector = (Float8Vector)root.getVector(columnName); -float8Vector.setSafe(rowCount, rs.getDouble(i)); -float8Vector.setValueCount(rowCount + 1); +updateVector((Float8Vector)root.getVector(columnName), +rs.getDouble(i), rowCount); break; case Types.CHAR: case Types.NCHAR: case Types.VARCHAR: case Types.NVARCHAR: case Types.LONGVARCHAR: case Types.LONGNVARCHAR: -VarCharVector varcharVector = (VarCharVector)root.getVector(columnName); -String value = rs.getString(i) != null ? rs.getString(i) : ""; -varcharVector.setIndexDefined(rowCount); -
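The refactoring shown in the diff above folds each switch branch's repeated setSafe/setValueCount pair into a single updateVector helper. A minimal sketch of that consolidation, using a hypothetical FakeIntVector in place of Arrow's IntVector (which is not assumed to be on the classpath here):

```java
import java.util.ArrayList;
import java.util.List;

public class UpdateVectorSketch {
    // Stub standing in for an Arrow IntVector: records written values and
    // the running value count, mimicking setSafe and setValueCount.
    static class FakeIntVector {
        final List<Integer> values = new ArrayList<>();
        int valueCount;
        void setSafe(int index, int value) {
            while (values.size() <= index) values.add(null);
            values.set(index, value);
        }
        void setValueCount(int count) { valueCount = count; }
    }

    // The consolidation the diff performs: every branch used to inline both
    // calls; one helper now does the write and bumps the count in one place.
    static void updateVector(FakeIntVector vector, int value, int rowCount) {
        vector.setSafe(rowCount, value);
        vector.setValueCount(rowCount + 1);
    }

    public static void main(String[] args) {
        FakeIntVector v = new FakeIntVector();
        for (int row = 0; row < 3; row++) {
            updateVector(v, row * 10, row);
        }
        System.out.println(v.values + " count=" + v.valueCount); // [0, 10, 20] count=3
    }
}
```

In the real adapter the helper is overloaded per vector type (BitVector, DecimalVector, and so on), so the switch in jdbcToArrowVectors shrinks to one call per branch.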
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431103#comment-16431103 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r180205035 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ## @@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.vector.*;
+import org.apache.arrow.vector.types.DateUnit;
+import org.apache.arrow.vector.types.TimeUnit;
+import org.apache.arrow.vector.types.pojo.ArrowType;
+import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Schema;
+
+import java.nio.charset.Charset;
+import java.sql.*;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
+import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;
+
+
+/**
+ * Class that does most of the work to convert JDBC ResultSet data into Arrow columnar format Vector objects.
+ *
+ * @since 0.10.0
+ */
+public class JdbcToArrowUtils {
+
+    private static final int DEFAULT_BUFFER_SIZE = 256;
+
+    /**
+     * Create Arrow {@link Schema} object for the given JDBC {@link ResultSetMetaData}.
+     *
+     * This method currently performs the following type mapping from JDBC SQL data types to corresponding Arrow data types.
+     *
+     * CHAR --> ArrowType.Utf8
+     * NCHAR --> ArrowType.Utf8
+     * VARCHAR --> ArrowType.Utf8
+     * NVARCHAR --> ArrowType.Utf8
+     * LONGVARCHAR --> ArrowType.Utf8
+     * LONGNVARCHAR --> ArrowType.Utf8
+     * NUMERIC --> ArrowType.Decimal(precision, scale)
+     * DECIMAL --> ArrowType.Decimal(precision, scale)
+     * BIT --> ArrowType.Bool
+     * TINYINT --> ArrowType.Int(8, signed)
+     * SMALLINT --> ArrowType.Int(16, signed)
+     * INTEGER --> ArrowType.Int(32, signed)
+     * BIGINT --> ArrowType.Int(64, signed)
+     * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+     * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+     * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+     * BINARY --> ArrowType.Binary
+     * VARBINARY --> ArrowType.Binary
+     * LONGVARBINARY --> ArrowType.Binary
+     * DATE --> ArrowType.Date(DateUnit.MILLISECOND)
+     * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+     * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+     * CLOB --> ArrowType.Utf8
+     * BLOB --> ArrowType.Binary
+     *
+     * @param rsmd
+     * @return {@link Schema}
+     * @throws SQLException
+     */
+    public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws SQLException {
+
+        assert rsmd != null;
+
+        //ImmutableList.Builder<Field> fields = ImmutableList.builder();
+        List<Field> fields = new ArrayList<>();
+        int columnCount = rsmd.getColumnCount();
+        for (int i = 1; i <= columnCount; i++) {
+            String columnName = rsmd.getColumnName(i);
+            switch (rsmd.getColumnType(i)) {
+                case Types.BOOLEAN:
+                case Types.BIT:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Bool()), null));
+                    break;
+                case Types.TINYINT:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(8, true)), null));
+                    break;
+                case Types.SMALLINT:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(16, true)), null));
+                    break;
+                case Types.INTEGER:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(32, true)), null));
+                    break;
+                case
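The mapping table above can be exercised without a database or the Arrow libraries. A hedged sketch (class and method names are mine, not the adapter's) that mirrors a subset of the table using the java.sql.Types codes, returning descriptive names instead of real ArrowType instances:

```java
import java.sql.Types;

// Sketch of the JDBC -> Arrow type mapping described in the Javadoc above.
// Returns a descriptive name rather than an ArrowType object, so it runs
// with only the JDK on the classpath.
public class JdbcTypeMappingSketch {

    public static String arrowTypeFor(int jdbcType) {
        switch (jdbcType) {
            case Types.BOOLEAN:
            case Types.BIT:
                return "Bool";
            case Types.TINYINT:
                return "Int(8, signed)";
            case Types.SMALLINT:
                return "Int(16, signed)";
            case Types.INTEGER:
                return "Int(32, signed)";
            case Types.BIGINT:
                return "Int(64, signed)";
            case Types.REAL:
            case Types.FLOAT:
                return "FloatingPoint(SINGLE)";
            case Types.DOUBLE:
                return "FloatingPoint(DOUBLE)";
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.CLOB:
                return "Utf8";
            case Types.BINARY:
            case Types.VARBINARY:
            case Types.BLOB:
                return "Binary";
            default:
                throw new IllegalArgumentException("Unmapped JDBC type: " + jdbcType);
        }
    }

    public static void main(String[] args) {
        System.out.println(arrowTypeFor(Types.INTEGER)); // Int(32, signed)
        System.out.println(arrowTypeFor(Types.VARCHAR)); // Utf8
    }
}
```

Grouping several Types constants per case label, as the real jdbcToArrowSchema does, keeps the one-to-many SQL-to-Arrow mapping in a single switch.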
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431015#comment-16431015 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r180185358 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ##
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428685#comment-16428685 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#issuecomment-379330225 @donderom I recently did that change based on some earlier comments from @laurentgo. I have added that as another interface. So we are good!
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428410#comment-16428410 ] ASF GitHub Bot commented on ARROW-1780: --- donderom commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#issuecomment-379276243 As I understand it, the idea is to convert `java.sql.ResultSet` to Arrow. The result set can be provided by a third-party lib, which would make the `sqlToArrow(Connection connection, String query)` API unusable. What about something like `sqlToArrow(ResultSet resultSet)`?
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424924#comment-16424924 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r179015462 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java ## @@ -0,0 +1,116 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.memory.RootAllocator;
+import org.apache.arrow.vector.VectorSchemaRoot;
+
+import java.sql.*;
+
+/**
+ * Utility class to convert JDBC objects to columnar Arrow format objects.
+ *
+ * This utility uses the following data mapping to map JDBC/SQL data types to Arrow data types.
+ *
+ * CHAR --> ArrowType.Utf8
+ * NCHAR --> ArrowType.Utf8
+ * VARCHAR --> ArrowType.Utf8
+ * NVARCHAR --> ArrowType.Utf8
+ * LONGVARCHAR --> ArrowType.Utf8
+ * LONGNVARCHAR --> ArrowType.Utf8
+ * NUMERIC --> ArrowType.Decimal(precision, scale)
+ * DECIMAL --> ArrowType.Decimal(precision, scale)
+ * BIT --> ArrowType.Bool
+ * TINYINT --> ArrowType.Int(8, signed)
+ * SMALLINT --> ArrowType.Int(16, signed)
+ * INTEGER --> ArrowType.Int(32, signed)
+ * BIGINT --> ArrowType.Int(64, signed)
+ * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+ * BINARY --> ArrowType.Binary
+ * VARBINARY --> ArrowType.Binary
+ * LONGVARBINARY --> ArrowType.Binary
+ * DATE --> ArrowType.Date(DateUnit.MILLISECOND)
+ * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+ * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+ * CLOB --> ArrowType.Utf8
+ * BLOB --> ArrowType.Binary
+ *
+ * @since 0.10.0
+ * @see ArrowDataFetcher
+ */
+public class JdbcToArrow {
+
+    /**
+     * For the given SQL query, execute and fetch the data from the relational DB and convert it to Arrow objects.
+     *
+     * @param connection Database connection to be used. This method will not close the passed connection object. Since the caller has passed
+     *                   the connection object it's the responsibility of the caller to close or return the connection to the pool.
+     * @param query The DB query to fetch the data.
+     * @return
+     * @throws SQLException Propagates any SQL exceptions to the caller after closing any opened resources such as ResultSet and Statement objects.
+     */
+    public static VectorSchemaRoot sqlToArrow(Connection connection, String query) throws Exception {
+
+        assert connection != null : "JDBC connection object can not be null";
+        assert query != null && query.length() > 0 : "SQL query can not be null or empty";
+
+        RootAllocator rootAllocator = new RootAllocator(Integer.MAX_VALUE);
+
+        Statement stmt = null;

Review comment: Fixed.
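The Javadoc above promises that SQL exceptions propagate only after the opened Statement and ResultSet are closed. A hedged sketch of that close-then-propagate contract using stub AutoCloseables rather than live JDBC objects (names here are mine); try-with-resources gives exactly this ordering:

```java
// Demonstrates the "close resources, then propagate the exception" contract
// described in the sqlToArrow Javadoc, with stubs standing in for a JDBC
// Statement and ResultSet.
public class CloseOnErrorSketch {

    public static class StubResource implements AutoCloseable {
        public final String name;
        public boolean closed;

        public StubResource(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Resources declared in the try header are closed in reverse order
    // before the exception reaches the caller.
    public static void failAfterOpening(StubResource stmt, StubResource rs) throws Exception {
        try (StubResource s = stmt; StubResource r = rs) {
            throw new Exception("query failed");
        }
    }

    public static void main(String[] args) {
        StubResource stmt = new StubResource("statement");
        StubResource rs = new StubResource("resultset");
        try {
            failAfterOpening(stmt, rs);
        } catch (Exception e) {
            System.out.println(stmt.closed && rs.closed); // true
        }
    }
}
```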
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424921#comment-16424921 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r179015368 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowDataFetcher.java ## @@ -0,0 +1,107 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.vector.VectorSchemaRoot;
+
+import java.sql.Connection;
+
+/**
+ * Class to fetch data from a given database table where the user can specify columns to fetch
+ * along with limit and offset parameters.
+ *
+ * An object of this class is returned by invoking the method jdbcArrowDataFetcher(Connection connection, String tableName)
+ * from the {@link JdbcToArrow} class. The caller can use this object to fetch data repeatedly based on the
+ * data fetch requirement and can implement pagination-like functionality.
+ *
+ * This class doesn't hold any open connections to the database but simply executes the "select" query every time with
+ * the necessary limit and offset parameters.
+ *
+ * @since 0.10.0
+ * @see JdbcToArrow
+ */
+public class ArrowDataFetcher {
+
+    private static final String all_columns_query = "select * from %s limit %d offset %d";
+    private static final String custom_columns_query = "select %s from %s limit %d offset %d";
+    private Connection connection;
+    private String tableName;
+
+    /**
+     * Constructor
+     * @param connection
+     * @param tableName
+     */
+    public ArrowDataFetcher(Connection connection, String tableName) {
+        this.connection = connection;
+        this.tableName = tableName;
+    }
+
+    /**
+     * Fetch the data from the underlying table with the given limit and offset and for the passed column names.
+     *
+     * @param offset
+     * @param limit
+     * @param columns
+     * @return
+     * @throws Exception
+     */
+    public VectorSchemaRoot fetch(int offset, int limit, String... columns) throws Exception {
+        assert columns != null && columns.length > 0 : "columns can't be empty!";

Review comment: Fixed.
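The query templates quoted above ("select * from %s limit %d offset %d" and its custom-columns variant) drive the pagination. A self-contained sketch of how those templates could expand into concrete SQL (the buildQuery helper name is mine; the limit/offset syntax is whatever the target database supports, as in the quoted format strings):

```java
// Sketch of ArrowDataFetcher's query construction, based on the two
// String.format templates quoted in the review above.
public class FetcherQuerySketch {

    private static final String ALL_COLUMNS_QUERY = "select * from %s limit %d offset %d";
    private static final String CUSTOM_COLUMNS_QUERY = "select %s from %s limit %d offset %d";

    // With no columns given, fall back to the select-* template;
    // otherwise join the column names into the custom template.
    public static String buildQuery(String tableName, int offset, int limit, String... columns) {
        if (columns == null || columns.length == 0) {
            return String.format(ALL_COLUMNS_QUERY, tableName, limit, offset);
        }
        return String.format(CUSTOM_COLUMNS_QUERY, String.join(", ", columns), tableName, limit, offset);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("users", 0, 50));
        System.out.println(buildQuery("users", 100, 50, "id", "name"));
    }
}
```

Note the argument order: the templates take limit before offset, so a caller paging forward increments only the offset between fetch calls.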
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424918#comment-16424918 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r179015325 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java ##
+    public static VectorSchemaRoot sqlToArrow(Connection connection, String query) throws Exception {
+
+        assert connection != null : "JDBC connection object can not be null";

Review comment: Fixed. Also changed this to use Preconditions.
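The review comment above notes the asserts were replaced with Preconditions. One motivation for that change: Java assertions are silently skipped unless the JVM runs with -ea, so explicit checks fail reliably. A minimal sketch with a hand-rolled stand-in for Guava's Preconditions.checkArgument (the validate helper is mine, mirroring the checks at the top of sqlToArrow):

```java
// Explicit argument validation in place of `assert`, which is a no-op
// unless the JVM is started with -ea.
public class PreconditionsSketch {

    // Stand-in for Guava's Preconditions.checkArgument.
    public static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    // Mirrors the validation at the top of sqlToArrow(connection, query).
    public static void validate(Object connection, String query) {
        checkArgument(connection != null, "JDBC connection object can not be null");
        checkArgument(query != null && !query.isEmpty(), "SQL query can not be null or empty");
    }

    public static void main(String[] args) {
        try {
            validate(null, "select 1");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // JDBC connection object can not be null
        }
    }
}
```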
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424917#comment-16424917 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r179015269 ## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/Table.java ## @@ -0,0 +1,74 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+/**
+ *

Review comment: Added relevant doc comment.
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424903#comment-16424903 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#issuecomment-378461698 Hi @laurentgo and @siddharthteotia I am still working on some of the code review changes. I have checked in some code fixes. I will let you know once I am done with all the changes or if I need to have some discussion with you. Thanks.
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424872#comment-16424872 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r179009875

## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ##
@@ -0,0 +1,343 @@

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.arrow.adapter.jdbc;

import org.apache.arrow.vector.*;
import org.apache.arrow.vector.types.DateUnit;
import org.apache.arrow.vector.types.TimeUnit;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;

import java.nio.charset.Charset;
import java.sql.*;
import java.util.ArrayList;
import java.util.List;

import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;

/**
 * Class that does most of the work to convert JDBC ResultSet data into Arrow columnar format Vector objects.
 *
 * @since 0.10.0
 */
public class JdbcToArrowUtils {

    private static final int DEFAULT_BUFFER_SIZE = 256;

    /**
     * Create an Arrow {@link Schema} object for the given JDBC {@link ResultSetMetaData}.
     *
     * This method currently performs the following mapping from JDBC SQL data types to the corresponding Arrow data types:
     *
     * CHAR          --> ArrowType.Utf8
     * NCHAR         --> ArrowType.Utf8
     * VARCHAR       --> ArrowType.Utf8
     * NVARCHAR      --> ArrowType.Utf8
     * LONGVARCHAR   --> ArrowType.Utf8
     * LONGNVARCHAR  --> ArrowType.Utf8
     * NUMERIC       --> ArrowType.Decimal(precision, scale)
     * DECIMAL       --> ArrowType.Decimal(precision, scale)
     * BIT           --> ArrowType.Bool
     * TINYINT       --> ArrowType.Int(8, signed)
     * SMALLINT      --> ArrowType.Int(16, signed)
     * INTEGER       --> ArrowType.Int(32, signed)
     * BIGINT        --> ArrowType.Int(64, signed)
     * REAL          --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
     * FLOAT         --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
     * DOUBLE        --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
     * BINARY        --> ArrowType.Binary
     * VARBINARY     --> ArrowType.Binary
     * LONGVARBINARY --> ArrowType.Binary
     * DATE          --> ArrowType.Date(DateUnit.MILLISECOND)
     * TIME          --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
     * TIMESTAMP     --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
     * CLOB          --> ArrowType.Utf8
     * BLOB          --> ArrowType.Binary
     *
     * @param rsmd the JDBC result set metadata to map
     * @return the corresponding Arrow {@link Schema}
     * @throws SQLException if the metadata cannot be read
     */
    public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws SQLException {

        assert rsmd != null;

        //ImmutableList.Builder fields = ImmutableList.builder();
        List<Field> fields = new ArrayList<>();
        int columnCount = rsmd.getColumnCount();
        for (int i = 1; i <= columnCount; i++) {
            String columnName = rsmd.getColumnName(i);
            switch (rsmd.getColumnType(i)) {
                case Types.BOOLEAN:
                case Types.BIT:
                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Bool()), null));
                    break;
                case Types.TINYINT:
                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(8, true)), null));
                    break;
                case Types.SMALLINT:
                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(16, true)), null));
                    break;
                case Types.INTEGER:
                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(32, true)), null));
                    break;
                case
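The SQL-to-Arrow type mapping documented in the excerpt above can be sketched as a small stand-alone lookup table. This is a hypothetical illustration, not the adapter's actual API: the class name `SqlToArrowTypeSketch` and the method `toArrowTypeName` are invented here, and plain strings stand in for the real `ArrowType` instances so the sketch runs with only the JDK's `java.sql.Types` constants on the classpath.

```java
import java.sql.Types;
import java.util.HashMap;
import java.util.Map;

public class SqlToArrowTypeSketch {

    // Hypothetical helper mirroring the javadoc mapping above; the real
    // adapter constructs ArrowType/Field objects instead of strings.
    private static final Map<Integer, String> MAPPING = new HashMap<>();
    static {
        MAPPING.put(Types.BOOLEAN, "Bool");
        MAPPING.put(Types.BIT, "Bool");
        MAPPING.put(Types.TINYINT, "Int(8, signed)");
        MAPPING.put(Types.SMALLINT, "Int(16, signed)");
        MAPPING.put(Types.INTEGER, "Int(32, signed)");
        MAPPING.put(Types.BIGINT, "Int(64, signed)");
        MAPPING.put(Types.REAL, "FloatingPoint(SINGLE)");
        MAPPING.put(Types.FLOAT, "FloatingPoint(SINGLE)");
        MAPPING.put(Types.DOUBLE, "FloatingPoint(DOUBLE)");
        MAPPING.put(Types.VARCHAR, "Utf8");
        MAPPING.put(Types.CLOB, "Utf8");
        MAPPING.put(Types.VARBINARY, "Binary");
        MAPPING.put(Types.BLOB, "Binary");
        MAPPING.put(Types.DATE, "Date(MILLISECOND)");
        MAPPING.put(Types.TIME, "Time(MILLISECOND, 32)");
        MAPPING.put(Types.TIMESTAMP, "Timestamp(MILLISECOND, null)");
    }

    public static String toArrowTypeName(int jdbcType) {
        String arrow = MAPPING.get(jdbcType);
        if (arrow == null) {
            // The real code would need a policy for unmapped types as well.
            throw new IllegalArgumentException("Unsupported JDBC type: " + jdbcType);
        }
        return arrow;
    }

    public static void main(String[] args) {
        System.out.println(toArrowTypeName(Types.INTEGER));   // Int(32, signed)
        System.out.println(toArrowTypeName(Types.TIMESTAMP)); // Timestamp(MILLISECOND, null)
    }
}
```

Like the `switch` in `jdbcToArrowSchema`, the mapping is driven purely by the `java.sql.Types` code reported in the result set metadata; per-column details such as decimal precision/scale would have to be read from `ResultSetMetaData` at lookup time.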
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424870#comment-16424870 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r179009787 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ##
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424868#comment-16424868 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r179009692 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ##
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424865#comment-16424865 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r179009588 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ##
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424862#comment-16424862 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r179009456 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ##
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424861#comment-16424861 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r179009426 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ##
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424853#comment-16424853 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r179009203 ## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/Table.java ## Review comment: Will add a necessary comment. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > JDBC Adapter for Apache Arrow > - > > Key: ARROW-1780 > URL: https://issues.apache.org/jira/browse/ARROW-1780 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Atul Dambalkar >Assignee: Atul Dambalkar >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > At a high level, the JDBC Adapter will allow upstream apps to query RDBMS data > over JDBC and get the JDBC objects converted to Arrow objects/structures. The > upstream utility can then work with Arrow objects/structures with the usual > performance benefits. The utility will be very similar to the C++ > implementation of "Convert a vector of row-wise data into an Arrow table" as > described here - > https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html > The utility will read data from an RDBMS and convert the data into Arrow > objects/structures. So from that perspective, this utility reads data from an RDBMS. > Whether the utility can also push Arrow objects to an RDBMS still needs to be > discussed and is out of scope for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424685#comment-16424685 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r178977178 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ##
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423151#comment-16423151 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r178652817 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ## @@ -0,0 +1,343 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.arrow.adapter.jdbc; + +import org.apache.arrow.vector.*; +import org.apache.arrow.vector.types.DateUnit; +import org.apache.arrow.vector.types.TimeUnit; +import org.apache.arrow.vector.types.pojo.ArrowType; +import org.apache.arrow.vector.types.pojo.Field; +import org.apache.arrow.vector.types.pojo.FieldType; +import org.apache.arrow.vector.types.pojo.Schema; + +import java.nio.charset.Charset; +import java.sql.*; +import java.util.ArrayList; +import java.util.List; + +import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE; +import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE; + + +/** + * Class that does most of the work to convert JDBC ResultSet data into Arrow columnar format Vector objects. + * + * @since 0.10.0 + */ +public class JdbcToArrowUtils { + +private static final int DEFAULT_BUFFER_SIZE = 256; + +/** + * Create Arrow {@link Schema} object for the given JDBC {@link ResultSetMetaData}. + * + * This method currently performs following type mapping for JDBC SQL data types to corresponding Arrow data types. 
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423149#comment-16423149 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r178652242 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowDataFetcher.java ## @@ -0,0 +1,107 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.adapter.jdbc; + +import org.apache.arrow.vector.VectorSchemaRoot; + +import java.sql.Connection; + +/** + * Class to fetch data from a given database table where user can specify columns to fetch + * along with limit and offset parameters. + * + * The object of this class is returned by invoking method jdbcArrowDataFetcher(Connection connection, String tableName) + * from {@link JdbcToArrow} class. Caller can use this object to fetch data repetitively based on the + * data fetch requirement and can implement pagination like functionality. 
+ *
+ * This class doesn't hold any open connections to the database but simply executes the "select" query every time with
+ * the necessary limit and offset parameters.
+ *
+ * @since 0.10.0
+ * @see JdbcToArrow
+ */
+public class ArrowDataFetcher {
+
+    private static final String all_columns_query = "select * from %s limit %d offset %d";
+    private static final String custom_columns_query = "select %s from %s limit %d offset %d";
+    private Connection connection;
+    private String tableName;
+
+    /**
+     * Constructor
+     * @param connection
+     * @param tableName
+     */
+    public ArrowDataFetcher(Connection connection, String tableName) {
+        this.connection = connection;
+        this.tableName = tableName;
+    }
+
+    /**
+     * Fetch the data from the underlying table with the given limit and offset and for the passed column names.
+     *
+     * @param offset
+     * @param limit
+     * @param columns
+     * @return
+     * @throws Exception
+     */
+    public VectorSchemaRoot fetch(int offset, int limit, String... columns) throws Exception {
+        assert columns != null && columns.length > 0 : "columns can't be empty!";
Review comment: Yea, I hear you! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > JDBC Adapter for Apache Arrow > - > > Key: ARROW-1780 > URL: https://issues.apache.org/jira/browse/ARROW-1780 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Atul Dambalkar >Assignee: Atul Dambalkar >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > At a high level the JDBC Adapter will allow upstream apps to query RDBMS data > over JDBC and get the JDBC objects converted to Arrow objects/structures. The > upstream utility can then work with Arrow objects/structures with usual > performance benefits.
The utility will be very much similar to the C++ > implementation of "Convert a vector of row-wise data into an Arrow table" as > described here - > https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html > The utility will read data from RDBMS and convert the data into Arrow > objects/structures. So from that perspective this will read data from RDBMS. > Whether the utility can push Arrow objects to RDBMS is something that needs to be > discussed and will be out of scope for this utility for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
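The pagination-style fetch discussed in the review can be seen directly in how `ArrowDataFetcher`'s two query templates expand. A small JDK-only sketch (the `FetcherQueries` class and its method names are illustrative, not part of the PR):

```java
// Sketch of how ArrowDataFetcher's two query templates expand for a
// pagination-style fetch. Pure string formatting, no database needed.
public class FetcherQueries {
    static final String ALL_COLUMNS_QUERY = "select * from %s limit %d offset %d";
    static final String CUSTOM_COLUMNS_QUERY = "select %s from %s limit %d offset %d";

    // Fetch all columns of a page of rows.
    static String allColumns(String table, int limit, int offset) {
        return String.format(ALL_COLUMNS_QUERY, table, limit, offset);
    }

    // Fetch only the named columns of a page of rows.
    static String someColumns(String table, int limit, int offset, String... columns) {
        return String.format(CUSTOM_COLUMNS_QUERY, String.join(", ", columns), table, limit, offset);
    }
}
```

A caller implementing pagination would advance `offset` by `limit` between calls, re-executing the query each time, which matches the javadoc's statement that no connection state is held between fetches.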
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423141#comment-16423141 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r178651658 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ## @@ -0,0 +1,343 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423125#comment-16423125 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r178648346 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowDataFetcher.java ##
+public ArrowDataFetcher(Connection connection, String tableName) {
Review comment: That's my belief too...
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423062#comment-16423062 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r178636322 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowDataFetcher.java ##
+assert columns != null && columns.length > 0 : "columns can't be empty!";
Review comment: Apart from the semantic difference (assertions are usually internal preconditions of how the class behaves, to help debugging), asserts are only turned on if enabled at the JVM level (using the -ea flag), whereas preconditions are always checked.
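The distinction laurentgo draws can be sketched side by side: an explicit precondition check always throws on bad input, while an `assert` is silently skipped unless the JVM runs with `-ea`. The class and method names below are illustrative only, not part of the PR.

```java
// Contrasts the two argument-validation styles discussed above.
public class ColumnChecks {
    // Explicit precondition: always enforced, regardless of JVM flags.
    static String[] checkColumns(String[] columns) {
        if (columns == null || columns.length == 0) {
            throw new IllegalArgumentException("columns can't be empty!");
        }
        return columns;
    }

    // Assertion-based variant: only enforced when assertions are
    // enabled with the -ea JVM flag; a no-op in a default JVM.
    static String[] assertColumns(String[] columns) {
        assert columns != null && columns.length > 0 : "columns can't be empty!";
        return columns;
    }
}
```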
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423063#comment-16423063 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r178636342 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowDataFetcher.java ##
+public ArrowDataFetcher(Connection connection, String tableName) {
Review comment: Aah, okay, that makes sense. I think in that case, we don't even need to worry about various databases and also testing against each of those.
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423060#comment-16423060 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r178635982 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java ## @@ -0,0 +1,116 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.adapter.jdbc; + +import org.apache.arrow.memory.RootAllocator; +import org.apache.arrow.vector.VectorSchemaRoot; + +import java.sql.*; + +/** + * Utility class to convert JDBC objects to columnar Arrow format objects. + * + * This utility uses following data mapping to map JDBC/SQL datatype to Arrow data types. 
+ *
+ * CHAR          --> ArrowType.Utf8
+ * NCHAR         --> ArrowType.Utf8
+ * VARCHAR       --> ArrowType.Utf8
+ * NVARCHAR      --> ArrowType.Utf8
+ * LONGVARCHAR   --> ArrowType.Utf8
+ * LONGNVARCHAR  --> ArrowType.Utf8
+ * NUMERIC       --> ArrowType.Decimal(precision, scale)
+ * DECIMAL       --> ArrowType.Decimal(precision, scale)
+ * BIT           --> ArrowType.Bool
+ * TINYINT       --> ArrowType.Int(8, signed)
+ * SMALLINT      --> ArrowType.Int(16, signed)
+ * INTEGER       --> ArrowType.Int(32, signed)
+ * BIGINT        --> ArrowType.Int(64, signed)
+ * REAL          --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * FLOAT         --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * DOUBLE        --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+ * BINARY        --> ArrowType.Binary
+ * VARBINARY     --> ArrowType.Binary
+ * LONGVARBINARY --> ArrowType.Binary
+ * DATE          --> ArrowType.Date(DateUnit.MILLISECOND)
+ * TIME          --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+ * TIMESTAMP     --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+ * CLOB          --> ArrowType.Utf8
+ * BLOB          --> ArrowType.Binary
+ *
+ * @since 0.10.0
+ * @see ArrowDataFetcher
+ */
+public class JdbcToArrow {
+
+    /**
+     * For the given SQL query, execute and fetch the data from the relational DB and convert it to Arrow objects.
+     *
+     * @param connection Database connection to be used. This method will not close the passed connection object. Since the caller has passed
+     *                   the connection object it's the responsibility of the caller to close or return the connection to the pool.
+     * @param query The DB query to fetch the data.
+     * @return
+     * @throws SQLException Propagate any SQL Exceptions to the caller after closing any resources opened such as ResultSet and Statement objects.
+     */
+    public static VectorSchemaRoot sqlToArrow(Connection connection, String query) throws Exception {
+
+        assert connection != null : "JDBC connection object cannot be null";
+        assert query != null && query.length() > 0 : "SQL query cannot be null or empty";
+
+        RootAllocator rootAllocator = new RootAllocator(Integer.MAX_VALUE);
Review comment: If there's no way to automatically free buffers/close the allocator, you probably want to modify the existing function to take one as an input.
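The review point above is an ownership question: if the utility allocates its own `RootAllocator` and returns vectors backed by it, nobody can safely close it. Taking the allocator as input keeps the caller responsible for its lifetime. A JDK-only sketch of that pattern, with `BufferPool` as a stand-in for Arrow's allocator so the example needs no Arrow classes (all names here are illustrative):

```java
// Illustrates the caller-owned-resource pattern suggested in the review:
// the conversion utility takes the pool as input and never closes it;
// the caller controls the lifetime via try-with-resources.
public class AllocatorOwnership {
    // Stand-in for Arrow's RootAllocator (which is AutoCloseable).
    static final class BufferPool implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    // The converter borrows the pool; it must not close it.
    static String convert(BufferPool pool, String query) {
        if (pool.closed) throw new IllegalStateException("pool already closed");
        return "converted:" + query;
    }

    static BufferPool demo() {
        BufferPool pool = new BufferPool();
        try (BufferPool p = pool) {   // caller owns and closes the pool
            convert(p, "select 1");
        }
        return pool;                  // closed by the caller's try block
    }
}
```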
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423057#comment-16423057 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r178635650 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowDataFetcher.java ## @@ -0,0 +1,107 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.adapter.jdbc; + +import org.apache.arrow.vector.VectorSchemaRoot; + +import java.sql.Connection; + +/** + * Class to fetch data from a given database table where user can specify columns to fetch + * along with limit and offset parameters. + * + * The object of this class is returned by invoking method jdbcArrowDataFetcher(Connection connection, String tableName) + * from {@link JdbcToArrow} class. Caller can use this object to fetch data repetitively based on the + * data fetch requirement and can implement pagination like functionality. 
+ * + * This class doesn't hold any open connections to database but simply executes the "select" query everytime with + * the necessary limit and offset parameters. + * + * @since 0.10.0 + * @see JdbcToArrow + */ +public class ArrowDataFetcher { + +private static final String all_columns_query = "select * from %s limit %d offset %d"; +private static final String custom_columns_query = "select %s from %s limit %d offset %d"; +private Connection connection; +private String tableName; + +/** + * Constructor + * @param connection + * @param tableName + */ +public ArrowDataFetcher(Connection connection, String tableName) { Review comment: My suggestion is to let the user do the query thing (get a connection, create the statement and execute it), and use the resulting {{ResultSet}} to do the Arrow conversation (and hopefully no need to deal with different dialects and other stuff) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > JDBC Adapter for Apache Arrow > - > > Key: ARROW-1780 > URL: https://issues.apache.org/jira/browse/ARROW-1780 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Atul Dambalkar >Assignee: Atul Dambalkar >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > At a high level the JDBC Adapter will allow upstream apps to query RDBMS data > over JDBC and get the JDBC objects converted to Arrow objects/structures. The > upstream utility can then work with Arrow objects/structures with usual > performance benefits. The utility will be very much similar to C++ > implementation of "Convert a vector of row-wise data into an Arrow table" as > described here - > https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html > The utility will read data from RDBMS and covert the data into Arrow > objects/structures. 
So from that perspective this will read data from RDBMS. > Whether the utility can push Arrow objects to RDBMS is something that needs to be > discussed and will be out of scope for this utility for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
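Setting the Arrow API aside, the row-wise to columnar transposition the description refers to can be sketched in plain Java (the class and method names below are illustrative, not part of the adapter; the real code fills typed Arrow vectors rather than `List<Object>` buffers):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: transposes row-wise records (the shape a JDBC
// ResultSet delivers them in) into per-column buffers, which is the
// shape Arrow vectors want.
class RowToColumnSketch {

    static List<List<Object>> transpose(List<Object[]> rows, int columnCount) {
        List<List<Object>> columns = new ArrayList<>();
        for (int i = 0; i < columnCount; i++) {
            columns.add(new ArrayList<>());
        }
        // Single pass over the rows; each value is appended to its column buffer.
        for (Object[] row : rows) {
            for (int i = 0; i < columnCount; i++) {
                columns.get(i).add(row[i]);
            }
        }
        return columns;
    }
}
```

The per-column buffers are what make the downstream columnar (vectorized) processing cheap, which is the performance benefit the description alludes to.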
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423035#comment-16423035 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r178629345 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java ## @@ -0,0 +1,116 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.adapter.jdbc; + +import org.apache.arrow.memory.RootAllocator; +import org.apache.arrow.vector.VectorSchemaRoot; + +import java.sql.*; + +/** + * Utility class to convert JDBC objects to columnar Arrow format objects. + * + * This utility uses following data mapping to map JDBC/SQL datatype to Arrow data types. 
+ * + * CHAR--> ArrowType.Utf8 + * NCHAR --> ArrowType.Utf8 + * VARCHAR --> ArrowType.Utf8 + * NVARCHAR --> ArrowType.Utf8 + * LONGVARCHAR --> ArrowType.Utf8 + * LONGNVARCHAR --> ArrowType.Utf8 + * NUMERIC --> ArrowType.Decimal(precision, scale) + * DECIMAL --> ArrowType.Decimal(precision, scale) + * BIT --> ArrowType.Bool + * TINYINT --> ArrowType.Int(8, signed) + * SMALLINT --> ArrowType.Int(16, signed) + * INTEGER --> ArrowType.Int(32, signed) + * BIGINT --> ArrowType.Int(64, signed) + * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE) + * BINARY --> ArrowType.Binary + * VARBINARY --> ArrowType.Binary + * LONGVARBINARY --> ArrowType.Binary + * DATE --> ArrowType.Date(DateUnit.MILLISECOND) + * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32) + * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null) + * CLOB --> ArrowType.Utf8 + * BLOB --> ArrowType.Binary + * + * @since 0.10.0 + * @see ArrowDataFetcher + */ +public class JdbcToArrow { + +/** + * For the given SQL query, execute and fetch the data from Relational DB and convert it to Arrow objects. + * + * @param connection Database connection to be used. This method will not close the passed connection object. Since hte caller has passed + * the connection object it's the responsibility of the caller to close or return the connection to the pool. + * @param query The DB Query to fetch the data. + * @return + * @throws SQLException Propagate any SQL Exceptions to the caller after closing any resources opened such as ResultSet and Statment objects. 
+ */ +public static VectorSchemaRoot sqlToArrow(Connection connection, String query) throws Exception { + +assert connection != null: "JDBC connection object can not be null"; +assert query != null && query.length() > 0: "SQL query can not be null or empty"; + +RootAllocator rootAllocator = new RootAllocator(Integer.MAX_VALUE); Review comment: Do you think it would be good to provide another overloaded API with RootAllocator as an argument - public static VectorSchemaRoot sqlToArrow(Connection connection, String query, RootAllocator) - or should I just modify the existing one? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
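One way to frame the overload question above is to keep both entry points and have the convenience overload delegate to the one that takes a caller-supplied allocator. This is a sketch under assumed names: `Allocator` stands in for Arrow's `RootAllocator`, and the method bodies are placeholders, not the adapter's implementation.

```java
// Sketch of the overload pattern being discussed. "Allocator" stands in
// for org.apache.arrow.memory.RootAllocator; the return value is a
// placeholder string instead of a VectorSchemaRoot.
class OverloadSketch {

    /** Hypothetical stand-in for Arrow's RootAllocator. */
    static class Allocator {
        final long limit;
        Allocator(long limit) { this.limit = limit; }
    }

    // Convenience overload: creates a default allocator and delegates.
    static String sqlToArrow(String query) {
        return sqlToArrow(query, new Allocator(Long.MAX_VALUE));
    }

    // Primary entry point: the caller supplies (and remains responsible
    // for closing) the allocator.
    static String sqlToArrow(String query, Allocator allocator) {
        return query + " [limit=" + allocator.limit + "]";
    }
}
```

Keeping the allocator-taking method as the primary entry point makes memory accounting and close() responsibility explicit for callers who need it, while the one-argument form stays convenient for quick use.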
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423029#comment-16423029 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r178628383 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowDataFetcher.java ## @@ -0,0 +1,107 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.adapter.jdbc; + +import org.apache.arrow.vector.VectorSchemaRoot; + +import java.sql.Connection; + +/** + * Class to fetch data from a given database table where user can specify columns to fetch + * along with limit and offset parameters. + * + * The object of this class is returned by invoking method jdbcArrowDataFetcher(Connection connection, String tableName) + * from {@link JdbcToArrow} class. Caller can use this object to fetch data repetitively based on the + * data fetch requirement and can implement pagination like functionality. 
+ * + * This class doesn't hold any open connections to database but simply executes the "select" query everytime with + * the necessary limit and offset parameters. + * + * @since 0.10.0 + * @see JdbcToArrow + */ +public class ArrowDataFetcher { + +private static final String all_columns_query = "select * from %s limit %d offset %d"; +private static final String custom_columns_query = "select %s from %s limit %d offset %d"; +private Connection connection; +private String tableName; + +/** + * Constructor + * @param connection + * @param tableName + */ +public ArrowDataFetcher(Connection connection, String tableName) { +this.connection = connection; +this.tableName = tableName; +} + +/** + * Fetch the data from underlying table with the given limit and offset and for passed column names. + * + * @param offset + * @param limit + * @param columns + * @return + * @throws Exception + */ +public VectorSchemaRoot fetch(int offset, int limit, String... columns) throws Exception { +assert columns != null && columns.length > 0 : "columns can't be empty!"; Review comment: Do you want me to use any particular Preconditions API such Guava's com.google.common.base.Preconditions? As "assert" is doing pretty much the same thing except it throws AssertionError as opposed to IllegalArgumentException. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > JDBC Adapter for Apache Arrow > - > > Key: ARROW-1780 > URL: https://issues.apache.org/jira/browse/ARROW-1780 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Atul Dambalkar >Assignee: Atul Dambalkar >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > At a high level the JDBC Adapter will allow upstream apps to query RDBMS data > over JDBC and get the JDBC objects converted to Arrow objects/structures. 
The > upstream utility can then work with Arrow objects/structures with usual > performance benefits. The utility will be very much similar to C++ > implementation of "Convert a vector of row-wise data into an Arrow table" as > described here - > https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html > The utility will read data from RDBMS and covert the data into Arrow > objects/structures. So from that perspective this will Read data from RDBMS, > If the utility can push Arrow objects to RDBMS is something need to be > discussed and will be out of scope for this utility for now. -- This message was sent by Atlassian
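On the assert-vs-Preconditions point above: Guava's `Preconditions.checkArgument` throws `IllegalArgumentException` and, unlike `assert`, is not skipped when the JVM runs without `-ea`. A dependency-free equivalent (a sketch, not the adapter's code) looks like:

```java
// Minimal stand-in for Guava's Preconditions.checkArgument: always on,
// unlike `assert`, which is a no-op unless the JVM is started with -ea.
class JdbcPreconditions {

    static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }
}
```

With this, the argument check in `fetch` becomes `checkArgument(columns != null && columns.length > 0, "columns can't be empty!")` and fails consistently in production, where assertions are typically disabled.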
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423021#comment-16423021 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r178626181 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowDataFetcher.java ## @@ -0,0 +1,107 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.adapter.jdbc; + +import org.apache.arrow.vector.VectorSchemaRoot; + +import java.sql.Connection; + +/** + * Class to fetch data from a given database table where user can specify columns to fetch + * along with limit and offset parameters. + * + * The object of this class is returned by invoking method jdbcArrowDataFetcher(Connection connection, String tableName) + * from {@link JdbcToArrow} class. Caller can use this object to fetch data repetitively based on the + * data fetch requirement and can implement pagination like functionality. 
+ * + * This class doesn't hold any open connections to database but simply executes the "select" query everytime with + * the necessary limit and offset parameters. + * + * @since 0.10.0 + * @see JdbcToArrow + */ +public class ArrowDataFetcher { + +private static final String all_columns_query = "select * from %s limit %d offset %d"; Review comment: Yes, this is true, my bad! I was aware of this and should have thought through before implementing this. What I am thinking here now is to come up with a Java enum for all the databases and maintain a map or constant string with (limit/offset) query specific to each database. This way I can support - ORACLE_12C,MYSQL, DB2,SQL_SERVER_2012,SQL_SERVER_2008, POSTGRESQL, H2, SQLDB, INGRES, DERBY, SQLITE, CUBRID, SYBASE_ASE, SYBASE_SQL_ANYWHERE,FIREBIRD. But the only problem here is in writing test cases. What do you think about this approach and if this is okay, how we can go about testing the code for each database? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > JDBC Adapter for Apache Arrow > - > > Key: ARROW-1780 > URL: https://issues.apache.org/jira/browse/ARROW-1780 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Atul Dambalkar >Assignee: Atul Dambalkar >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > At a high level the JDBC Adapter will allow upstream apps to query RDBMS data > over JDBC and get the JDBC objects converted to Arrow objects/structures. The > upstream utility can then work with Arrow objects/structures with usual > performance benefits. 
The utility will be very much similar to C++ > implementation of "Convert a vector of row-wise data into an Arrow table" as > described here - > https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html > The utility will read data from RDBMS and covert the data into Arrow > objects/structures. So from that perspective this will Read data from RDBMS, > If the utility can push Arrow objects to RDBMS is something need to be > discussed and will be out of scope for this utility for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
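The per-database template idea proposed above could be sketched as an enum keyed by dialect. This is illustrative only: just two template families are shown, and the exact dialect list is an open question, but broadly MySQL/PostgreSQL/H2/SQLite use `LIMIT ... OFFSET ...` while SQL Server 2012+, Oracle 12c, and DB2 use `OFFSET ... ROWS FETCH NEXT ... ROWS ONLY` (note the two families bind offset and limit in opposite order).

```java
// Illustrative sketch of dialect-specific limit/offset templates.
enum PagingDialect {
    // LIMIT/OFFSET family (also PostgreSQL, H2, SQLite, ...)
    MYSQL("SELECT %s FROM %s LIMIT %d OFFSET %d", false),
    POSTGRESQL("SELECT %s FROM %s LIMIT %d OFFSET %d", false),
    // OFFSET ... FETCH family (also DB2; SQL Server requires an ORDER BY)
    SQL_SERVER_2012("SELECT %s FROM %s ORDER BY (SELECT NULL) OFFSET %d ROWS FETCH NEXT %d ROWS ONLY", true),
    ORACLE_12C("SELECT %s FROM %s OFFSET %d ROWS FETCH NEXT %d ROWS ONLY", true);

    private final String template;
    private final boolean offsetBeforeLimit; // OFFSET-FETCH templates bind offset first

    PagingDialect(String template, boolean offsetBeforeLimit) {
        this.template = template;
        this.offsetBeforeLimit = offsetBeforeLimit;
    }

    String pageQuery(String columns, String table, int limit, int offset) {
        return offsetBeforeLimit
                ? String.format(template, columns, table, offset, limit)
                : String.format(template, columns, table, limit, offset);
    }
}
```

On the testing concern: because `pageQuery` is pure string formatting, every dialect's generated SQL can be unit-tested without a live database; only the dialects with an embeddable engine (e.g. H2, SQLite, Derby) would additionally need end-to-end tests.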
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423015#comment-16423015 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r178625435 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowDataFetcher.java ## @@ -0,0 +1,107 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.adapter.jdbc; + +import org.apache.arrow.vector.VectorSchemaRoot; + +import java.sql.Connection; + +/** + * Class to fetch data from a given database table where user can specify columns to fetch + * along with limit and offset parameters. + * + * The object of this class is returned by invoking method jdbcArrowDataFetcher(Connection connection, String tableName) + * from {@link JdbcToArrow} class. Caller can use this object to fetch data repetitively based on the + * data fetch requirement and can implement pagination like functionality. 
+ * + * This class doesn't hold any open connections to database but simply executes the "select" query everytime with + * the necessary limit and offset parameters. + * + * @since 0.10.0 + * @see JdbcToArrow + */ +public class ArrowDataFetcher { + +private static final String all_columns_query = "select * from %s limit %d offset %d"; +private static final String custom_columns_query = "select %s from %s limit %d offset %d"; +private Connection connection; +private String tableName; + +/** + * Constructor + * @param connection + * @param tableName + */ +public ArrowDataFetcher(Connection connection, String tableName) { Review comment: @laurentgo Can you elaborate it a bit - what do you mean by wrapping a ResultSet? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > JDBC Adapter for Apache Arrow > - > > Key: ARROW-1780 > URL: https://issues.apache.org/jira/browse/ARROW-1780 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Atul Dambalkar >Assignee: Atul Dambalkar >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > At a high level the JDBC Adapter will allow upstream apps to query RDBMS data > over JDBC and get the JDBC objects converted to Arrow objects/structures. The > upstream utility can then work with Arrow objects/structures with usual > performance benefits. The utility will be very much similar to C++ > implementation of "Convert a vector of row-wise data into an Arrow table" as > described here - > https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html > The utility will read data from RDBMS and covert the data into Arrow > objects/structures. 
So from that perspective this will Read data from RDBMS, > If the utility can push Arrow objects to RDBMS is something need to be > discussed and will be out of scope for this utility for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418166#comment-16418166 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#issuecomment-377045969 Hi @laurentgo, now I do have a handful of review comments to work on. As I work on each one of those, some may need short discussion with you. I hope that's okay. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416450#comment-16416450 ] ASF GitHub Bot commented on ARROW-1780: --- atuldambalkar commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#issuecomment-376706778 Thanks @laurentgo and @siddharthteotia for all the review comments so far. Let me work on this and revert with my comments as I start working on the changes. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416399#comment-16416399 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177591754 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ## @@ -0,0 +1,343 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.arrow.adapter.jdbc; + +import org.apache.arrow.vector.*; +import org.apache.arrow.vector.types.DateUnit; +import org.apache.arrow.vector.types.TimeUnit; +import org.apache.arrow.vector.types.pojo.ArrowType; +import org.apache.arrow.vector.types.pojo.Field; +import org.apache.arrow.vector.types.pojo.FieldType; +import org.apache.arrow.vector.types.pojo.Schema; + +import java.nio.charset.Charset; +import java.sql.*; +import java.util.ArrayList; +import java.util.List; + +import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE; +import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE; + + +/** + * Class that does most of the work to convert JDBC ResultSet data into Arrow columnar format Vector objects. + * + * @since 0.10.0 + */ +public class JdbcToArrowUtils { + +private static final int DEFAULT_BUFFER_SIZE = 256; + +/** + * Create Arrow {@link Schema} object for the given JDBC {@link ResultSetMetaData}. + * + * This method currently performs following type mapping for JDBC SQL data types to corresponding Arrow data types. 
+ * + * CHAR--> ArrowType.Utf8 + * NCHAR --> ArrowType.Utf8 + * VARCHAR --> ArrowType.Utf8 + * NVARCHAR --> ArrowType.Utf8 + * LONGVARCHAR --> ArrowType.Utf8 + * LONGNVARCHAR --> ArrowType.Utf8 + * NUMERIC --> ArrowType.Decimal(precision, scale) + * DECIMAL --> ArrowType.Decimal(precision, scale) + * BIT --> ArrowType.Bool + * TINYINT --> ArrowType.Int(8, signed) + * SMALLINT --> ArrowType.Int(16, signed) + * INTEGER --> ArrowType.Int(32, signed) + * BIGINT --> ArrowType.Int(64, signed) + * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE) + * BINARY --> ArrowType.Binary + * VARBINARY --> ArrowType.Binary + * LONGVARBINARY --> ArrowType.Binary + * DATE --> ArrowType.Date(DateUnit.MILLISECOND) + * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32) + * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null) + * CLOB --> ArrowType.Utf8 + * BLOB --> ArrowType.Binary + * + * @param rsmd + * @return {@link Schema} + * @throws SQLException + */ +public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws SQLException { + +assert rsmd != null; + +//ImmutableList.Builder fields = ImmutableList.builder(); +List fields = new ArrayList<>(); +int columnCount = rsmd.getColumnCount(); +for (int i = 1; i <= columnCount; i++) { +String columnName = rsmd.getColumnName(i); +switch (rsmd.getColumnType(i)) { +case Types.BOOLEAN: +case Types.BIT: +fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Bool()), null)); +break; +case Types.TINYINT: +fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(8, true)), null)); +break; +case Types.SMALLINT: +fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(16, true)), null)); +break; +case Types.INTEGER: +fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(32, true)), null)); +break; +case
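Because `java.sql.Types` constants are plain `int`s, the mapping table above can be exercised without a database. The following reduced sketch of the switch returns Arrow type names as strings instead of constructing `ArrowType` objects (so it is illustrative, not the adapter's `jdbcToArrowSchema`):

```java
import java.sql.Types;

// Reduced sketch of the JDBC -> Arrow mapping documented above.
// NUMERIC/DECIMAL precision and scale would come from ResultSetMetaData
// in the real conversion; here the string is a placeholder.
class TypeMappingSketch {

    static String toArrowType(int jdbcType) {
        switch (jdbcType) {
            case Types.BOOLEAN:
            case Types.BIT:
                return "ArrowType.Bool";
            case Types.TINYINT:
                return "ArrowType.Int(8, signed)";
            case Types.SMALLINT:
                return "ArrowType.Int(16, signed)";
            case Types.INTEGER:
                return "ArrowType.Int(32, signed)";
            case Types.BIGINT:
                return "ArrowType.Int(64, signed)";
            case Types.NUMERIC:
            case Types.DECIMAL:
                return "ArrowType.Decimal(precision, scale)";
            case Types.REAL:
            case Types.FLOAT:
                return "ArrowType.FloatingPoint(SINGLE)";
            case Types.DOUBLE:
                return "ArrowType.FloatingPoint(DOUBLE)";
            case Types.CHAR:
            case Types.NCHAR:
            case Types.VARCHAR:
            case Types.NVARCHAR:
            case Types.LONGVARCHAR:
            case Types.LONGNVARCHAR:
            case Types.CLOB:
                return "ArrowType.Utf8";
            case Types.BINARY:
            case Types.VARBINARY:
            case Types.LONGVARBINARY:
            case Types.BLOB:
                return "ArrowType.Binary";
            default:
                throw new UnsupportedOperationException("Unmapped JDBC type: " + jdbcType);
        }
    }
}
```

Grouped `case` fall-through mirrors the many-to-one rows of the mapping table (all the character types collapse to Utf8, all the binary types to Binary).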
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416400#comment-16416400 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177592922 ## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/JdbcToArrowTestHelper.java ## @@ -0,0 +1,250 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import java.math.BigDecimal;
+
+import org.apache.arrow.vector.BigIntVector;
+import org.apache.arrow.vector.BitVector;
+import org.apache.arrow.vector.DateMilliVector;
+import org.apache.arrow.vector.DecimalVector;
+import org.apache.arrow.vector.FieldVector;
+import org.apache.arrow.vector.Float4Vector;
+import org.apache.arrow.vector.Float8Vector;
+import org.apache.arrow.vector.IntVector;
+import org.apache.arrow.vector.SmallIntVector;
+import org.apache.arrow.vector.TimeMilliVector;
+import org.apache.arrow.vector.TimeStampVector;
+import org.apache.arrow.vector.TinyIntVector;
+import org.apache.arrow.vector.VarBinaryVector;
+import org.apache.arrow.vector.VarCharVector;
+
+import static org.junit.Assert.*;
+
+
+/**
+ * This is a Helper class which has functionalities to read and assert the values from teh given FieldVector object

Review comment: typo: teh -> the

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> JDBC Adapter for Apache Arrow
> -----------------------------
>
> Key: ARROW-1780
> URL: https://issues.apache.org/jira/browse/ARROW-1780
> Project: Apache Arrow
> Issue Type: New Feature
> Reporter: Atul Dambalkar
> Assignee: Atul Dambalkar
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
> At a high level the JDBC Adapter will allow upstream apps to query RDBMS data
> over JDBC and get the JDBC objects converted to Arrow objects/structures. The
> upstream utility can then work with Arrow objects/structures with usual
> performance benefits.
The utility will be very much similar to the C++
> implementation of "Convert a vector of row-wise data into an Arrow table" as
> described here -
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> The utility will read data from an RDBMS and convert the data into Arrow
> objects/structures, so from that perspective it reads data from the RDBMS.
> Whether the utility can also push Arrow objects back to an RDBMS still needs
> to be discussed and is out of scope for now.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
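The row-wise-to-columnar conversion the description refers to can be pictured with a small, self-contained sketch. Plain `int[]` arrays stand in for Arrow vectors here, and none of the names below come from the PR — it only illustrates the gather step an adapter like this performs per column:

```java
import java.util.Arrays;
import java.util.List;

public class RowToColumnSketch {
    // Gather one field (column index `col`) out of every row-wise record.
    // An Arrow-based adapter does the same walk, writing into a vector
    // instead of a plain array.
    static int[] columnOf(List<int[]> rows, int col) {
        int[] column = new int[rows.size()];
        for (int r = 0; r < rows.size(); r++) {
            column[r] = rows.get(r)[col];
        }
        return column;
    }

    public static void main(String[] args) {
        List<int[]> rows = Arrays.asList(
                new int[]{1, 10}, new int[]{2, 20}, new int[]{3, 30});
        // Column 1 gathered across all rows: [10, 20, 30]
        System.out.println(Arrays.toString(columnOf(rows, 1)));
    }
}
```

The real adapter additionally tracks null slots via validity buffers, which plain arrays cannot express.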
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416404#comment-16416404 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r177590101

## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java

## @@ -0,0 +1,343 @@
+/**
+ * [Apache License 2.0 header, as above]
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.vector.*;
+import org.apache.arrow.vector.types.DateUnit;
+import org.apache.arrow.vector.types.TimeUnit;
+import org.apache.arrow.vector.types.pojo.ArrowType;
+import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Schema;
+
+import java.nio.charset.Charset;
+import java.sql.*;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
+import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;
+
+
+/**
+ * Class that does most of the work to convert JDBC ResultSet data into Arrow columnar format Vector objects.
+ *
+ * @since 0.10.0
+ */
+public class JdbcToArrowUtils {
+
+    private static final int DEFAULT_BUFFER_SIZE = 256;
+
+    /**
+     * Create Arrow {@link Schema} object for the given JDBC {@link ResultSetMetaData}.
+     *
+     * This method currently performs the following type mapping from JDBC SQL data types to the corresponding Arrow data types.
+     *
+     * CHAR          --> ArrowType.Utf8
+     * NCHAR         --> ArrowType.Utf8
+     * VARCHAR       --> ArrowType.Utf8
+     * NVARCHAR      --> ArrowType.Utf8
+     * LONGVARCHAR   --> ArrowType.Utf8
+     * LONGNVARCHAR  --> ArrowType.Utf8
+     * NUMERIC       --> ArrowType.Decimal(precision, scale)
+     * DECIMAL       --> ArrowType.Decimal(precision, scale)
+     * BIT           --> ArrowType.Bool
+     * TINYINT       --> ArrowType.Int(8, signed)
+     * SMALLINT      --> ArrowType.Int(16, signed)
+     * INTEGER       --> ArrowType.Int(32, signed)
+     * BIGINT        --> ArrowType.Int(64, signed)
+     * REAL          --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+     * FLOAT         --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+     * DOUBLE        --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+     * BINARY        --> ArrowType.Binary
+     * VARBINARY     --> ArrowType.Binary
+     * LONGVARBINARY --> ArrowType.Binary
+     * DATE          --> ArrowType.Date(DateUnit.MILLISECOND)
+     * TIME          --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+     * TIMESTAMP     --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+     * CLOB          --> ArrowType.Utf8
+     * BLOB          --> ArrowType.Binary
+     *
+     * @param rsmd metadata of the JDBC result set to build the Arrow schema from
+     * @return {@link Schema}
+     * @throws SQLException if the metadata cannot be read
+     */
+    public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws SQLException {
+
+        assert rsmd != null;
+
+        //ImmutableList.Builder<Field> fields = ImmutableList.builder();
+        List<Field> fields = new ArrayList<>();
+        int columnCount = rsmd.getColumnCount();
+        for (int i = 1; i <= columnCount; i++) {
+            String columnName = rsmd.getColumnName(i);
+            switch (rsmd.getColumnType(i)) {
+                case Types.BOOLEAN:
+                case Types.BIT:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Bool()), null));
+                    break;
+                case Types.TINYINT:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(8, true)), null));
+                    break;
+                case Types.SMALLINT:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(16, true)), null));
+                    break;
+                case Types.INTEGER:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(32, true)), null));
+                    break;
+                case
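The mapping table quoted above is essentially a pure function over `java.sql.Types`. As an illustration only — returning descriptive strings rather than real `ArrowType` instances, so it runs without Arrow on the classpath, and `arrowTypeFor` is a hypothetical name, not PR code — it could be sketched as:

```java
import java.sql.Types;

public class JdbcTypeMappingSketch {
    // String rendition of the JDBC-to-Arrow type mapping from the Javadoc above.
    // precision/scale are only consulted for NUMERIC and DECIMAL.
    static String arrowTypeFor(int jdbcType, int precision, int scale) {
        switch (jdbcType) {
            case Types.CHAR: case Types.NCHAR: case Types.VARCHAR:
            case Types.NVARCHAR: case Types.LONGVARCHAR:
            case Types.LONGNVARCHAR: case Types.CLOB:
                return "Utf8";
            case Types.NUMERIC: case Types.DECIMAL:
                return "Decimal(" + precision + ", " + scale + ")";
            case Types.BOOLEAN: case Types.BIT:
                return "Bool";
            case Types.TINYINT:  return "Int(8, signed)";
            case Types.SMALLINT: return "Int(16, signed)";
            case Types.INTEGER:  return "Int(32, signed)";
            case Types.BIGINT:   return "Int(64, signed)";
            case Types.REAL: case Types.FLOAT:
                return "FloatingPoint(SINGLE)";
            case Types.DOUBLE:
                return "FloatingPoint(DOUBLE)";
            case Types.BINARY: case Types.VARBINARY:
            case Types.LONGVARBINARY: case Types.BLOB:
                return "Binary";
            case Types.DATE:      return "Date(MILLISECOND)";
            case Types.TIME:      return "Time(MILLISECOND, 32)";
            case Types.TIMESTAMP: return "Timestamp(MILLISECOND, timezone=null)";
            default:
                throw new UnsupportedOperationException("Unmapped JDBC type: " + jdbcType);
        }
    }

    public static void main(String[] args) {
        System.out.println(arrowTypeFor(Types.INTEGER, 0, 0)); // Int(32, signed)
        System.out.println(arrowTypeFor(Types.DECIMAL, 10, 2)); // Decimal(10, 2)
    }
}
```

Note that FLOAT mapping to single precision follows the quoted table; some JDBC drivers report FLOAT columns as double precision, which is one reason such a table stays adapter-specific.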
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416397#comment-16416397 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r177591421

## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416403#comment-16416403 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r177593622

## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/JdbcToArrowTestHelper.java

## @@ -0,0 +1,250 @@
+/**
+ * [Apache License 2.0 header, as above]
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import java.math.BigDecimal;
+
+import org.apache.arrow.vector.BigIntVector;
+import org.apache.arrow.vector.BitVector;
+import org.apache.arrow.vector.DateMilliVector;
+import org.apache.arrow.vector.DecimalVector;
+import org.apache.arrow.vector.FieldVector;
+import org.apache.arrow.vector.Float4Vector;
+import org.apache.arrow.vector.Float8Vector;
+import org.apache.arrow.vector.IntVector;
+import org.apache.arrow.vector.SmallIntVector;
+import org.apache.arrow.vector.TimeMilliVector;
+import org.apache.arrow.vector.TimeStampVector;
+import org.apache.arrow.vector.TinyIntVector;
+import org.apache.arrow.vector.VarBinaryVector;
+import org.apache.arrow.vector.VarCharVector;
+
+import static org.junit.Assert.*;
+
+
+/**
+ * This is a Helper class which has functionalities to read and assert the values from teh given FieldVector object
+ *
+ */
+public class JdbcToArrowTestHelper {
+
+    public static boolean assertIntVectorValues(FieldVector fx, int rowCount, int[] values) {
+        IntVector intVector = ((IntVector) fx);
+
+        assertEquals(rowCount, intVector.getValueCount());
+
+        for (int j = 0; j < intVector.getValueCount(); j++) {
+            if (!intVector.isNull(j)) {
+                assertEquals(values[j], intVector.get(j));
+            }
+        }
+        return true;

Review comment: not sure what the boolean return value is for, since it's always `true`

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
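The always-true return the reviewer flags can be avoided by making the helper `void` and failing loudly on a mismatch. A minimal standalone sketch of that shape — plain arrays instead of Arrow vectors, with `null` in the expected array standing in for a SQL NULL slot; `assertIntValues` is an illustrative name, not PR code:

```java
public class AssertHelperSketch {
    // Void variant of the helper: nothing to return, so a caller cannot
    // silently ignore the result. A mismatch throws AssertionError.
    static void assertIntValues(int[] actual, Integer[] expected) {
        if (actual.length != expected.length) {
            throw new AssertionError(
                    "row count mismatch: " + actual.length + " vs " + expected.length);
        }
        for (int j = 0; j < actual.length; j++) {
            // null marks a NULL slot, which the value comparison skips
            if (expected[j] != null && actual[j] != expected[j]) {
                throw new AssertionError(
                        "value mismatch at row " + j + ": " + actual[j] + " vs " + expected[j]);
            }
        }
    }
}
```

With JUnit on the classpath the body collapses to `assertEquals` calls, but the point stands: signal through exceptions, not a constant return value.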
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416407#comment-16416407 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r177597188

## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/h2/ArrowDataFetcherTest.java

## @@ -0,0 +1,139 @@
+/** [Apache License 2.0 header, as above] */
+
+package org.apache.arrow.adapter.jdbc.h2;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;
+import org.apache.arrow.adapter.jdbc.*;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+
+import static org.junit.Assert.*;
+
+/**
+ * Test class for {@link ArrowDataFetcher}.
+ */
+public class ArrowDataFetcherTest extends AbstractJdbcToArrowTest {
+
+    private Connection conn = null;
+    private ObjectMapper mapper = null;
+
+    @Before
+    public void setUp() throws Exception {
+        String url = "jdbc:h2:mem:ArrowDataFetcherTest";
+        String driver = "org.h2.Driver";
+
+        mapper = new ObjectMapper(new YAMLFactory());
+
+        Class.forName(driver);
+
+        conn = DriverManager.getConnection(url);
+    }
+
+    @After
+    public void destroy() throws Exception {
+        if (conn != null) {
+            conn.close();
+            conn = null;
+        }
+    }
+
+    @Test
+    public void commaSeparatedQueryColumnsTest() {
+        try {
+            ArrowDataFetcher.commaSeparatedQueryColumns(null);
+        } catch (AssertionError error) {
+            assertTrue(true);
+        }
+        assertEquals(" one ", ArrowDataFetcher.commaSeparatedQueryColumns("one"));
+        assertEquals(" one, two ", ArrowDataFetcher.commaSeparatedQueryColumns("one", "two"));
+        assertEquals(" one, two, three ", ArrowDataFetcher.commaSeparatedQueryColumns("one", "two", "three"));
+    }
+
+    @Test
+    public void arrowFetcherAllColumnsLimitOffsetTest() throws Exception {
+
+        Table table =
+            mapper.readValue(
+                this.getClass().getClassLoader().getResourceAsStream("h2/test1_int_h2.yml"),
+                Table.class);
+
+        try {
+            createTestData(conn, table);
+
+            ArrowDataFetcher arrowDataFetcher = JdbcToArrow.jdbcArrowDataFetcher(conn, "table1");
+
+            VectorSchemaRoot root = arrowDataFetcher.fetch(0, 10);
+
+            int[] values = {
+                101, 101, 101, 101, 101, 101, 101, 101, 101, 101
+            };
+            JdbcToArrowTestHelper.assertIntVectorValues(root.getVector("INT_FIELD1"), 10, values);
+
+            root = arrowDataFetcher.fetch(5, 5);
+
+            JdbcToArrowTestHelper.assertIntVectorValues(root.getVector("INT_FIELD1"), 5, values);
+
+        } catch (Exception e) {
+            e.printStackTrace();
+        } finally {
+            deleteTestData(conn, table);

Review comment: since the connection is closed, that should trigger the in-memory db to be cleaned up (and not requiring a drop table...)

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
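The cleanup behavior the reviewer describes is H2's default and is configurable through the JDBC URL: an in-memory H2 database is discarded as soon as its last connection closes, and appending the `DB_CLOSE_DELAY=-1` setting keeps it alive until the JVM exits — which is the only situation where an explicit drop/delete of test tables would still matter:

```
jdbc:h2:mem:ArrowDataFetcherTest                    (dropped when the last connection closes)
jdbc:h2:mem:ArrowDataFetcherTest;DB_CLOSE_DELAY=-1  (kept until JVM exit or explicit SHUTDOWN)
```

Since the test's `@After` method closes the only connection and the URL above uses the default, the reviewer's point holds: the per-test teardown of tables is redundant.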
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416417#comment-16416417 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r177592821

## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416416#comment-16416416 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r177596733

## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/h2/ArrowDataFetcherTest.java

## @@ -0,0 +1,139 @@
+/** [Apache License 2.0 header, as above] */
+
+package org.apache.arrow.adapter.jdbc.h2;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;
+import org.apache.arrow.adapter.jdbc.*;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+
+import static org.junit.Assert.*;
+
+/**
+ * Test class for {@link ArrowDataFetcher}.
+ */
+public class ArrowDataFetcherTest extends AbstractJdbcToArrowTest {
+
+    private Connection conn = null;
+    private ObjectMapper mapper = null;
+
+    @Before
+    public void setUp() throws Exception {
+        String url = "jdbc:h2:mem:ArrowDataFetcherTest";
+        String driver = "org.h2.Driver";
+
+        mapper = new ObjectMapper(new YAMLFactory());
+
+        Class.forName(driver);
+
+        conn = DriverManager.getConnection(url);
+    }
+
+    @After
+    public void destroy() throws Exception {
+        if (conn != null) {
+            conn.close();
+            conn = null;
+        }
+    }
+
+    @Test
+    public void commaSeparatedQueryColumnsTest() {
+        try {
+            ArrowDataFetcher.commaSeparatedQueryColumns(null);
+        } catch (AssertionError error) {
+            assertTrue(true);

Review comment: why this check? (it seems that basically it ignores the fact that the statement throws AssertionError, but it's not checking that it always fails either...)

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
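What the reviewer is asking for is the standard expected-failure pattern: the test itself must fail when the statement does *not* throw, rather than passing either way. A dependency-free sketch of that pattern (JUnit 4's `@Test(expected = AssertionError.class)` achieves the same; `expectAssertionError` is an illustrative helper, not PR code):

```java
public class ExpectFailureSketch {
    // Run the statement and demand that it throws AssertionError.
    // If it completes normally, fail the caller instead of silently passing.
    static void expectAssertionError(Runnable statement) {
        boolean threw = false;
        try {
            statement.run();
        } catch (AssertionError expected) {
            threw = true;
        }
        if (!threw) {
            throw new AssertionError("expected AssertionError was not thrown");
        }
    }
}
```

Applied to the quoted test, the `assertTrue(true)` line disappears: `expectAssertionError(() -> ArrowDataFetcher.commaSeparatedQueryColumns(null))` would actually verify the null-argument contract.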
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416405#comment-16416405 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r177591258

## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416421#comment-16416421 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177591823
## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
## @@ -0,0 +1,343 @@
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416396#comment-16416396 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177589268
## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
## @@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.vector.*;
Review comment: Java good practice is to not use wildcard imports (at least not for static imports). The IDE should be able to expand them automatically.
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
> JDBC Adapter for Apache Arrow
> -
>
> Key: ARROW-1780
> URL: https://issues.apache.org/jira/browse/ARROW-1780
> Project: Apache Arrow
> Issue Type: New Feature
> Reporter: Atul Dambalkar
> Assignee: Atul Dambalkar
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
> At a high level, the JDBC Adapter will allow upstream apps to query RDBMS data over JDBC and get the JDBC objects converted to Arrow objects/structures. The upstream utility can then work with the Arrow objects/structures with the usual performance benefits. The utility will be very similar to the C++ implementation of "Convert a vector of row-wise data into an Arrow table" as described here - https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> The utility will read data from the RDBMS and convert the data into Arrow objects/structures, so from that perspective it reads data from the RDBMS. Whether the utility can also push Arrow objects back to the RDBMS still needs to be discussed and is out of scope for now.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
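The row-wise-to-columnar idea described in the issue can be illustrated with a toy, Arrow-free sketch (hypothetical names; the real adapter fills Arrow vectors rather than Java lists):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pivots row-wise records (like JDBC ResultSet rows)
// into per-column lists, the core idea behind row-to-columnar conversion.
public class RowToColumnSketch {
    static Map<String, List<Object>> toColumns(List<String> names, List<Object[]> rows) {
        Map<String, List<Object>> cols = new LinkedHashMap<>();
        for (String n : names) {
            cols.put(n, new ArrayList<>());
        }
        for (Object[] row : rows) {
            // append each cell of the row to its column's list
            for (int i = 0; i < names.size(); i++) {
                cols.get(names.get(i)).add(row[i]);
            }
        }
        return cols;
    }
}
```

The real adapter does the same pivot but writes each cell into a typed Arrow vector (chosen from the JDBC type mapping) instead of an untyped list.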
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416408#comment-16416408 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177592278
## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
## @@ -0,0 +1,343 @@
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416393#comment-16416393 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177589848
## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
## @@ -0,0 +1,343 @@
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416415#comment-16416415 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177595960
## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/h2/JdbcToArrowTest.java
## @@ -0,0 +1,325 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc.h2;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;
+import org.apache.arrow.adapter.jdbc.AbstractJdbcToArrowTest;
+import org.apache.arrow.adapter.jdbc.JdbcToArrow;
+import org.apache.arrow.adapter.jdbc.Table;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.math.BigDecimal;
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.util.Properties;
+
+import static org.apache.arrow.adapter.jdbc.JdbcToArrowTestHelper.*;
+
+/**
+ *
+ */
+public class JdbcToArrowTest extends AbstractJdbcToArrowTest {
+
+    private Connection conn = null;
+    private ObjectMapper mapper = null;
+
+    @Before
+    public void setUp() throws Exception {
+        String url = "jdbc:h2:mem:JdbcToArrowTest";
+        String driver = "org.h2.Driver";
+
+        mapper = new ObjectMapper(new YAMLFactory());
+
+        Class.forName(driver);
+
+        conn = DriverManager.getConnection(url);
+    }
+
+    @After
+    public void destroy() throws Exception {
+        if (conn != null) {
+            conn.close();
+            conn = null;
+        }
+    }
+
+    @Test
+    public void sqlToArrowTestInt() throws Exception {
Review comment: suggestion for all the tests comparing results:
- put the type/data in the YAML file (instead of SQL statements)
- generate the H2 schema + load data automatically into the db
- refactor the test class to use `@Parameterized` with the list of YAML files to load
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
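The data-provider half of the `@Parameterized` refactor suggested above might look like the following sketch. The YAML file names and class name are hypothetical; in a JUnit 4 test class this collection would be returned from a static method annotated with `@Parameterized.Parameters`, with the surrounding class run by `@RunWith(Parameterized.class)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch of a JUnit 4 @Parameterized data provider:
// each YAML case file becomes one parameter row.
public class YamlCaseProviderSketch {
    static Collection<Object[]> yamlTestCases() {
        // hypothetical case files; in the suggested refactor each file
        // declares the column type, the data to load into H2, and the
        // expected Arrow values
        List<String> files = Arrays.asList(
                "h2/test1_int.yml", "h2/test1_bool.yml", "h2/test1_varchar.yml");
        List<Object[]> params = new ArrayList<>();
        for (String f : files) {
            params.add(new Object[] {f});
        }
        return params;
    }
}
```

Each parameter row then drives one test run: load the YAML, create the H2 table, run the conversion, and compare against the expected values, replacing the per-type test methods in the current class.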
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416388#comment-16416388 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177588025
## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowDataFetcher.java
## @@ -0,0 +1,107 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.vector.VectorSchemaRoot;
+
+import java.sql.Connection;
+
+/**
+ * Class to fetch data from a given database table, where the user can specify columns to fetch
+ * along with limit and offset parameters.
+ *
+ * An object of this class is returned by invoking the method jdbcArrowDataFetcher(Connection connection, String tableName)
+ * from the {@link JdbcToArrow} class. The caller can use this object to fetch data repeatedly based on the
+ * data fetch requirement and can implement pagination-like functionality.
+ *
+ * This class doesn't hold any open connections to the database but simply executes the "select" query every time with
+ * the necessary limit and offset parameters.
+ *
+ * @since 0.10.0
+ * @see JdbcToArrow
+ */
+public class ArrowDataFetcher {
+
+    private static final String all_columns_query = "select * from %s limit %d offset %d";
Review comment: that's assuming that this is supported by all SQL dialects (which is not the case). Also, the convention for constants is to use `SNAKE_CASE`.
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
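The reviewer's two points can be sketched together as follows (hypothetical class and method names): an upper-case `SNAKE_CASE` constant, plus a comment flagging that `LIMIT ... OFFSET ...` is a dialect-specific construct (H2, MySQL, and PostgreSQL accept it; SQL Server, for example, uses `OFFSET ... ROWS FETCH NEXT ... ROWS ONLY`).

```java
// Hypothetical sketch addressing the review comment on ArrowDataFetcher.
public class PagedQuerySketch {
    // SNAKE_CASE per Java constant-naming convention
    private static final String ALL_COLUMNS_QUERY = "select * from %s limit %d offset %d";

    // Only valid for dialects that accept LIMIT/OFFSET; other dialects
    // would need their own template (or a dialect abstraction).
    static String allColumnsQuery(String tableName, int limit, int offset) {
        return String.format(ALL_COLUMNS_QUERY, tableName, limit, offset);
    }
}
```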
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416414#comment-16416414 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177593055
## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
## @@ -0,0 +1,343 @@
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416391#comment-16416391 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177589483
## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
## @@ -0,0 +1,343 @@
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416409#comment-16416409 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177588989 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java ## @@ -0,0 +1,116 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.adapter.jdbc; + +import org.apache.arrow.memory.RootAllocator; +import org.apache.arrow.vector.VectorSchemaRoot; + +import java.sql.*; + +/** + * Utility class to convert JDBC objects to columnar Arrow format objects. + * + * This utility uses following data mapping to map JDBC/SQL datatype to Arrow data types. 
+ *
+ * CHAR          --> ArrowType.Utf8
+ * NCHAR         --> ArrowType.Utf8
+ * VARCHAR       --> ArrowType.Utf8
+ * NVARCHAR      --> ArrowType.Utf8
+ * LONGVARCHAR   --> ArrowType.Utf8
+ * LONGNVARCHAR  --> ArrowType.Utf8
+ * NUMERIC       --> ArrowType.Decimal(precision, scale)
+ * DECIMAL       --> ArrowType.Decimal(precision, scale)
+ * BIT           --> ArrowType.Bool
+ * TINYINT       --> ArrowType.Int(8, signed)
+ * SMALLINT      --> ArrowType.Int(16, signed)
+ * INTEGER       --> ArrowType.Int(32, signed)
+ * BIGINT        --> ArrowType.Int(64, signed)
+ * REAL          --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * FLOAT         --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * DOUBLE        --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+ * BINARY        --> ArrowType.Binary
+ * VARBINARY     --> ArrowType.Binary
+ * LONGVARBINARY --> ArrowType.Binary
+ * DATE          --> ArrowType.Date(DateUnit.MILLISECOND)
+ * TIME          --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+ * TIMESTAMP     --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+ * CLOB          --> ArrowType.Utf8
+ * BLOB          --> ArrowType.Binary
+ *
+ * @since 0.10.0
+ * @see ArrowDataFetcher
+ */
+public class JdbcToArrow {
+
+    /**
+     * Execute the given SQL query against the relational database, fetch the data, and convert it to Arrow objects.
+     *
+     * @param connection Database connection to be used. This method will not close the passed connection object. Since the caller has passed
+     *                   the connection object, it is the caller's responsibility to close it or return it to the pool.
+     * @param query The database query used to fetch the data.
+     * @return the query result as an Arrow {@link VectorSchemaRoot}
+     * @throws SQLException Propagates any SQLException to the caller after closing any opened resources, such as the ResultSet and Statement objects.
+     */
+    public static VectorSchemaRoot sqlToArrow(Connection connection, String query) throws Exception {
+
+        assert connection != null : "JDBC connection object cannot be null";
+        assert query != null && query.length() > 0 : "SQL query cannot be null or empty";
+
+        RootAllocator rootAllocator = new RootAllocator(Integer.MAX_VALUE);
+
+        Statement stmt = null;
+        ResultSet rs = null;
+        try {
+            stmt = connection.createStatement();
+            rs = stmt.executeQuery(query);
+            ResultSetMetaData rsmd = rs.getMetaData();
+            VectorSchemaRoot root = VectorSchemaRoot.create(
+                    JdbcToArrowUtils.jdbcToArrowSchema(rsmd), rootAllocator);
+            JdbcToArrowUtils.jdbcToArrowVectors(rs, root);
+            return root;
+        } finally {
+            if (rs != null) {
+                rs.close();
+            }
+            if (stmt != null) {
+                stmt.close();
+            }
+        }
+    }
+
+    /**
+     * Returns an ArrowDataFetcher object that can be used to fetch and iterate over the data in the given
+     * database table.
+     *
+     * @param connection Database connection object
+     * @param tableName Table name from which records will be fetched
+     *
+     * @return ArrowDataFetcher -
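The manual `finally` block above can be replaced with try-with-resources, which closes resources in reverse order of declaration and still leaves the caller-supplied `Connection` open. Since running real JDBC requires a driver, this sketch demonstrates the close-ordering guarantee with stand-in `AutoCloseable` objects (`resource` and `CloseOrderDemo` are hypothetical names, not part of the adapter).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the resource-handling pattern in sqlToArrow: try-with-resources
// closes the ResultSet first, then the Statement (reverse declaration order),
// and never touches the caller's Connection.
public class CloseOrderDemo {

    static final List<String> closed = new ArrayList<>();

    static AutoCloseable resource(String name) {
        return () -> closed.add(name); // record the order in which close() runs
    }

    public static List<String> run() throws Exception {
        try (AutoCloseable stmt = resource("statement");
             AutoCloseable rs = resource("resultset")) {
            // read rows here; on exit, rs closes first, then stmt
        }
        return closed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // [resultset, statement]
    }
}
```

This also closes the resources when `executeQuery` or the conversion throws, which the original `catch (Exception exc) { throw exc; }` block added nothing toward.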
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416402#comment-16416402 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177595348 ## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/h2/ArrowDataFetcherTest.java ## @@ -0,0 +1,139 @@
+/* Apache License 2.0 header */
+
+package org.apache.arrow.adapter.jdbc.h2;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;
+import org.apache.arrow.adapter.jdbc.*;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+
+import static org.junit.Assert.*;
+
+/**
+ * Test class for {@link ArrowDataFetcher}.
+ */
+public class ArrowDataFetcherTest extends AbstractJdbcToArrowTest {
+
+    private Connection conn = null;
+    private ObjectMapper mapper = null;
+
+    @Before
+    public void setUp() throws Exception {
+        String url = "jdbc:h2:mem:ArrowDataFetcherTest";
+        String driver = "org.h2.Driver";
+
+        mapper = new ObjectMapper(new YAMLFactory());
+
+        Class.forName(driver);
+
+        conn = DriverManager.getConnection(url);
+    }
+
+    @After
+    public void destroy() throws Exception {
+        if (conn != null) {
+            conn.close();
+            conn = null;
+        }
+    }
+
+    @Test
+    public void commaSeparatedQueryColumnsTest() {
Review comment: suggestion for all the tests comparing results:
- put the data in the YAML file (instead of SQL statements)
- generate the H2 schema + load the data automatically into the db
- refactor the test class to use `@Parameterized` with the list of YAML files to load
> JDBC Adapter for Apache Arrow
> -
>
> Key: ARROW-1780
> URL: https://issues.apache.org/jira/browse/ARROW-1780
> Project: Apache Arrow
> Issue Type: New Feature
> Reporter: Atul Dambalkar
> Assignee: Atul Dambalkar
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
> At a high level the JDBC Adapter will allow upstream apps to query RDBMS data
> over JDBC and get the JDBC objects converted to Arrow objects/structures. The
> upstream utility can then work with Arrow objects/structures with the usual
> performance benefits. The utility will be very similar to the C++
> implementation of "Convert a vector of row-wise data into an Arrow table" as
> described here -
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> The utility will read data from an RDBMS and convert the data into Arrow
> objects/structures, so from that perspective this utility reads data from the RDBMS.
> Whether the utility can also push Arrow objects to an RDBMS is something that needs to be
> discussed, and is out of scope for this utility for now.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416411#comment-16416411 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177594750 ## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/h2/ArrowDataFetcherTest.java ## @@ -0,0 +1,139 @@
+/* Apache License 2.0 header */
+
+package org.apache.arrow.adapter.jdbc.h2;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;
+import org.apache.arrow.adapter.jdbc.*;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+
+import static org.junit.Assert.*;
+
+/**
+ * Test class for {@link ArrowDataFetcher}.
+ */
+public class ArrowDataFetcherTest extends AbstractJdbcToArrowTest {
+
+    private Connection conn = null;
+    private ObjectMapper mapper = null;
+
+    @Before
+    public void setUp() throws Exception {
+        String url = "jdbc:h2:mem:ArrowDataFetcherTest";
+        String driver = "org.h2.Driver";
+
+        mapper = new ObjectMapper(new YAMLFactory());
+
+        Class.forName(driver);
Review comment: not sure that's necessary (h2 should have the right service metadata file)
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416394#comment-16416394 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177588568 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java ## @@ -0,0 +1,116 @@
+/* Apache License 2.0 header */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.memory.RootAllocator;
+import org.apache.arrow.vector.VectorSchemaRoot;
+
+import java.sql.*;
+
+/**
+ * Utility class to convert JDBC objects to columnar Arrow format objects.
+ *
+ * This utility uses the JDBC/SQL-to-Arrow data type mapping listed above.
+ *
+ * @since 0.10.0
+ * @see ArrowDataFetcher
+ */
+public class JdbcToArrow {
+
+    /**
+     * Execute the given SQL query against the relational database, fetch the data, and convert it to Arrow objects.
+     * [javadoc as above]
+     */
+    public static VectorSchemaRoot sqlToArrow(Connection connection, String query) throws Exception {
+
+        assert connection != null : "JDBC connection object cannot be null";
+        assert query != null && query.length() > 0 : "SQL query cannot be null or empty";
+
+        RootAllocator rootAllocator = new RootAllocator(Integer.MAX_VALUE);
Review comment: Shouldn't the allocator be provided, so that the caller has control over it? As of now, it cannot be closed once you're done with the buffers...
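The reviewer's point is about ownership: an allocator created inside `sqlToArrow` can never be closed by the caller, leaking the buffers it manages. A common fix is to accept the allocator as a parameter so the caller controls its lifetime. The sketch below uses a hypothetical `TrackingAllocator` stand-in for Arrow's `BufferAllocator` (so it runs without Arrow on the classpath); the conversion method borrows the allocator and never closes it.

```java
// Sketch of caller-owned allocator lifetime, per the review suggestion.
// TrackingAllocator is a hypothetical stand-in, not an Arrow class.
public class AllocatorOwnership {

    static class TrackingAllocator implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    // The conversion method borrows the allocator; it never closes it.
    static String sqlToArrow(TrackingAllocator allocator, String query) {
        if (allocator.closed) {
            throw new IllegalStateException("allocator already closed");
        }
        return "vectors for: " + query;
    }

    public static void main(String[] args) {
        try (TrackingAllocator allocator = new TrackingAllocator()) {
            System.out.println(sqlToArrow(allocator, "select * from t"));
        } // the caller decides when the buffers are released
    }
}
```

The same pattern was later adopted widely in Arrow Java APIs: factory methods take a `BufferAllocator` argument rather than constructing their own root allocator.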
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416389#comment-16416389 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177587817 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowDataFetcher.java ## @@ -0,0 +1,107 @@
+/* Apache License 2.0 header */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.vector.VectorSchemaRoot;
+
+import java.sql.Connection;
+
+/**
+ * Class to fetch data from a given database table, where the user can specify the columns to fetch
+ * along with limit and offset parameters.
+ *
+ * An object of this class is returned by invoking the method jdbcArrowDataFetcher(Connection connection, String tableName)
+ * on the {@link JdbcToArrow} class. The caller can use this object to fetch data repeatedly as
+ * needed, and can implement pagination-like functionality.
+ *
+ * This class doesn't hold any open connections to the database; it simply executes the "select" query every time with
+ * the necessary limit and offset parameters.
+ *
+ * @since 0.10.0
+ * @see JdbcToArrow
+ */
+public class ArrowDataFetcher {
+
+    private static final String all_columns_query = "select * from %s limit %d offset %d";
+    private static final String custom_columns_query = "select %s from %s limit %d offset %d";
+    private Connection connection;
+    private String tableName;
+
+    /**
+     * Constructor.
+     *
+     * @param connection database connection to run queries against
+     * @param tableName table to fetch records from
+     */
+    public ArrowDataFetcher(Connection connection, String tableName) {
Review comment: Why not wrap a ResultSet instead?
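The paging query construction used by `ArrowDataFetcher` can be sketched as a pure function. `PagingQueryBuilder` is a hypothetical name, and the `limit ... offset ...` syntax it assumes (matching the constants in the reviewed class) works for H2 and PostgreSQL but is not portable to every database, so a real implementation may need dialect handling.

```java
// Sketch of ArrowDataFetcher's query construction for limit/offset paging.
public class PagingQueryBuilder {

    public static String build(String tableName, int offset, int limit, String... columns) {
        if (limit <= 0 || offset < 0) {
            throw new IllegalArgumentException("limit must be > 0 and offset must be >= 0");
        }
        // No columns given -> select everything, mirroring all_columns_query.
        String cols = (columns == null || columns.length == 0)
                ? "*"
                : String.join(", ", columns);
        return String.format("select %s from %s limit %d offset %d",
                cols, tableName, limit, offset);
    }

    public static void main(String[] args) {
        System.out.println(build("customers", 40, 20, "id", "name"));
        // select id, name from customers limit 20 offset 40
    }
}
```

Building SQL with `String.format` is only safe when table and column names come from trusted code, never from user input; a production version should use identifier quoting or a `PreparedStatement` where the dialect allows it.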
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416387#comment-16416387 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177588356 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowDataFetcher.java ## @@ -0,0 +1,107 @@
+/* Apache License 2.0 header */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.vector.VectorSchemaRoot;
+
+import java.sql.Connection;
+
+/**
+ * [class javadoc as above]
+ */
+public class ArrowDataFetcher {
+
+    private static final String all_columns_query = "select * from %s limit %d offset %d";
+    private static final String custom_columns_query = "select %s from %s limit %d offset %d";
+    private Connection connection;
+    private String tableName;
+
+    public ArrowDataFetcher(Connection connection, String tableName) {
+        this.connection = connection;
+        this.tableName = tableName;
+    }
+
+    /**
+     * Fetch the data from the underlying table with the given limit and offset, for the passed column names.
+     *
+     * @param offset row offset to start fetching from
+     * @param limit maximum number of rows to fetch
+     * @param columns column names to fetch
+     * @return the fetched rows as an Arrow {@link VectorSchemaRoot}
+     * @throws Exception on query or conversion failure
+     */
+    public VectorSchemaRoot fetch(int offset, int limit, String... columns) throws Exception {
+        assert columns != null && columns.length > 0 : "columns can't be empty!";
Review comment: shouldn't `Preconditions` be used instead? it's not an internal assertion but an API contract, isn't it?
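The reviewer's objection is that `assert` statements are stripped unless the JVM runs with `-ea`, so an API contract guarded by `assert` silently disappears in production. Guava's `Preconditions.checkArgument` always throws. A plain-JDK equivalent (the `ArgumentChecks` class and its method names here are illustrative, not from the PR):

```java
// Sketch of argument validation that survives production JVMs,
// equivalent in spirit to Guava's Preconditions.checkArgument.
public class ArgumentChecks {

    static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    public static String validateColumns(String... columns) {
        // Unlike assert, this check runs regardless of JVM flags.
        checkArgument(columns != null && columns.length > 0, "columns can't be empty!");
        return String.join(",", columns);
    }

    public static void main(String[] args) {
        System.out.println(validateColumns("id", "name")); // id,name
    }
}
```

The distinction the reviewer draws is worth keeping: `assert` is for internal invariants that should be impossible to violate, while caller-supplied arguments are an API contract and deserve an exception the caller will actually see.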
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416386#comment-16416386 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177587483 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowDataFetcher.java ## @@ -0,0 +1,107 @@
+/* Apache License 2.0 header */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.vector.VectorSchemaRoot;
+
+import java.sql.Connection;
+
+/**
+ * [class javadoc as above]
+ */
+public class ArrowDataFetcher {
+
+    private static final String all_columns_query = "select * from %s limit %d offset %d";
+    private static final String custom_columns_query = "select %s from %s limit %d offset %d";
+    private Connection connection;
+    private String tableName;
+
+    public ArrowDataFetcher(Connection connection, String tableName) {
+        this.connection = connection;
+        this.tableName = tableName;
+    }
+
+    /**
+     * Fetch the data from the underlying table with the given limit and offset, for the passed column names.
+     *
+     * @param offset row offset to start fetching from
+     * @param limit maximum number of rows to fetch
+     * @param columns column names to fetch
+     * @return the fetched rows as an Arrow {@link VectorSchemaRoot}
+     * @throws Exception on query or conversion failure
+     */
+    public VectorSchemaRoot fetch(int offset, int limit, String... columns) throws Exception {
Review comment: Exception is very broad, should we try to be more specific?
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416410#comment-16416410 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177594176 ## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/JdbcToArrowTestHelper.java ## @@ -0,0 +1,250 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.arrow.adapter.jdbc; + +import java.math.BigDecimal; + +import org.apache.arrow.vector.BigIntVector; +import org.apache.arrow.vector.BitVector; +import org.apache.arrow.vector.DateMilliVector; +import org.apache.arrow.vector.DecimalVector; +import org.apache.arrow.vector.FieldVector; +import org.apache.arrow.vector.Float4Vector; +import org.apache.arrow.vector.Float8Vector; +import org.apache.arrow.vector.IntVector; +import org.apache.arrow.vector.SmallIntVector; +import org.apache.arrow.vector.TimeMilliVector; +import org.apache.arrow.vector.TimeStampVector; +import org.apache.arrow.vector.TinyIntVector; +import org.apache.arrow.vector.VarBinaryVector; +import org.apache.arrow.vector.VarCharVector; + +import static org.junit.Assert.*; + + +/** + * This is a Helper class which has functionalities to read and assert the values from teh given FieldVector object + * + */ +public class JdbcToArrowTestHelper { + +public static boolean assertIntVectorValues(FieldVector fx, int rowCount, int[] values) { +IntVector intVector = ((IntVector) fx); Review comment: (style) superfluous parenthesis. (Also you might want to manage the cast by the caller, not the callee?) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > JDBC Adapter for Apache Arrow > - > > Key: ARROW-1780 > URL: https://issues.apache.org/jira/browse/ARROW-1780 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Atul Dambalkar >Assignee: Atul Dambalkar >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > At a high level the JDBC Adapter will allow upstream apps to query RDBMS data > over JDBC and get the JDBC objects converted to Arrow objects/structures. The > upstream utility can then work with Arrow objects/structures with usual > performance benefits. 
The utility will be very similar to the C++ > implementation of "Convert a vector of row-wise data into an Arrow table" as > described here - > https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html > The utility will read data from an RDBMS and convert it into Arrow > objects/structures. From that perspective this utility reads data from the RDBMS; > whether it can also push Arrow objects back to the RDBMS still needs to be > discussed and is out of scope for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416419#comment-16416419 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177594008 ## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/JdbcToArrowTestHelper.java ## @@ -0,0 +1,250 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.arrow.adapter.jdbc; + +import java.math.BigDecimal; + +import org.apache.arrow.vector.BigIntVector; +import org.apache.arrow.vector.BitVector; +import org.apache.arrow.vector.DateMilliVector; +import org.apache.arrow.vector.DecimalVector; +import org.apache.arrow.vector.FieldVector; +import org.apache.arrow.vector.Float4Vector; +import org.apache.arrow.vector.Float8Vector; +import org.apache.arrow.vector.IntVector; +import org.apache.arrow.vector.SmallIntVector; +import org.apache.arrow.vector.TimeMilliVector; +import org.apache.arrow.vector.TimeStampVector; +import org.apache.arrow.vector.TinyIntVector; +import org.apache.arrow.vector.VarBinaryVector; +import org.apache.arrow.vector.VarCharVector; + +import static org.junit.Assert.*; + + +/** + * This is a Helper class which has functionalities to read and assert the values from teh given FieldVector object + * + */ +public class JdbcToArrowTestHelper { + +public static boolean assertIntVectorValues(FieldVector fx, int rowCount, int[] values) { +IntVector intVector = ((IntVector) fx); + +assertEquals(rowCount, intVector.getValueCount()); + +for(int j = 0; j < intVector.getValueCount(); j++) { Review comment: what if `values.length` doesn't match `rowCount`? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > JDBC Adapter for Apache Arrow > - > > Key: ARROW-1780 > URL: https://issues.apache.org/jira/browse/ARROW-1780 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Atul Dambalkar >Assignee: Atul Dambalkar >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > At a high level the JDBC Adapter will allow upstream apps to query RDBMS data > over JDBC and get the JDBC objects converted to Arrow objects/structures. 
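The reviewer's question about `values.length` not matching `rowCount` can be addressed with an explicit length guard before the loop. The sketch below is illustrative only: plain arrays and a null mask stand in for the Arrow `IntVector`, and the method name is hypothetical, so the guard can be shown without Arrow or JUnit on the classpath.

```java
// Hypothetical sketch: a plain int[] plus a boolean null mask stand in for
// the Arrow IntVector, so the length-mismatch guard the reviewer asks about
// can be demonstrated in isolation.
class VectorAssertSketch {
    static void assertIntValues(int[] actual, boolean[] isNull, int rowCount, int[] expected) {
        // Guard: fail fast when the expected-values array disagrees with rowCount,
        // instead of risking an ArrayIndexOutOfBoundsException inside the loop.
        if (expected.length != rowCount) {
            throw new AssertionError("expected.length (" + expected.length
                + ") does not match rowCount (" + rowCount + ")");
        }
        if (actual.length != rowCount) {
            throw new AssertionError("value count " + actual.length
                + " does not match rowCount " + rowCount);
        }
        for (int j = 0; j < rowCount; j++) {
            if (!isNull[j] && actual[j] != expected[j]) {
                throw new AssertionError("row " + j + ": expected "
                    + expected[j] + " but was " + actual[j]);
            }
        }
    }
}
```

With the guard in place, a caller passing a 2-element expected array against a 3-row vector fails immediately with a clear message rather than partway through iteration.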
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416420#comment-16416420 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177595647 ## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/h2/ArrowDataFetcherTest.java ## @@ -0,0 +1,139 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.adapter.jdbc.h2; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.dataformat.yaml.YAMLFactory; +import org.apache.arrow.adapter.jdbc.*; +import org.apache.arrow.vector.VectorSchemaRoot; +import org.junit.After; +import org.junit.Before; +import org.junit.Test; + +import java.sql.Connection; +import java.sql.DriverManager; + +import static org.junit.Assert.*; + +/** + * Test class for {@link ArrowDataFetcher}. 
+ */ +public class ArrowDataFetcherTest extends AbstractJdbcToArrowTest { + +private Connection conn = null; +private ObjectMapper mapper = null; + +@Before +public void setUp() throws Exception { +String url = "jdbc:h2:mem:ArrowDataFetcherTest"; +String driver = "org.h2.Driver"; + +mapper = new ObjectMapper(new YAMLFactory()); + +Class.forName(driver); + +conn = DriverManager.getConnection(url); +} + +@After +public void destroy() throws Exception { +if (conn != null) { +conn.close(); +conn = null; +} +} + +@Test +public void commaSeparatedQueryColumnsTest() { +try { +ArrowDataFetcher.commaSeparatedQueryColumns(null); +} catch (AssertionError error) { +assertTrue(true); +} +assertEquals(" one ", ArrowDataFetcher.commaSeparatedQueryColumns("one")); +assertEquals(" one, two ", ArrowDataFetcher.commaSeparatedQueryColumns("one", "two")); +assertEquals(" one, two, three ", ArrowDataFetcher.commaSeparatedQueryColumns("one", "two", "three")); +} + +@Test +public void arrowFetcherAllColumnsLimitOffsetTest() throws Exception { + +Table table = +mapper.readValue( + this.getClass().getClassLoader().getResourceAsStream("h2/test1_int_h2.yml"), +Table.class); + +try { +createTestData(conn, table); + +ArrowDataFetcher arrowDataFetcher = JdbcToArrow.jdbcArrowDataFetcher(conn, "table1"); + +VectorSchemaRoot root = arrowDataFetcher.fetch(0, 10); + +int[] values = { +101, 101, 101, 101, 101, 101, 101, 101, 101, 101 +}; + JdbcToArrowTestHelper.assertIntVectorValues(root.getVector("INT_FIELD1"), 10, values); + +root = arrowDataFetcher.fetch(5, 5); + + JdbcToArrowTestHelper.assertIntVectorValues(root.getVector("INT_FIELD1"), 5, values); + +} catch (Exception e) { +e.printStackTrace(); Review comment: this test will not error out in case of exception... This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
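The reviewer's point that the test "will not error out in case of exception" can be sketched without JUnit. Catching the exception and only calling `printStackTrace()` hides the failure from the test runner; declaring `throws Exception` (with cleanup moved to a `finally` block) lets it propagate and fail the test. Method names below are illustrative, not from the PR.

```java
// Sketch of the reviewer's point: a test body that catches and merely prints
// an exception still reports success, whereas letting the exception propagate
// makes the failure visible to the caller (here, the test runner).
class ExceptionPropagationSketch {
    static boolean swallowed() {
        try {
            throw new IllegalStateException("setup failed");
        } catch (Exception e) {
            e.printStackTrace(); // failure is hidden; caller still sees success
        }
        return true;             // the "test" passes despite the failure
    }

    static boolean propagated() throws Exception {
        // Equivalent test body with `throws Exception` on the signature:
        // the runner marks the test as failed.
        throw new IllegalStateException("setup failed");
    }
}
```

In JUnit terms, the fix is simply to delete the `catch (Exception e)` clause from `arrowFetcherAllColumnsLimitOffsetTest` and keep the `finally` for table cleanup.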
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416395#comment-16416395 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177589044 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java ## @@ -0,0 +1,116 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.adapter.jdbc; + +import org.apache.arrow.memory.RootAllocator; +import org.apache.arrow.vector.VectorSchemaRoot; + +import java.sql.*; + +/** + * Utility class to convert JDBC objects to columnar Arrow format objects. + * + * This utility uses following data mapping to map JDBC/SQL datatype to Arrow data types. 
+ * + * CHAR--> ArrowType.Utf8 + * NCHAR --> ArrowType.Utf8 + * VARCHAR --> ArrowType.Utf8 + * NVARCHAR --> ArrowType.Utf8 + * LONGVARCHAR --> ArrowType.Utf8 + * LONGNVARCHAR --> ArrowType.Utf8 + * NUMERIC --> ArrowType.Decimal(precision, scale) + * DECIMAL --> ArrowType.Decimal(precision, scale) + * BIT --> ArrowType.Bool + * TINYINT --> ArrowType.Int(8, signed) + * SMALLINT --> ArrowType.Int(16, signed) + * INTEGER --> ArrowType.Int(32, signed) + * BIGINT --> ArrowType.Int(64, signed) + * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE) + * BINARY --> ArrowType.Binary + * VARBINARY --> ArrowType.Binary + * LONGVARBINARY --> ArrowType.Binary + * DATE --> ArrowType.Date(DateUnit.MILLISECOND) + * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32) + * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null) + * CLOB --> ArrowType.Utf8 + * BLOB --> ArrowType.Binary + * + * @since 0.10.0 + * @see ArrowDataFetcher + */ +public class JdbcToArrow { + +/** + * For the given SQL query, execute and fetch the data from Relational DB and convert it to Arrow objects. + * + * @param connection Database connection to be used. This method will not close the passed connection object. Since hte caller has passed + * the connection object it's the responsibility of the caller to close or return the connection to the pool. + * @param query The DB Query to fetch the data. + * @return + * @throws SQLException Propagate any SQL Exceptions to the caller after closing any resources opened such as ResultSet and Statment objects. 
+ */ +public static VectorSchemaRoot sqlToArrow(Connection connection, String query) throws Exception { + +assert connection != null: "JDBC conncetion object can not be null"; +assert query != null && query.length() > 0: "SQL query can not be null or empty"; + +RootAllocator rootAllocator = new RootAllocator(Integer.MAX_VALUE); + +Statement stmt = null; +ResultSet rs = null; +try { +stmt = connection.createStatement(); +rs = stmt.executeQuery(query); +ResultSetMetaData rsmd = rs.getMetaData(); +VectorSchemaRoot root = VectorSchemaRoot.create( +JdbcToArrowUtils.jdbcToArrowSchema(rsmd), rootAllocator); +JdbcToArrowUtils.jdbcToArrowVectors(rs, root); +return root; +} catch (Exception exc) { +// just throw it out after logging +throw exc; +} finally { +if (rs != null) { +rs.close(); +} +if (stmt != null) { +stmt.close(); // test +} +} +} + +/** + * This method returns ArrowDataFetcher Object that can be used to fetch and iterate on the data in the given Review comment: Object -> object (same in the argument description) This is an automated message from the Apache Git Service. To respond
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416413#comment-16416413 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177592633 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ## @@ -0,0 +1,343 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.arrow.adapter.jdbc; + +import org.apache.arrow.vector.*; +import org.apache.arrow.vector.types.DateUnit; +import org.apache.arrow.vector.types.TimeUnit; +import org.apache.arrow.vector.types.pojo.ArrowType; +import org.apache.arrow.vector.types.pojo.Field; +import org.apache.arrow.vector.types.pojo.FieldType; +import org.apache.arrow.vector.types.pojo.Schema; + +import java.nio.charset.Charset; +import java.sql.*; +import java.util.ArrayList; +import java.util.List; + +import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE; +import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE; + + +/** + * Class that does most of the work to convert JDBC ResultSet data into Arrow columnar format Vector objects. + * + * @since 0.10.0 + */ +public class JdbcToArrowUtils { + +private static final int DEFAULT_BUFFER_SIZE = 256; + +/** + * Create Arrow {@link Schema} object for the given JDBC {@link ResultSetMetaData}. + * + * This method currently performs following type mapping for JDBC SQL data types to corresponding Arrow data types. 
+ * + * CHAR--> ArrowType.Utf8 + * NCHAR --> ArrowType.Utf8 + * VARCHAR --> ArrowType.Utf8 + * NVARCHAR --> ArrowType.Utf8 + * LONGVARCHAR --> ArrowType.Utf8 + * LONGNVARCHAR --> ArrowType.Utf8 + * NUMERIC --> ArrowType.Decimal(precision, scale) + * DECIMAL --> ArrowType.Decimal(precision, scale) + * BIT --> ArrowType.Bool + * TINYINT --> ArrowType.Int(8, signed) + * SMALLINT --> ArrowType.Int(16, signed) + * INTEGER --> ArrowType.Int(32, signed) + * BIGINT --> ArrowType.Int(64, signed) + * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE) + * BINARY --> ArrowType.Binary + * VARBINARY --> ArrowType.Binary + * LONGVARBINARY --> ArrowType.Binary + * DATE --> ArrowType.Date(DateUnit.MILLISECOND) + * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32) + * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null) + * CLOB --> ArrowType.Utf8 + * BLOB --> ArrowType.Binary + * + * @param rsmd + * @return {@link Schema} + * @throws SQLException + */ +public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws SQLException { + +assert rsmd != null; + +//ImmutableList.Builder fields = ImmutableList.builder(); +List fields = new ArrayList<>(); +int columnCount = rsmd.getColumnCount(); +for (int i = 1; i <= columnCount; i++) { +String columnName = rsmd.getColumnName(i); +switch (rsmd.getColumnType(i)) { +case Types.BOOLEAN: +case Types.BIT: +fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Bool()), null)); +break; +case Types.TINYINT: +fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(8, true)), null)); +break; +case Types.SMALLINT: +fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(16, true)), null)); +break; +case Types.INTEGER: +fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(32, true)), null)); +break; +case
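The mapping table in the javadoc above can be exercised with the standard `java.sql.Types` constants. In the sketch below the returned strings merely describe the target Arrow type (a subset of the table, no Arrow dependency), so it runs standalone; the method name is hypothetical.

```java
import java.sql.Types;

// Illustrative subset of the JDBC -> Arrow type mapping documented above,
// keyed on java.sql.Types constants; descriptive strings stand in for the
// real ArrowType objects so the sketch needs no Arrow classes.
class JdbcTypeMappingSketch {
    static String arrowTypeFor(int jdbcType, int precision, int scale) {
        switch (jdbcType) {
            case Types.BOOLEAN:
            case Types.BIT:        return "Bool";
            case Types.TINYINT:    return "Int(8, signed)";
            case Types.SMALLINT:   return "Int(16, signed)";
            case Types.INTEGER:    return "Int(32, signed)";
            case Types.BIGINT:     return "Int(64, signed)";
            case Types.NUMERIC:
            case Types.DECIMAL:    return "Decimal(" + precision + ", " + scale + ")";
            case Types.REAL:
            case Types.FLOAT:      return "FloatingPoint(SINGLE)";
            case Types.DOUBLE:     return "FloatingPoint(DOUBLE)";
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.CLOB:       return "Utf8";
            case Types.BINARY:
            case Types.VARBINARY:
            case Types.BLOB:       return "Binary";
            case Types.DATE:       return "Date(MILLISECOND)";
            case Types.TIME:       return "Time(MILLISECOND, 32)";
            case Types.TIMESTAMP:  return "Timestamp(MILLISECOND, tz=null)";
            default:
                throw new UnsupportedOperationException("Unmapped JDBC type " + jdbcType);
        }
    }
}
```

This mirrors the shape of `jdbcToArrowSchema`: one `switch` over `ResultSetMetaData.getColumnType(i)`, with precision and scale threaded through only for the decimal cases.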
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416412#comment-16416412 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177593417 ## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/JdbcToArrowTestHelper.java ## @@ -0,0 +1,250 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.arrow.adapter.jdbc; + +import java.math.BigDecimal; + +import org.apache.arrow.vector.BigIntVector; +import org.apache.arrow.vector.BitVector; +import org.apache.arrow.vector.DateMilliVector; +import org.apache.arrow.vector.DecimalVector; +import org.apache.arrow.vector.FieldVector; +import org.apache.arrow.vector.Float4Vector; +import org.apache.arrow.vector.Float8Vector; +import org.apache.arrow.vector.IntVector; +import org.apache.arrow.vector.SmallIntVector; +import org.apache.arrow.vector.TimeMilliVector; +import org.apache.arrow.vector.TimeStampVector; +import org.apache.arrow.vector.TinyIntVector; +import org.apache.arrow.vector.VarBinaryVector; +import org.apache.arrow.vector.VarCharVector; + +import static org.junit.Assert.*; + + +/** + * This is a Helper class which has functionalities to read and assert the values from teh given FieldVector object + * + */ +public class JdbcToArrowTestHelper { + +public static boolean assertIntVectorValues(FieldVector fx, int rowCount, int[] values) { +IntVector intVector = ((IntVector) fx); + +assertEquals(rowCount, intVector.getValueCount()); + +for(int j = 0; j < intVector.getValueCount(); j++) { +if(!intVector.isNull(j)) { +assertEquals(values[j], intVector.get(j)); +} +} +return true; +} + +public static boolean assertBitBooleanVectorValues(FieldVector fx, int rowCount, int[] values){ +BitVector bitVector = ((BitVector)fx); +assertEquals(rowCount, bitVector.getValueCount()); +for(int j = 0; j < bitVector.getValueCount(); j++){ +if(!bitVector.isNull(j)) { +assertEquals(values[j], bitVector.get(j)); +} +} +return true; +} + +public static boolean assertTinyIntVectorValues(FieldVector fx, int rowCount, int[] values){ +TinyIntVector tinyIntVector = ((TinyIntVector)fx); + +assertEquals(rowCount, tinyIntVector.getValueCount()); + +for(int j = 0; j < tinyIntVector.getValueCount(); j++){ +if(!tinyIntVector.isNull(j)) { +assertEquals(values[j], tinyIntVector.get(j)); +} +} +return true; +} + 
+public static boolean assertSmallIntVectorValues(FieldVector fx, int rowCount, int[] values){ +SmallIntVector smallIntVector = ((SmallIntVector)fx); + +assertEquals(rowCount, smallIntVector.getValueCount()); + +for(int j = 0; j < smallIntVector.getValueCount(); j++){ +if(!smallIntVector.isNull(j)){ +assertEquals(values[j], smallIntVector.get(j)); +} +} + +return true; +} + +public static boolean assertBigIntVectorValues(FieldVector fx, int rowCount, int[] values){ +BigIntVector bigIntVector = ((BigIntVector)fx); + +assertEquals(rowCount, bigIntVector.getValueCount()); + +for(int j = 0; j < bigIntVector.getValueCount(); j++){ +if(!bigIntVector.isNull(j)) { +assertEquals(values[j], bigIntVector.get(j)); +} +} + +return true; +} + +public static boolean assertDecimalVectorValues(FieldVector fx, int rowCount, BigDecimal[] values){ +DecimalVector decimalVector = ((DecimalVector)fx); + +assertEquals(rowCount, decimalVector.getValueCount()); + +for(int j = 0; j < decimalVector.getValueCount(); j++){ +if(!decimalVector.isNull(j)){ +
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416390#comment-16416390 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177588748 ## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java ## @@ -0,0 +1,116 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.adapter.jdbc; + +import org.apache.arrow.memory.RootAllocator; +import org.apache.arrow.vector.VectorSchemaRoot; + +import java.sql.*; + +/** + * Utility class to convert JDBC objects to columnar Arrow format objects. + * + * This utility uses following data mapping to map JDBC/SQL datatype to Arrow data types. 
+ * + * CHAR--> ArrowType.Utf8 + * NCHAR --> ArrowType.Utf8 + * VARCHAR --> ArrowType.Utf8 + * NVARCHAR --> ArrowType.Utf8 + * LONGVARCHAR --> ArrowType.Utf8 + * LONGNVARCHAR --> ArrowType.Utf8 + * NUMERIC --> ArrowType.Decimal(precision, scale) + * DECIMAL --> ArrowType.Decimal(precision, scale) + * BIT --> ArrowType.Bool + * TINYINT --> ArrowType.Int(8, signed) + * SMALLINT --> ArrowType.Int(16, signed) + * INTEGER --> ArrowType.Int(32, signed) + * BIGINT --> ArrowType.Int(64, signed) + * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE) + * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE) + * BINARY --> ArrowType.Binary + * VARBINARY --> ArrowType.Binary + * LONGVARBINARY --> ArrowType.Binary + * DATE --> ArrowType.Date(DateUnit.MILLISECOND) + * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32) + * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null) + * CLOB --> ArrowType.Utf8 + * BLOB --> ArrowType.Binary + * + * @since 0.10.0 + * @see ArrowDataFetcher + */ +public class JdbcToArrow { + +/** + * For the given SQL query, execute and fetch the data from Relational DB and convert it to Arrow objects. + * + * @param connection Database connection to be used. This method will not close the passed connection object. Since hte caller has passed + * the connection object it's the responsibility of the caller to close or return the connection to the pool. + * @param query The DB Query to fetch the data. + * @return + * @throws SQLException Propagate any SQL Exceptions to the caller after closing any resources opened such as ResultSet and Statment objects. 
+ */ +public static VectorSchemaRoot sqlToArrow(Connection connection, String query) throws Exception { + +assert connection != null: "JDBC conncetion object can not be null"; +assert query != null && query.length() > 0: "SQL query can not be null or empty"; + +RootAllocator rootAllocator = new RootAllocator(Integer.MAX_VALUE); + +Statement stmt = null; Review comment: it should be possible to use `try(resource initializations) { }` and get rid of the finally clause This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > JDBC Adapter for Apache Arrow > - > > Key: ARROW-1780 > URL: https://issues.apache.org/jira/browse/ARROW-1780 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Atul Dambalkar >Assignee: Atul Dambalkar >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > At a high level the JDBC Adapter will allow upstream apps to query RDBMS data > over JDBC and get the JDBC objects converted to Arrow objects/structures. The > upstream utility can then work with
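The reviewer's try-with-resources suggestion can be sketched without a live database by using any `AutoCloseable` resources in place of `Statement` and `ResultSet`. Resources declared in the `try` header are closed automatically in reverse declaration order, even when the body throws, so the explicit `finally` block disappears. Class names below are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Demonstrates the try-with-resources shape the reviewer suggests for
// sqlToArrow: resources in the header are closed automatically in reverse
// declaration order when the block exits, with no finally clause needed.
class TryWithResourcesSketch {
    static final List<String> closed = new ArrayList<>();

    static class Resource implements AutoCloseable {
        final String name;
        Resource(String name) { this.name = name; }
        @Override public void close() { closed.add(name); }
    }

    static void run() {
        // Analogous to:
        //   try (Statement stmt = connection.createStatement();
        //        ResultSet rs = stmt.executeQuery(query)) { ... }
        try (Resource stmt = new Resource("statement");
             Resource rs = new Resource("resultSet")) {
            // body: populate and return the VectorSchemaRoot here
        } // rs is closed first, then stmt
    }
}
```

A side benefit over the PR's `finally` block: if `close()` itself throws while an exception is already in flight, try-with-resources attaches it as a suppressed exception instead of masking the original failure.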
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416401#comment-16416401 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177593874 ## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/JdbcToArrowTestHelper.java ## @@ -0,0 +1,250 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.arrow.adapter.jdbc; + +import java.math.BigDecimal; + +import org.apache.arrow.vector.BigIntVector; +import org.apache.arrow.vector.BitVector; +import org.apache.arrow.vector.DateMilliVector; +import org.apache.arrow.vector.DecimalVector; +import org.apache.arrow.vector.FieldVector; +import org.apache.arrow.vector.Float4Vector; +import org.apache.arrow.vector.Float8Vector; +import org.apache.arrow.vector.IntVector; +import org.apache.arrow.vector.SmallIntVector; +import org.apache.arrow.vector.TimeMilliVector; +import org.apache.arrow.vector.TimeStampVector; +import org.apache.arrow.vector.TinyIntVector; +import org.apache.arrow.vector.VarBinaryVector; +import org.apache.arrow.vector.VarCharVector; + +import static org.junit.Assert.*; + + +/** + * This is a Helper class which has functionalities to read and assert the values from teh given FieldVector object + * + */ +public class JdbcToArrowTestHelper { + +public static boolean assertIntVectorValues(FieldVector fx, int rowCount, int[] values) { +IntVector intVector = ((IntVector) fx); + +assertEquals(rowCount, intVector.getValueCount()); + +for(int j = 0; j < intVector.getValueCount(); j++) { +if(!intVector.isNull(j)) { Review comment: if `intVector.isNull()` returns true, shouldn't we fail the assertion? (or maybe take an Integer[] array instead?) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
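Following the reviewer's `Integer[]` suggestion, expected nulls can be encoded directly in the array: a `null` element means "this row must be null", so an unexpected null value now fails the assertion instead of being silently skipped. As before, this is a plain-array stand-in for the Arrow `IntVector`, with hypothetical names.

```java
// Null-aware variant of the assertion helper: Integer[] lets the caller
// express expected nulls, so a null row is checked rather than skipped.
// Plain Integer[] arrays stand in for the Arrow IntVector and its null mask.
class NullAwareAssertSketch {
    static void assertIntValues(Integer[] actual, Integer[] expected) {
        if (actual.length != expected.length) {
            throw new AssertionError("value count " + actual.length
                + " does not match expected " + expected.length);
        }
        for (int j = 0; j < expected.length; j++) {
            if (expected[j] == null) {
                if (actual[j] != null) {
                    throw new AssertionError("row " + j + ": expected null, was " + actual[j]);
                }
            } else if (!expected[j].equals(actual[j])) {
                throw new AssertionError("row " + j + ": expected "
                    + expected[j] + ", was " + actual[j]);
            }
        }
    }
}
```

With this shape the helper can also return `void`: any mismatch throws, so the `boolean` return value in the PR's helpers carries no information.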
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416398#comment-16416398 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177591507

## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ##

@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.vector.*;
+import org.apache.arrow.vector.types.DateUnit;
+import org.apache.arrow.vector.types.TimeUnit;
+import org.apache.arrow.vector.types.pojo.ArrowType;
+import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Schema;
+
+import java.nio.charset.Charset;
+import java.sql.*;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
+import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;
+
+
+/**
+ * Class that does most of the work to convert JDBC ResultSet data into Arrow columnar format Vector objects.
+ *
+ * @since 0.10.0
+ */
+public class JdbcToArrowUtils {
+
+    private static final int DEFAULT_BUFFER_SIZE = 256;
+
+    /**
+     * Create Arrow {@link Schema} object for the given JDBC {@link ResultSetMetaData}.
+     *
+     * This method currently performs following type mapping for JDBC SQL data types to corresponding Arrow data types.
+     *
+     * CHAR          --> ArrowType.Utf8
+     * NCHAR         --> ArrowType.Utf8
+     * VARCHAR       --> ArrowType.Utf8
+     * NVARCHAR      --> ArrowType.Utf8
+     * LONGVARCHAR   --> ArrowType.Utf8
+     * LONGNVARCHAR  --> ArrowType.Utf8
+     * NUMERIC       --> ArrowType.Decimal(precision, scale)
+     * DECIMAL       --> ArrowType.Decimal(precision, scale)
+     * BIT           --> ArrowType.Bool
+     * TINYINT       --> ArrowType.Int(8, signed)
+     * SMALLINT      --> ArrowType.Int(16, signed)
+     * INTEGER       --> ArrowType.Int(32, signed)
+     * BIGINT        --> ArrowType.Int(64, signed)
+     * REAL          --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+     * FLOAT         --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+     * DOUBLE        --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+     * BINARY        --> ArrowType.Binary
+     * VARBINARY     --> ArrowType.Binary
+     * LONGVARBINARY --> ArrowType.Binary
+     * DATE          --> ArrowType.Date(DateUnit.MILLISECOND)
+     * TIME          --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+     * TIMESTAMP     --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+     * CLOB          --> ArrowType.Utf8
+     * BLOB          --> ArrowType.Binary
+     *
+     * @param rsmd
+     * @return {@link Schema}
+     * @throws SQLException
+     */
+    public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws SQLException {
+
+        assert rsmd != null;
+
+        //ImmutableList.Builder fields = ImmutableList.builder();
+        List fields = new ArrayList<>();
+        int columnCount = rsmd.getColumnCount();
+        for (int i = 1; i <= columnCount; i++) {
+            String columnName = rsmd.getColumnName(i);
+            switch (rsmd.getColumnType(i)) {
+                case Types.BOOLEAN:
+                case Types.BIT:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Bool()), null));
+                    break;
+                case Types.TINYINT:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(8, true)), null));
+                    break;
+                case Types.SMALLINT:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(16, true)), null));
+                    break;
+                case Types.INTEGER:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(32, true)), null));
+                    break;
+                case
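The mapping documented in the javadoc above can be exercised with a small, self-contained sketch over the `java.sql.Types` constants. Note this is illustrative only: the Arrow types are shown as plain strings here, whereas the real method builds ArrowType instances and wraps them in nullable Fields.

```java
import java.sql.Types;

public class JdbcTypeMappingSketch {
    // Illustrative rendering of the JDBC -> Arrow type mapping from the javadoc.
    static String arrowTypeFor(int jdbcType) {
        switch (jdbcType) {
            case Types.CHAR:
            case Types.NCHAR:
            case Types.VARCHAR:
            case Types.NVARCHAR:
            case Types.LONGVARCHAR:
            case Types.LONGNVARCHAR:
            case Types.CLOB:
                return "Utf8";
            case Types.NUMERIC:
            case Types.DECIMAL:
                return "Decimal(precision, scale)";
            case Types.BOOLEAN:
            case Types.BIT:
                return "Bool";
            case Types.TINYINT:
                return "Int(8, signed)";
            case Types.SMALLINT:
                return "Int(16, signed)";
            case Types.INTEGER:
                return "Int(32, signed)";
            case Types.BIGINT:
                return "Int(64, signed)";
            case Types.REAL:
            case Types.FLOAT:
                return "FloatingPoint(SINGLE)";
            case Types.DOUBLE:
                return "FloatingPoint(DOUBLE)";
            case Types.BINARY:
            case Types.VARBINARY:
            case Types.LONGVARBINARY:
            case Types.BLOB:
                return "Binary";
            case Types.DATE:
                return "Date(MILLISECOND)";
            case Types.TIME:
                return "Time(MILLISECOND, 32)";
            case Types.TIMESTAMP:
                return "Timestamp(MILLISECOND, timezone=null)";
            default:
                throw new IllegalArgumentException("unmapped JDBC type: " + jdbcType);
        }
    }

    public static void main(String[] args) {
        System.out.println(arrowTypeFor(Types.VARCHAR));  // Utf8
        System.out.println(arrowTypeFor(Types.INTEGER));  // Int(32, signed)
    }
}
```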
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416406#comment-16416406 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177594286

## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/JdbcToArrowTestHelper.java ##

@@ -0,0 +1,250 @@
+package org.apache.arrow.adapter.jdbc;
+
+import java.math.BigDecimal;
+
+import org.apache.arrow.vector.BigIntVector;
+import org.apache.arrow.vector.BitVector;
+import org.apache.arrow.vector.DateMilliVector;
+import org.apache.arrow.vector.DecimalVector;
+import org.apache.arrow.vector.FieldVector;
+import org.apache.arrow.vector.Float4Vector;
+import org.apache.arrow.vector.Float8Vector;
+import org.apache.arrow.vector.IntVector;
+import org.apache.arrow.vector.SmallIntVector;
+import org.apache.arrow.vector.TimeMilliVector;
+import org.apache.arrow.vector.TimeStampVector;
+import org.apache.arrow.vector.TinyIntVector;
+import org.apache.arrow.vector.VarBinaryVector;
+import org.apache.arrow.vector.VarCharVector;
+
+import static org.junit.Assert.*;
+
+
+/**
+ * This is a Helper class which has functionalities to read and assert the values from teh given FieldVector object
+ *
+ */
+public class JdbcToArrowTestHelper {
+
+    public static boolean assertIntVectorValues(FieldVector fx, int rowCount, int[] values) {
+        IntVector intVector = ((IntVector) fx);
+
+        assertEquals(rowCount, intVector.getValueCount());
+
+        for(int j = 0; j < intVector.getValueCount(); j++) {
+            if(!intVector.isNull(j)) {
+                assertEquals(values[j], intVector.get(j));
+            }
+        }
+        return true;
+    }
+
+    public static boolean assertBitBooleanVectorValues(FieldVector fx, int rowCount, int[] values){
+        BitVector bitVector = ((BitVector)fx);
+        assertEquals(rowCount, bitVector.getValueCount());
+        for(int j = 0; j < bitVector.getValueCount(); j++){
+            if(!bitVector.isNull(j)) {
+                assertEquals(values[j], bitVector.get(j));
+            }
+        }
+        return true;
+    }
+
+    public static boolean assertTinyIntVectorValues(FieldVector fx, int rowCount, int[] values){
+        TinyIntVector tinyIntVector = ((TinyIntVector)fx);
+
+        assertEquals(rowCount, tinyIntVector.getValueCount());
+
+        for(int j = 0; j < tinyIntVector.getValueCount(); j++){
+            if(!tinyIntVector.isNull(j)) {
+                assertEquals(values[j], tinyIntVector.get(j));
+            }
+        }
+        return true;
+    }
+
+    public static boolean assertSmallIntVectorValues(FieldVector fx, int rowCount, int[] values){
+        SmallIntVector smallIntVector = ((SmallIntVector)fx);
+
+        assertEquals(rowCount, smallIntVector.getValueCount());
+
+        for(int j = 0; j < smallIntVector.getValueCount(); j++){
+            if(!smallIntVector.isNull(j)){
+                assertEquals(values[j], smallIntVector.get(j));
+            }
+        }
+
+        return true;
+    }
+
+    public static boolean assertBigIntVectorValues(FieldVector fx, int rowCount, int[] values){
+        BigIntVector bigIntVector = ((BigIntVector)fx);
+
+        assertEquals(rowCount, bigIntVector.getValueCount());
+
+        for(int j = 0; j < bigIntVector.getValueCount(); j++){
+            if(!bigIntVector.isNull(j)) {
+                assertEquals(values[j], bigIntVector.get(j));
+            }
+        }
+
+        return true;
+    }
+
+    public static boolean assertDecimalVectorValues(FieldVector fx, int rowCount, BigDecimal[] values){
+        DecimalVector decimalVector = ((DecimalVector)fx);
+
+        assertEquals(rowCount, decimalVector.getValueCount());
+
+        for(int j = 0; j < decimalVector.getValueCount(); j++){
+            if(!decimalVector.isNull(j)){
+
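The helper's recurring pattern (check the value count first, then compare only the non-null slots) can be sketched without any Arrow dependency, using plain arrays and a hypothetical null mask. All names here are illustrative, not the helper's actual API.

```java
public class NullAwareAssertSketch {
    // Mirrors the helper's pattern: verify the row count, then compare non-null slots only.
    static void assertValues(int[] actual, boolean[] isNull, int[] expected, int rowCount) {
        if (actual.length != rowCount) {
            throw new AssertionError("expected " + rowCount + " rows, got " + actual.length);
        }
        for (int j = 0; j < rowCount; j++) {
            if (!isNull[j] && actual[j] != expected[j]) {
                throw new AssertionError("row " + j + ": expected " + expected[j] + ", got " + actual[j]);
            }
        }
    }

    public static void main(String[] args) {
        int[] actual = {1, 0, 3};                  // the 0 sits in a null slot, so it is skipped
        boolean[] isNull = {false, true, false};
        assertValues(actual, isNull, new int[]{1, -1, 3}, 3);
        System.out.println("ok");
    }
}
```

Skipping null slots matters because a vector's backing buffer holds an arbitrary value wherever the validity bitmap marks the slot null.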
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416418#comment-16416418 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177596898

## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/h2/ArrowDataFetcherTest.java ##

@@ -0,0 +1,139 @@
+package org.apache.arrow.adapter.jdbc.h2;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;
+import org.apache.arrow.adapter.jdbc.*;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+
+import static org.junit.Assert.*;
+
+/**
+ * Test class for {@link ArrowDataFetcher}.
+ */
+public class ArrowDataFetcherTest extends AbstractJdbcToArrowTest {
+
+    private Connection conn = null;
+    private ObjectMapper mapper = null;
+
+    @Before
+    public void setUp() throws Exception {
+        String url = "jdbc:h2:mem:ArrowDataFetcherTest";
+        String driver = "org.h2.Driver";
+
+        mapper = new ObjectMapper(new YAMLFactory());
+
+        Class.forName(driver);
+
+        conn = DriverManager.getConnection(url);
+    }
+
+    @After
+    public void destroy() throws Exception {
+        if (conn != null) {
+            conn.close();
+            conn = null;
+        }
+    }
+
+    @Test
+    public void commaSeparatedQueryColumnsTest() {

Review comment: since this is not using any connection, it might be moved in a separate test class with no setup required?

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416392#comment-16416392 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177589405

## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java ##
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416318#comment-16416318 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177587078

## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java ##

@@ -0,0 +1,116 @@
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.memory.RootAllocator;
+import org.apache.arrow.vector.VectorSchemaRoot;
+
+import java.sql.*;
+
+/**
+ * Utility class to convert JDBC objects to columnar Arrow format objects.
+ *
+ * This utility uses following data mapping to map JDBC/SQL datatype to Arrow data types.
+ *
+ * CHAR          --> ArrowType.Utf8
+ * NCHAR         --> ArrowType.Utf8
+ * VARCHAR       --> ArrowType.Utf8
+ * NVARCHAR      --> ArrowType.Utf8
+ * LONGVARCHAR   --> ArrowType.Utf8
+ * LONGNVARCHAR  --> ArrowType.Utf8
+ * NUMERIC       --> ArrowType.Decimal(precision, scale)
+ * DECIMAL       --> ArrowType.Decimal(precision, scale)
+ * BIT           --> ArrowType.Bool
+ * TINYINT       --> ArrowType.Int(8, signed)
+ * SMALLINT      --> ArrowType.Int(16, signed)
+ * INTEGER       --> ArrowType.Int(32, signed)
+ * BIGINT        --> ArrowType.Int(64, signed)
+ * REAL          --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * FLOAT         --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * DOUBLE        --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+ * BINARY        --> ArrowType.Binary
+ * VARBINARY     --> ArrowType.Binary
+ * LONGVARBINARY --> ArrowType.Binary
+ * DATE          --> ArrowType.Date(DateUnit.MILLISECOND)
+ * TIME          --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+ * TIMESTAMP     --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+ * CLOB          --> ArrowType.Utf8
+ * BLOB          --> ArrowType.Binary
+ *
+ * @since 0.10.0
+ * @see ArrowDataFetcher
+ */
+public class JdbcToArrow {
+
+    /**
+     * For the given SQL query, execute and fetch the data from Relational DB and convert it to Arrow objects.
+     *
+     * @param connection Database connection to be used. This method will not close the passed connection object. Since hte caller has passed
+     *                   the connection object it's the responsibility of the caller to close or return the connection to the pool.
+     * @param query The DB Query to fetch the data.
+     * @return
+     * @throws SQLException Propagate any SQL Exceptions to the caller after closing any resources opened such as ResultSet and Statment objects.
+     */
+    public static VectorSchemaRoot sqlToArrow(Connection connection, String query) throws Exception {
+
+        assert connection != null: "JDBC conncetion object can not be null";

Review comment: typo...
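As an aside on the flagged line: Java `assert` statements are disabled unless the JVM runs with `-ea`, so argument checks like this one are often written with `java.util.Objects.requireNonNull` instead, which fails fast regardless of JVM flags. A minimal, stdlib-only sketch of that alternative (the method name is hypothetical, not the adapter's code):

```java
import java.util.Objects;

public class PreconditionSketch {
    // Fails fast with a clear message even when JVM assertions are disabled.
    static void checkArgs(Object connection, String query) {
        Objects.requireNonNull(connection, "JDBC connection object can not be null");
        Objects.requireNonNull(query, "SQL query can not be null");
    }

    public static void main(String[] args) {
        try {
            checkArgs(null, "SELECT 1");
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());  // JDBC connection object can not be null
        }
    }
}
```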
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416319#comment-16416319 ] ASF GitHub Bot commented on ARROW-1780: --- laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177587078

## File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java ##

+        assert connection != null: "JDBC conncetion object can not be null";

Review comment: typo in the assert message
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416020#comment-16416020 ] ASF GitHub Bot commented on ARROW-1780: --- siddharthteotia commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#issuecomment-376620084 Going through the changes. Will finish reviewing soon.
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416019#comment-16416019 ] ASF GitHub Bot commented on ARROW-1780: --- siddharthteotia commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#discussion_r177517745

## File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/Table.java ##

@@ -0,0 +1,74 @@
+package org.apache.arrow.adapter.jdbc;
+
+/**
+ *

Review comment: I think class description is missing.
[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416005#comment-16416005 ] ASF GitHub Bot commented on ARROW-1780: --- wesm commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects URL: https://github.com/apache/arrow/pull/1759#issuecomment-376617996 @atuldambalkar since this is a large PR, and we haven't had deep feedback from anyone focused on the Java implementation yet, it may be a little while for some review to come through. I would suggest bumping the mailing list thread about JDBC to draw more attention to this PR.
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416006#comment-16416006 ]

ASF GitHub Bot commented on ARROW-1780:
---
siddharthteotia commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#issuecomment-376618019

It would be good to add (or point to) simple example code showing usage of the JDBC adapter. This would help users see how to write an application that converts JDBC result sets to Arrow column vectors.
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415991#comment-16415991 ]

ASF GitHub Bot commented on ARROW-1780:
---
atuldambalkar commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#issuecomment-376613506

Any update on this PR? I would be interested to know if there are any further review comments.
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412260#comment-16412260 ]

ASF GitHub Bot commented on ARROW-1780:
---
atuldambalkar commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#issuecomment-375825688

Uwe, I have updated the code based on your comments, and also merged with the latest 0.10.0-SNAPSHOT. Let's wait to see if someone from the Java side can review this. I will send a message on Slack. Thanks.
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409960#comment-16409960 ]

ASF GitHub Bot commented on ARROW-1780:
---
xhochy commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r175906985

## File path: java/adapter/jdbc/pom.xml
## @@ -0,0 +1,76 @@
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <groupId>org.apache.arrow.adapter.jdbc</groupId>
+  <artifactId>arrow-jdbc</artifactId>
+  <packaging>jar</packaging>
+  <version>0.10.0-SNAPSHOT</version>
+  <name>Arrow JDBC Adapter</name>
+  <url>http://maven.apache.org</url>
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.arrow</groupId>
+      <artifactId>arrow-memory</artifactId>
+      <version>0.9.0-SNAPSHOT</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.arrow</groupId>
+      <artifactId>arrow-vector</artifactId>
+      <version>0.9.0-SNAPSHOT</version>

Review comment: See above
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409958#comment-16409958 ]

ASF GitHub Bot commented on ARROW-1780:
---
xhochy commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r175906418

## File path: dev/release/rat_exclude_files.txt
## @@ -74,3 +74,5 @@
 c_glib/doc/reference/gtk-doc.make
 *.svg
 *.devhelp2
 *.scss
+*.yml

Review comment: Don't exclude these files but add a license header to them
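For reference, the standard ASF source header, written here in YAML comment syntax, looks like the following (a sketch; the exact wording used across the Arrow repository should be copied from an existing file):

```yaml
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
```

With this header in place, the `*.yml` pattern no longer needs to be added to rat_exclude_files.txt.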
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409959#comment-16409959 ]

ASF GitHub Bot commented on ARROW-1780:
---
xhochy commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r175906922

## File path: java/adapter/jdbc/pom.xml
## @@ -0,0 +1,76 @@
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.arrow</groupId>
+      <artifactId>arrow-memory</artifactId>
+      <version>0.9.0-SNAPSHOT</version>

Review comment: Use `${project.version}` here instead of `0.9.0-SNAPSHOT`
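Applying the suggestion, the dependency block would track the module's own version automatically via Maven's built-in `${project.version}` property, e.g.:

```xml
<dependency>
  <groupId>org.apache.arrow</groupId>
  <artifactId>arrow-memory</artifactId>
  <!-- resolves to the version declared for this module (0.10.0-SNAPSHOT) -->
  <version>${project.version}</version>
</dependency>
```

This avoids the dependency versions silently drifting from the module version on each release bump.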
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401077#comment-16401077 ]

ASF GitHub Bot commented on ARROW-1780:
---
atuldambalkar opened a new pull request #1759: ARROW-1780 - (WIP) JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759

This code enhancement converts JDBC ResultSet relational objects to Arrow columnar data Vector objects. The code is under the directory "java/adapter/jdbc/src/main". The API has the following static methods in the class org.apache.arrow.adapter.jdbc.JdbcToArrow:

public static VectorSchemaRoot sqlToArrow(Connection connection, String query)
public static ArrowDataFetcher jdbcArrowDataFetcher(Connection connection, String tableName)

The utility uses the following data mapping to convert JDBC/SQL data types to Arrow data types:

CHAR --> ArrowType.Utf8
NCHAR --> ArrowType.Utf8
VARCHAR --> ArrowType.Utf8
NVARCHAR --> ArrowType.Utf8
LONGVARCHAR --> ArrowType.Utf8
LONGNVARCHAR --> ArrowType.Utf8
NUMERIC --> ArrowType.Decimal(precision, scale)
DECIMAL --> ArrowType.Decimal(precision, scale)
BIT --> ArrowType.Bool
TINYINT --> ArrowType.Int(8, signed)
SMALLINT --> ArrowType.Int(16, signed)
INTEGER --> ArrowType.Int(32, signed)
BIGINT --> ArrowType.Int(64, signed)
REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
BINARY --> ArrowType.Binary
VARBINARY --> ArrowType.Binary
LONGVARBINARY --> ArrowType.Binary
DATE --> ArrowType.Date(DateUnit.MILLISECOND)
TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
CLOB --> ArrowType.Utf8
BLOB --> ArrowType.Binary

JUnit test cases are under java/adapter/jdbc/src/test. The test cases use an H2 in-memory database. I am still working on adding and automating additional test cases.
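The type mapping above can be sketched as a plain lookup over `java.sql.Types` constants. This is an illustrative, self-contained sketch only: the Arrow types are shown as descriptive strings, whereas the actual adapter builds `ArrowType` instances and exposes the conversion through `JdbcToArrow.sqlToArrow`:

```java
import java.sql.Types;
import java.util.HashMap;
import java.util.Map;

public class JdbcToArrowTypeMapping {
    // Simplified view of the adapter's mapping: java.sql.Types constant -> Arrow type,
    // represented here as a descriptive string rather than a real ArrowType object.
    private static final Map<Integer, String> MAPPING = new HashMap<>();

    static {
        // Character and CLOB types all map to UTF-8 strings
        MAPPING.put(Types.CHAR, "ArrowType.Utf8");
        MAPPING.put(Types.VARCHAR, "ArrowType.Utf8");
        MAPPING.put(Types.LONGVARCHAR, "ArrowType.Utf8");
        MAPPING.put(Types.CLOB, "ArrowType.Utf8");
        // Exact numerics
        MAPPING.put(Types.BIT, "ArrowType.Bool");
        MAPPING.put(Types.TINYINT, "ArrowType.Int(8, signed)");
        MAPPING.put(Types.SMALLINT, "ArrowType.Int(16, signed)");
        MAPPING.put(Types.INTEGER, "ArrowType.Int(32, signed)");
        MAPPING.put(Types.BIGINT, "ArrowType.Int(64, signed)");
        // Approximate numerics (JDBC FLOAT is single precision in this mapping)
        MAPPING.put(Types.REAL, "ArrowType.FloatingPoint(SINGLE)");
        MAPPING.put(Types.FLOAT, "ArrowType.FloatingPoint(SINGLE)");
        MAPPING.put(Types.DOUBLE, "ArrowType.FloatingPoint(DOUBLE)");
        // Binary and BLOB types
        MAPPING.put(Types.BINARY, "ArrowType.Binary");
        MAPPING.put(Types.VARBINARY, "ArrowType.Binary");
        MAPPING.put(Types.LONGVARBINARY, "ArrowType.Binary");
        MAPPING.put(Types.BLOB, "ArrowType.Binary");
        // Date/time types, all at millisecond resolution
        MAPPING.put(Types.DATE, "ArrowType.Date(MILLISECOND)");
        MAPPING.put(Types.TIME, "ArrowType.Time(MILLISECOND, 32)");
        MAPPING.put(Types.TIMESTAMP, "ArrowType.Timestamp(MILLISECOND, null)");
    }

    public static String toArrowType(int jdbcType) {
        String arrow = MAPPING.get(jdbcType);
        if (arrow == null) {
            throw new IllegalArgumentException("Unsupported JDBC type: " + jdbcType);
        }
        return arrow;
    }

    public static void main(String[] args) {
        System.out.println(toArrowType(Types.INTEGER)); // ArrowType.Int(32, signed)
        System.out.println(toArrowType(Types.VARCHAR)); // ArrowType.Utf8
    }
}
```

Note that NUMERIC/DECIMAL are omitted from this sketch because their Arrow mapping depends on per-column precision and scale read from the ResultSet metadata.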
[ https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369612#comment-16369612 ]

Atul Dambalkar commented on ARROW-1780:
---
Based on the above comments, I will update the API with the necessary parameters. It will work more or less like pagination.