Hi YU-MING,

It seems ResultSet is not thread-safe in general, so we do not provide a parallel JDBC adapter. However, the Oracle implementation seems to be thread-safe, so you can implement your own parallel adapter.
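
One option, if you prefer not to rely on the Oracle ResultSet being thread-safe, is to split the query into non-overlapping partitions (for example by a key range or a ROWNUM window) and give each worker thread its own Connection/Statement/ResultSet, then combine the batches at the end. Below is a rough, untested sketch along those lines; make_connection, partition_queries and allocator are placeholders you would supply, and the conversion uses pyarrow.jvm.record_batch just like your snippet.

import concurrent.futures

import jpype
import pyarrow as pa
import pyarrow.jvm


def fetch_partition(make_connection, query, allocator):
    # Run one partitioned query on its own Connection/Statement/ResultSet
    # and return the resulting pyarrow RecordBatches.
    JdbcToArrow = jpype.JPackage("org").apache.arrow.adapter.jdbc.JdbcToArrow
    connection = make_connection()  # placeholder: one JDBC connection per worker
    stmt = connection.createStatement()
    try:
        result_set = stmt.executeQuery(query)
        arrow_vector_iterator = JdbcToArrow.sqlToArrowVectorIterator(result_set, allocator)
        # Convert each VectorSchemaRoot to a pyarrow RecordBatch on the Python side.
        return [pyarrow.jvm.record_batch(root) for root in arrow_vector_iterator]
    finally:
        stmt.close()
        connection.close()


def select_parallel(make_connection, partition_queries, allocator, max_workers=4):
    # Fetch non-overlapping partitions of the same table in parallel and
    # combine them into a single pyarrow Table (row order is not preserved).
    batches = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_partition, make_connection, q, allocator)
                   for q in partition_queries]
        for future in concurrent.futures.as_completed(futures):
            batches.extend(future.result())
    return pa.Table.from_batches(batches)

Whether this actually helps depends on where the time is spent (network, Oracle itself, or the JDBC-to-Arrow conversion), so it is worth profiling the single-threaded version first.
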
Best,
Liya Fan

On Thu, Sep 9, 2021 at 8:18 PM HSU YU-MING <[email protected]> wrote:

> Hi All:
>
> I am trying to fetch Oracle's data and transfer it to Arrow. Below is my
> code snippet. Regarding the line
> "[jvm.record_batch(arrow_vector) for arrow_vector in arrow_vector_iterator]",
> is there any way I can parallelize it to speed things up / gain more
> performance?
>
> def select_pyarrow_jvm(query):
>     start = time.time()
>
>     stmt = jdbc_sql_connection.createStatement()
>     result_set = stmt.executeQuery(query)
>
>     try:
>         arrow_vector_iterator = jpype.JPackage("org").apache.arrow.adapter.jdbc.JdbcToArrow.sqlToArrowVectorIterator(
>             result_set,
>             ra
>         )
>         record_batch_list = [jvm.record_batch(arrow_vector)
>                              for arrow_vector in arrow_vector_iterator]
>         data_arrow_tbl = pa.Table.from_batches(record_batch_list)
>         df = data_arrow_tbl.to_pandas()
>     except Exception as e:
>         logging.exception("Error inside select_pyarrow_jvm ")
>     finally:
>         # Ensure that we clear the JVM memory.
>         stmt.close()
>
>     elapse = time.time() - start
>     return df, elapse
>
> Many Thanks,
>
> Abe
