Hi All:
I am trying to fetch data from Oracle and convert it to Arrow. Below is my
code snippet. Regarding the line "[jvm.record_batch(arrow_vector) for
arrow_vector in arrow_vector_iterator]", is there any way I can
parallelize it to speed things up / gain more performance?
def select_pyarrow_jvm(query):
    start = time.time()
    df = None  # ensure df exists even if the conversion below fails
    stmt = jdbc_sql_connection.createStatement()
    result_set = stmt.executeQuery(query)
    try:
        arrow_vector_iterator = jpype.JPackage("org").apache.arrow.adapter.jdbc.JdbcToArrow.sqlToArrowVectorIterator(
            result_set,
            ra  # allocator/config, defined elsewhere in my script
        )
        record_batch_list = [jvm.record_batch(arrow_vector)
                             for arrow_vector in arrow_vector_iterator]
        data_arrow_tbl = pa.Table.from_batches(record_batch_list)
        df = data_arrow_tbl.to_pandas()
    except Exception:
        logging.exception("Error inside select_pyarrow_jvm")
    finally:
        # Ensure that we clear the JVM memory.
        stmt.close()
    elapsed = time.time() - start
    return df, elapsed
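For reference, the kind of thing I had in mind is a thread pool that runs the per-vector conversion concurrently while keeping the batches in order. This is only a sketch: the `double_it` function below is a stand-in for `jvm.record_batch`, and I am not sure whether the JPype/Arrow calls are actually safe to invoke from multiple threads, so please treat the names and the approach as assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_convert(iterator, convert, max_workers=4):
    # executor.map yields results in input order, so the record
    # batches would stay in the same sequence as the iterator.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(convert, iterator))

# Stand-in for jvm.record_batch; the real call would go through JPype.
def double_it(x):
    return x * 2

result = parallel_convert(range(5), double_it)
print(result)  # [0, 2, 4, 6, 8]
```

In my real code I would replace `double_it` with `jvm.record_batch` and pass `arrow_vector_iterator` as the iterator, but I don't know if that is valid.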
Many Thanks,
Abe