Hi All:

I am trying to fetch data from Oracle and convert it to Arrow. Below is my
code snippet. Regarding the line "[jvm.record_batch(arrow_vector) for
arrow_vector in arrow_vector_iterator]", is there any way I can parallelize
it to speed things up / gain more performance? (A rough sketch of the kind of
thing I have in mind follows the snippet.)



import time
import logging

import jpype
import pyarrow as pa
import pyarrow.jvm as jvm

# jdbc_sql_connection (JDBC connection) and ra (an Arrow RootAllocator on the
# JVM side) are created elsewhere in my script.


def select_pyarrow_jvm(query):
    start = time.time()
    df = None

    stmt = jdbc_sql_connection.createStatement()
    result_set = stmt.executeQuery(query)

    try:
        arrow_vector_iterator = jpype.JPackage("org").apache.arrow.adapter.jdbc.JdbcToArrow.sqlToArrowVectorIterator(
            result_set,
            ra
        )
        # Convert each JVM VectorSchemaRoot into a pyarrow RecordBatch.
        record_batch_list = [jvm.record_batch(arrow_vector) for arrow_vector in arrow_vector_iterator]
        data_arrow_tbl = pa.Table.from_batches(record_batch_list)
        df = data_arrow_tbl.to_pandas()
    except Exception:
        logging.exception("Error inside select_pyarrow_jvm")
    finally:
        # Ensure that we release the JDBC / JVM resources.
        result_set.close()
        stmt.close()

    elapse = time.time() - start
    return df, elapse
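

To illustrate what I mean by "parallel", here is a rough, untested sketch of
the direction I was considering: consume the JDBC iterator sequentially in the
main thread and hand only the record-batch conversion to a thread pool. The
helper name parallel_record_batches and the max_workers value are just
placeholders I made up, and I don't know whether pyarrow.jvm.record_batch is
safe to call concurrently, which is really part of my question.

from concurrent.futures import ThreadPoolExecutor

def parallel_record_batches(arrow_vector_iterator, max_workers=4):
    # Hypothetical helper: the JDBC iterator itself is consumed sequentially
    # here; only the per-root conversion is submitted to the pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(jvm.record_batch, root) for root in arrow_vector_iterator]
        # Collect results in submission order so batch order is preserved.
        return [f.result() for f in futures]

Does something along these lines make sense, or is there a better way to
speed up this step?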


Many Thanks,

Abe
