[ https://issues.apache.org/jira/browse/ARROW-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Liya Fan updated ARROW-6172:
Description:
We provide benchmarks to evaluate the performance of setting IntVector in 3
different ways:
# through a value holder
# through a writer
# directly, through a set method
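To make the three paths concrete, here is a minimal, self-contained Java sketch. It does not use Arrow: `IntColumnModel`, `IntHolderModel` and `IntWriterModel` are hypothetical stand-ins whose method names (`setSafe`, `setNull`, `setPosition`, `writeInt`) loosely echo the Arrow Java API, so the shape of each call can be seen without an Arrow dependency.

```java
// Hypothetical stand-in for a nullable int holder (NOT Arrow's NullableIntHolder).
final class IntHolderModel {
    int isSet;  // 1 if the value is non-null, 0 otherwise
    int value;
}

// Hypothetical, simplified stand-in for an int vector: values plus a validity array.
final class IntColumnModel {
    private int[] values = new int[0];
    private boolean[] valid = new boolean[0];

    // Grow the backing arrays so that `index` is addressable (a "safe" set).
    private void ensureCapacity(int index) {
        if (index >= values.length) {
            int newCap = Math.max(2 * values.length, index + 1);
            values = java.util.Arrays.copyOf(values, newCap);
            valid = java.util.Arrays.copyOf(valid, newCap);
        }
    }

    // Path 1: through a value holder; the caller must fill a holder object first.
    void setSafe(int index, IntHolderModel holder) {
        ensureCapacity(index);
        valid[index] = holder.isSet == 1;
        if (holder.isSet == 1) {
            values[index] = holder.value;
        }
    }

    // Path 3: directly, through a set method; no intermediate object is involved.
    void setSafe(int index, int value) {
        ensureCapacity(index);
        valid[index] = true;
        values[index] = value;
    }

    // Mark a slot as null without touching its value.
    void setNull(int index) {
        ensureCapacity(index);
        valid[index] = false;
    }

    boolean isNull(int index) { return !valid[index]; }

    int get(int index) { return values[index]; }
}

// Path 2: through a writer that tracks the current position and forwards to the column.
final class IntWriterModel {
    private final IntColumnModel column;
    private int position;

    IntWriterModel(IntColumnModel column) { this.column = column; }

    void setPosition(int position) { this.position = position; }

    void writeInt(int value) { column.setSafe(position, value); }
}
```

In this model the holder path allocates and fills an extra object per value, and the writer path adds one level of indirection before reaching the same direct setter; the benchmarks are meant to measure what such extra steps cost in the real Arrow classes.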
was:
When converting JDBC data to Arrow data, a value holder is created for every
single value. The following code snippet gives an example:
NullableSmallIntHolder holder = new NullableSmallIntHolder();
holder.isSet = isNonNull ? 1 : 0;
if (isNonNull) {
holder.value = (short) value;
}
smallIntVector.setSafe(rowCount, holder);
smallIntVector.setValueCount(rowCount + 1);
This is inefficient in terms of both memory usage and computation.
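A minimal sketch of the difference, assuming hypothetical stand-in classes (`SmallIntColumnModel`, `SmallIntHolderModel`, `JdbcToColumnSketch`) rather than Arrow's `SmallIntVector`, and an `Integer[]` array in place of a JDBC `ResultSet`, where `null` plays the role of SQL NULL. `convertWithHolder` follows the snippet and allocates one holder per row; `convertDirectly` writes the primitive or marks the slot null, so no per-row object is created.

```java
// Hypothetical stand-in for a nullable smallint holder (NOT Arrow's NullableSmallIntHolder).
final class SmallIntHolderModel {
    int isSet;    // 1 = non-null, 0 = null
    short value;
}

// Hypothetical, simplified stand-in for a smallint vector.
final class SmallIntColumnModel {
    short[] values = new short[0];
    boolean[] valid = new boolean[0];
    int valueCount;

    private void ensureCapacity(int index) {
        if (index >= values.length) {
            int newCap = Math.max(2 * values.length, index + 1);
            values = java.util.Arrays.copyOf(values, newCap);
            valid = java.util.Arrays.copyOf(valid, newCap);
        }
    }

    // Holder path: mirrors setSafe(rowCount, holder) in the snippet.
    void setSafe(int index, SmallIntHolderModel holder) {
        ensureCapacity(index);
        valid[index] = holder.isSet == 1;
        if (holder.isSet == 1) {
            values[index] = holder.value;
        }
    }

    // Direct path: write the primitive without an intermediate object.
    void setSafe(int index, short value) {
        ensureCapacity(index);
        valid[index] = true;
        values[index] = value;
    }

    void setNull(int index) {
        ensureCapacity(index);
        valid[index] = false;
    }

    void setValueCount(int count) { valueCount = count; }
}

final class JdbcToColumnSketch {
    // Holder path: one holder object is allocated for every row, as in the snippet.
    static SmallIntColumnModel convertWithHolder(Integer[] rows) {
        SmallIntColumnModel column = new SmallIntColumnModel();
        for (int rowCount = 0; rowCount < rows.length; rowCount++) {
            boolean isNonNull = rows[rowCount] != null;
            SmallIntHolderModel holder = new SmallIntHolderModel();
            holder.isSet = isNonNull ? 1 : 0;
            if (isNonNull) {
                holder.value = (short) rows[rowCount].intValue();
            }
            column.setSafe(rowCount, holder);
            column.setValueCount(rowCount + 1);
        }
        return column;
    }

    // Direct path: set the value or mark the slot null, with no per-row holder.
    static SmallIntColumnModel convertDirectly(Integer[] rows) {
        SmallIntColumnModel column = new SmallIntColumnModel();
        for (int rowCount = 0; rowCount < rows.length; rowCount++) {
            if (rows[rowCount] != null) {
                column.setSafe(rowCount, (short) rows[rowCount].intValue());
            } else {
                column.setNull(rowCount);
            }
            column.setValueCount(rowCount + 1);
        }
        return column;
    }
}
```

Both methods leave the column in the same state; they differ only in whether a temporary object is created for each row.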
For most types, we can improve performance by setting the value directly.
For example, the benchmarks on IntVector show that setting the int value
directly reduces the average time per operation by about 20%:
Benchmark                         Mode  Cnt   Score   Error  Units
IntBenchmarks.setIntDirectly      avgt    5  15.397 ± 0.018  us/op
IntBenchmarks.setWithValueHolder  avgt    5  19.198 ± 0.789  us/op
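The 20% figure can be checked from the Score column. Read as time per operation, the direct path takes about 19.8% less time than the holder path; read as throughput, it completes roughly 1.25 times as many operations in the same time:

```latex
\frac{19.198 - 15.397}{19.198} = \frac{3.801}{19.198} \approx 0.198,
\qquad
\frac{19.198}{15.397} \approx 1.247
```

The reported error intervals, [15.379, 15.415] and [18.409, 19.987] us/op, do not overlap, so the gap is larger than the measured noise.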
> [Java] Provide benchmarks to set IntVector with different methods
> ----------------------------------------------------------------
>
> Key: ARROW-6172
> URL: https://issues.apache.org/jira/browse/ARROW-6172
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java
> Reporter: Liya Fan
> Assignee: Liya Fan
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 3h 10m
> Remaining Estimate: 0h
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)