[jira] [Updated] (ARROW-6172) [Java] Provide benchmarks to set IntVector with different methods

2019-08-14 Thread Liya Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liya Fan updated ARROW-6172:

Description: 
 

We provide benchmarks to evaluate the performance of setting values in an 
IntVector in three different ways:
 # through a value holder
 # through a writer
 # directly, through a set method
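
To make the three write paths concrete, here is a minimal, self-contained sketch. It deliberately does not use the Arrow library: ToyIntVector, ToyNullableIntHolder and ToyIntWriter are hypothetical stand-ins invented for illustration, and only loosely mirror the shape of a fixed-width vector with a validity bitmap. It says nothing about the relative cost of the real Arrow code paths; the benchmarks are what measure that.

```java
import java.util.Arrays;
import java.util.BitSet;

public class IntVectorSetSketch {

    /** Hypothetical holder: carries a value plus an "is set" flag. */
    static final class ToyNullableIntHolder {
        int isSet;   // 1 if the value is present, 0 if it is null
        int value;
    }

    /** Hypothetical stand-in for an int vector: value array plus validity bitmap. */
    static final class ToyIntVector {
        private int[] values;
        private final BitSet validity = new BitSet();
        private int valueCount;

        ToyIntVector(int capacity) {
            values = new int[Math.max(1, capacity)];
        }

        private void ensureCapacity(int index) {
            if (index >= values.length) {
                values = Arrays.copyOf(values, Math.max(index + 1, values.length * 2));
            }
        }

        /** Way 3: set the value directly. */
        void setSafe(int index, int value) {
            ensureCapacity(index);
            values[index] = value;
            validity.set(index);
        }

        void setNull(int index) {
            ensureCapacity(index);
            validity.clear(index);
        }

        /** Way 1: set through a holder, which is unpacked before the write. */
        void setSafe(int index, ToyNullableIntHolder holder) {
            if (holder.isSet != 0) {
                setSafe(index, holder.value);
            } else {
                setNull(index);
            }
        }

        void setValueCount(int count) { valueCount = count; }
        int getValueCount() { return valueCount; }
        boolean isNull(int index) { return !validity.get(index); }

        int get(int index) {
            if (isNull(index)) {
                throw new IllegalStateException("value at index " + index + " is null");
            }
            return values[index];
        }
    }

    /** Way 2: a writer that tracks a position and forwards writes to the vector. */
    static final class ToyIntWriter {
        private final ToyIntVector vector;
        private int position;

        ToyIntWriter(ToyIntVector vector) { this.vector = vector; }
        void setPosition(int position) { this.position = position; }
        void writeInt(int value) { vector.setSafe(position, value); }
    }

    static ToyIntVector fillWithHolder(int n) {
        ToyIntVector vector = new ToyIntVector(n);
        ToyNullableIntHolder holder = new ToyNullableIntHolder(); // one holder, reused
        for (int i = 0; i < n; i++) {
            holder.isSet = 1;
            holder.value = i * 10;
            vector.setSafe(i, holder);
        }
        vector.setValueCount(n);
        return vector;
    }

    static ToyIntVector fillWithWriter(int n) {
        ToyIntVector vector = new ToyIntVector(n);
        ToyIntWriter writer = new ToyIntWriter(vector);
        for (int i = 0; i < n; i++) {
            writer.setPosition(i);
            writer.writeInt(i * 10);
        }
        vector.setValueCount(n);
        return vector;
    }

    static ToyIntVector fillDirectly(int n) {
        ToyIntVector vector = new ToyIntVector(n);
        for (int i = 0; i < n; i++) {
            vector.setSafe(i, i * 10);
        }
        vector.setValueCount(n);
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(fillWithHolder(3).get(2));
        System.out.println(fillWithWriter(3).get(2));
        System.out.println(fillDirectly(3).get(2));
    }
}
```

All three paths end in the same underlying write; the holder and the writer each add a layer of indirection in front of it, which is the overhead the benchmarks are meant to quantify.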

  was:
When converting JDBC data to Arrow data, a value holder is created for each 
single value. The following code snippet gives an example:

  NullableSmallIntHolder holder = new NullableSmallIntHolder();
  holder.isSet = isNonNull ? 1 : 0;
  if (isNonNull) {
    holder.value = (short) value;
  }
  smallIntVector.setSafe(rowCount, holder);
  smallIntVector.setValueCount(rowCount + 1);

 

This is inefficient in terms of both memory usage and computation.

For most types, we can improve performance by setting the value directly.

For example, benchmarks on IntVector show that setting the int value 
directly reduces the average time per operation by about 20%:

 

Benchmark                         Mode  Cnt   Score    Error  Units
IntBenchmarks.setIntDirectly      avgt    5  15.397 ±  0.018  us/op
IntBenchmarks.setWithValueHolder  avgt    5  19.198 ±  0.789  us/op
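
As a quick check of the 20% figure, the sketch below recomputes it from the two scores in the table. The class and method names are made up for this illustration. Since the scores are average times per operation (lower is better), the improvement is taken as the relative reduction in time, which comes out to about 19.8%, or a speedup of roughly 1.25x.

```java
public class BenchmarkDelta {

    /** Relative reduction in time per operation, as a fraction of the baseline. */
    static double relativeReduction(double baselineMicros, double candidateMicros) {
        return (baselineMicros - candidateMicros) / baselineMicros;
    }

    /** Speedup factor: how many times faster the candidate is than the baseline. */
    static double speedup(double baselineMicros, double candidateMicros) {
        return baselineMicros / candidateMicros;
    }

    public static void main(String[] args) {
        double withHolder = 19.198; // IntBenchmarks.setWithValueHolder, us/op
        double direct = 15.397;     // IntBenchmarks.setIntDirectly, us/op

        System.out.println("time reduction (%): " + 100 * relativeReduction(withHolder, direct));
        System.out.println("speedup (x): " + speedup(withHolder, direct));
    }
}
```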

 


> [Java] Provide benchmarks to set IntVector with different methods
> -
>
> Key: ARROW-6172
> URL: https://issues.apache.org/jira/browse/ARROW-6172
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java
> Reporter: Liya Fan
> Assignee: Liya Fan
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
>  
> We provide benchmarks to evaluate the performance of setting values in an 
> IntVector in three different ways:
>  # through a value holder
>  # through a writer
>  # directly, through a set method



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (ARROW-6172) [Java] Provide benchmarks to set IntVector with different methods

2019-08-14 Thread Liya Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liya Fan updated ARROW-6172:

Summary: [Java] Provide benchmarks to set IntVector with different methods  
(was: [Java] Avoid creating value holders repeatedly when reading data from 
JDBC)

> [Java] Provide benchmarks to set IntVector with different methods
> -
>
> Key: ARROW-6172
> URL: https://issues.apache.org/jira/browse/ARROW-6172
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java
> Reporter: Liya Fan
> Assignee: Liya Fan
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> When converting JDBC data to Arrow data, a value holder is created for each 
> single value. The following code snippet gives an example:
>   NullableSmallIntHolder holder = new NullableSmallIntHolder();
>   holder.isSet = isNonNull ? 1 : 0;
>   if (isNonNull) {
>     holder.value = (short) value;
>   }
>   smallIntVector.setSafe(rowCount, holder);
>   smallIntVector.setValueCount(rowCount + 1);
>  
> This is inefficient in terms of both memory usage and computation.
> For most types, we can improve performance by setting the value directly.
> For example, benchmarks on IntVector show that setting the int value 
> directly reduces the average time per operation by about 20%:
>  
> Benchmark                         Mode  Cnt   Score    Error  Units
> IntBenchmarks.setIntDirectly      avgt    5  15.397 ±  0.018  us/op
> IntBenchmarks.setWithValueHolder  avgt    5  19.198 ±  0.789  us/op
>  


