[jira] [Commented] (ARROW-6417) [C++][Parquet] Non-dictionary BinaryArray reads from Parquet format have slowed down since 0.11.x

2019-09-05 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923778#comment-16923778
 ] 

Wes McKinney commented on ARROW-6417:
-

After reverting the jemalloc version, the benchmarks show that master is faster 
than v0.12.1, which is certainly what I was *hoping for* after all the 
refactoring work we put in over the last few months. So the SafeLoadAs issue is 
no longer a concern.

> [C++][Parquet] Non-dictionary BinaryArray reads from Parquet format have 
> slowed down since 0.11.x
> -
>
> Key: ARROW-6417
> URL: https://issues.apache.org/jira/browse/ARROW-6417
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Attachments: 20190903_parquet_benchmark.py, 
> 20190903_parquet_read_perf.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In doing some benchmarking, I have found that binary reads seem to be slower 
> from Arrow 0.11.1 to master branch. It would be a good idea to do some basic 
> profiling to see where we might improve our memory allocation strategy (or 
> whatever the bottleneck turns out to be)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)



2019-09-05 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923639#comment-16923639
 ] 

Antoine Pitrou commented on ARROW-6417:
---

FTR, similar issues with jemalloc seem to have happened in the past; I wonder 
if it's a regression:
https://github.com/jemalloc/jemalloc/issues/335
https://github.com/jemalloc/jemalloc/issues/126



2019-09-05 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923630#comment-16923630
 ] 

Antoine Pitrou commented on ARROW-6417:
---

Wow, that's massive.

Re {{SafeLoadAs}}, I'm with Micah: it shouldn't make a difference on an x86 CPU 
with a decent compiler. Worth checking anyway.


2019-09-05 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923612#comment-16923612
 ] 

Wes McKinney commented on ARROW-6417:
-

I opened an issue with jemalloc to see if we're doing something wrong:
https://github.com/jemalloc/jemalloc/issues/1621


2019-09-05 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923607#comment-16923607
 ] 

Wes McKinney commented on ARROW-6417:
-

The benchmark results in arrow-builder-benchmark are pretty damning...

master:

{code}
-----------------------------------------------------------------------------------------
Benchmark                                     Time           CPU  Iterations
-----------------------------------------------------------------------------------------
BufferBuilderTinyWrites/real_time     264925407 ns  264925189 ns           2   966.31MB/s
BufferBuilderSmallWrites/real_time    178721490 ns  178720664 ns           4  1.39882GB/s
BufferBuilderLargeWrites/real_time    192722520 ns  192720335 ns           4  1.29027GB/s
BuildBooleanArrayNoNulls               61622618 ns   61620052 ns          11  4.05712GB/s
BuildIntArrayNoNulls                  159926782 ns  159919611 ns           4  1.56329GB/s
BuildAdaptiveIntNoNulls                34141484 ns   34141072 ns          20  7.32256GB/s
BuildAdaptiveIntNoNullsScalarAppend   118671966 ns  118669726 ns           6  2.10669GB/s
BuildBinaryArray                      646172067 ns  646165509 ns           1  396.183MB/s
BuildChunkedBinaryArray               629538527 ns  629517882 ns           1   406.66MB/s
BuildFixedSizeBinaryArray             319843478 ns  319421997 ns           2  801.448MB/s
BuildDecimalArray                     613258571 ns  613249404 ns           1  834.897MB/s
BuildInt64DictionaryArrayRandom       265489567 ns  265479003 ns           3  964.295MB/s
BuildInt64DictionaryArraySequential   256461735 ns  256454103 ns           3  998.229MB/s
BuildInt64DictionaryArraySimilar      436497455 ns  436496161 ns           2  586.489MB/s
BuildStringDictionaryArray            737468427 ns  737429710 ns           1  463.142MB/s
ArrayDataConstructDestruct                38895 ns      38895 ns       18067
{code}

master with the older jemalloc:

{code}
-----------------------------------------------------------------------------------------
Benchmark                                     Time           CPU  Iterations
-----------------------------------------------------------------------------------------
BufferBuilderTinyWrites/real_time     139816022 ns  139814056 ns           5  1.78806GB/s
BufferBuilderSmallWrites/real_time     35215592 ns   35214766 ns          19  7.09912GB/s
BufferBuilderLargeWrites/real_time     32460612 ns   32456001 ns          21  7.66046GB/s
BuildBooleanArrayNoNulls               33690068 ns   33688611 ns          21  7.42091GB/s
BuildIntArrayNoNulls                   49988970 ns   49987507 ns          14  5.00125GB/s
BuildAdaptiveIntNoNulls                23878665 ns   23876703 ns          29  10.4705GB/s
BuildAdaptiveIntNoNullsScalarAppend   116140426 ns  116137665 ns           6  2.15262GB/s
BuildBinaryArray                      593711307 ns  593699295 ns           1  431.195MB/s
BuildChunkedBinaryArray               538185849 ns  538185876 ns           1  475.672MB/s
BuildFixedSizeBinaryArray             218638403 ns  218631191 ns           3  1.14348GB/s
BuildDecimalArray                     294477232 ns  294474155 ns           2  1.69794GB/s
BuildInt64DictionaryArrayRandom       248790745 ns  248788395 ns           3  1028.99MB/s
BuildInt64DictionaryArraySequential   238954386 ns  238949356 ns           3  1071.36MB/s
BuildInt64DictionaryArraySimilar      422484600 ns  422471016 ns           2  605.959MB/s
BuildStringDictionaryArray            716507144 ns  716487471 ns           1   476.68MB/s
ArrayDataConstructDestruct                38406 ns      38406 ns       18229
{code}

So it seems that performance in realloc-heavy workloads is degraded.


2019-09-05 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923566#comment-16923566
 ] 

Wes McKinney commented on ARROW-6417:
-

OK, it appears that the jemalloc version is causing the perf difference.

current master branch with the vendored jemalloc version (4.something, with patches):

{code}
$ python 20190903_parquet_benchmark.py 
dense-random 10
({'case': 'read-dense-random-single-thread'}, 0.6065331888198853)
{code}

master with jemalloc 5.2.0:

{code}
$ python 20190903_parquet_benchmark.py 
dense-random 10
({'case': 'read-dense-random-single-thread'}, 1.2143790817260742)
{code}

To reproduce these results yourself:

* Get the old jemalloc tarball from 
https://github.com/apache/arrow/tree/maint-0.12.x/cpp/thirdparty/jemalloc
* Set {{$ARROW_JEMALLOC_URL}} to the path of that tarball before building
* Use this branch, which has the old EP configuration: 
https://github.com/wesm/arrow/tree/use-old-jemalloc

Here's the benchmark script I'm running above:

https://gist.github.com/wesm/7e5ae1d41981cfdd20415faf71e5f57e

I'm interested in whether other benchmarks are affected or if this is a 
peculiarity of this particular benchmark.


2019-09-05 Thread Micah Kornfield (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923534#comment-16923534
 ] 

Micah Kornfield commented on ARROW-6417:


For {{SafeLoadAs}}, you could try changing the implementation to a plain 
dereference instead of memcpy, which should be equivalent to the old code 
(assuming it is getting inlined correctly). IIRC, we saw very comparable 
numbers on the existing Parquet benchmarks when I made those changes.


2019-09-05 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923474#comment-16923474
 ] 

Wes McKinney commented on ARROW-6417:
-

I will try that next. I'm going to merge my current patch in the meantime and 
leave this JIRA open.


2019-09-05 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923360#comment-16923360
 ] 

Antoine Pitrou commented on ARROW-6417:
---

Have you tried to measure the same jemalloc version for the two Arrow versions 
(or, conversely, the two jemalloc versions for the same Arrow version)?


2019-09-04 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922791#comment-16922791
 ] 

Wes McKinney commented on ARROW-6417:
-

Further down the rabbit hole.

0.12.1 perf profile:

{code}
- parquet::arrow::FileReader::Impl::ReadSchemaField
  - 66.24% parquet::arrow::ColumnReader::NextBatch
    - parquet::arrow::PrimitiveImpl::NextBatch
      - 66.23% parquet::internal::RecordReader::ReadRecords
        - 41.51% parquet::internal::TypedRecordReader<...>::ReadRecordData
          - 38.62% parquet::internal::TypedRecordReader<...>::ReadValuesSpaced
            - 26.97% arrow::internal::ChunkedBinaryBuilder::Append
              - 24.06% arrow::BinaryBuilder::Append
                + 12.78% arrow::BufferBuilder::Append
                  1.99% arrow::ArrayBuilder::Reserve
                  1.16% arrow::BufferBuilder::Append@plt
                  0.52% arrow::ArrayBuilder::Reserve@plt
                0.57% arrow::BinaryBuilder::Append@plt
            + 8.34% parquet::Decoder<...>::DecodeSpaced
              0.53% arrow::internal::ChunkedBinaryBuilder::Append@plt
          2.02% parquet::internal::DefinitionLevelsToBitmap
          + 0.86% parquet::internal::RecordReader::RecordReaderImpl::ReserveValues
        + 24.31% parquet::internal::TypedRecordReader<...>::ReadNewPage
{code}

master / my ARROW-6417 branch

{code}
- 74.04% parquet::internal::TypedRecordReader<...>::ReadRecords
  - 49.00% parquet::internal::TypedRecordReader<...>::ReadRecordData
    - 45.82% parquet::internal::ByteArrayChunkedRecordReader::ReadValuesSpaced
      - 45.19% parquet::PlainByteArrayDecoder::DecodeArrow
        + 20.92% arrow::BaseBinaryBuilder::ReserveData
          7.61% __memmove_avx_unaligned_erms
        + 2.59% arrow::BaseBinaryBuilder::Resize
          0.77% memcpy@plt
      + 0.63% parquet::DictByteArrayDecoderImpl::DecodeArrow
    2.09% parquet::internal::DefinitionLevelsToBitmap
    + 1.07% parquet::internal::TypedRecordReader<...>::ReserveValues
  + 24.32% parquet::SerializedPageReader::NextPage
{code}

Furthermore, jemalloc shows up as taking a lot more time on 


2019-09-04 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922758#comment-16922758
 ] 

Wes McKinney commented on ARROW-6417:
-

So on closer inspection: in v0.11.1 we weren't yet handling chunked binary 
reads at all, so the comparison is not really apples to apples. v0.12.x was the 
first release series to include chunking support, so it could be the more 
appropriate baseline.

This performance issue is really vexing. We also changed jemalloc versions 
between 0.12.x and 0.15.x, so I wonder if the allocator version could be 
impacting performance.


2019-09-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921168#comment-16921168
 ] 

Wes McKinney commented on ARROW-6417:
-

OK, I think that to make things faster we need to be more careful about 
pre-allocating with {{BinaryBuilder}} and calling 
{{BaseBinaryBuilder::UnsafeAppend}} instead of {{Append}}. It's a bit tricky 
because we have {{ChunkedBinaryBuilder}} in the mix, so we may have to manage 
the creation of chunks in the Parquet value decoder. I think this is worth the 
effort given how much of a hot path this is for reading Parquet files. I'll 
spend a little time on it tomorrow.


2019-09-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921164#comment-16921164
 ] 

Wes McKinney commented on ARROW-6417:
-

The dreaded {{\_\_memmove_avx_unaligned_erms}} has shown up again. I'll have a 
poke at this to see what can be done.

 
{code:java}
+   60.85%  0.01%  python  libparquet.so.15.0.0  [.] parquet::internal::TypedRecordReader::Append
+   30.62%  9.64%  python  libarrow.so.15.0.0    [.] arrow::BufferBuilder::Append
+   23.51% 10.91%  python  libc-2.27.so          [.] __memmove_avx_unaligned_erms
+   21.23%  0.00%  python  [unknown]             [.] 0x
+   18.58%  0.01%  python  libparquet.so.15.0.0  [.] parquet::ColumnReaderImplBase<...>
+   18.45% 14.80%  python  libsnappy.so.1.1.7    [.] snappy::RawUncompress
+   18.42%  0.02%  python  libparquet.so.15.0.0  [.] parquet::SerializedPageReader::NextPage
+   18.27%  0.00%  python  libarrow.so.15.0.0    [.] arrow::util::SnappyCodec::Decompress
+   18.27%  0.00%  python  libarrow.so.15.0.0    [.] arrow::util::SnappyCodec::Decompress
+   18.27%  0.00%  python  libsnappy.so.1.1.7    [.] snappy::RawUncompress
+   14.99%  0.00%  python  libarrow.so.15.0.0    [.] arrow::PoolBuffer::Resize
+   14.99%  0.00%  python  libarrow.so.15.0.0    [.] arrow::PoolBuffer::Reserve
+   14.99%  0.00%  python  libarrow.so.15.0.0    [.] arrow::DefaultMemoryPool::Reallocate
+   14.98%  0.01%  python  libarrow.so.15.0.0    [.] je_arrow_rallocx
+   14.97%  0.00%  python  libarrow.so.15.0.0    [.] je_arrow_private_je_arena_ralloc
+   14.96%  0.00%  python  libarrow.so.15.0.0    [.] je_arrow_private_je_large_ralloc
+   14.64%  0.00%  python  libarrow.so.15.0.0    [.] arrow::BufferBuilder::Resize
+   12.82% 12.82%  python  [unknown]             [k] 0x98e00a67
+   11.74%  3.73%  python  libarrow.so.15.0.0    [.] arrow::BaseBinaryBuilder::AppendNextOffset
{code}


2019-09-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921131#comment-16921131
 ] 

Wes McKinney commented on ARROW-6417:
-

I updated the results plot to use gcc 8.3 for both v0.11.1 and the master 
branch as of 9/2/2019.
