[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-25 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-649585804


   +1, thanks for fixing this @emkornfield!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-25 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-649585468


   FWIW, here are gcc benchmarks (SSE4.2) on my machine (Ubuntu 18.04 on an i9-9960X):
   
   ```
   $ archery benchmark diff --cc=gcc-8 --cxx=g++-8 emkornfield/ARROW-8504 master --suite-filter=parquet-arrow
       benchmark                          baseline        contender  change %  counters
   35  BM_ReadColumn/75/1          267.930 MiB/sec  275.083 MiB/sec     2.670  {'iterations': 2}
    6  BM_ReadColumn/99/0          238.381 MiB/sec  244.286 MiB/sec     2.477  {'iterations': 3}
   10  BM_ReadColumn/10/50         270.775 MiB/sec  275.249 MiB/sec     1.652  {'iterations': 2}
   31  BM_ReadColumn/25/5          171.549 MiB/sec  174.237 MiB/sec     1.567  {'iterations': 2}
   29  BM_ReadColumn/-1/1          856.194 MiB/sec  866.945 MiB/sec     1.256  {'iterations': 11}
    7  BM_WriteColumn               31.593 MiB/sec   31.845 MiB/sec     0.797  {'iterations': 1}
   44  BM_ReadIndividualRowGroups  849.469 MiB/sec  856.209 MiB/sec     0.793  {'iterations': 6}
   36  BM_ReadColumn/-1/0          408.388 MiB/sec  411.490 MiB/sec     0.759  {'iterations': 3}
    3  BM_ReadColumn/50/0          151.600 MiB/sec  152.399 MiB/sec     0.527  {'iterations': 2}
   32  BM_ReadColumn/30/10         268.512 MiB/sec  269.849 MiB/sec     0.498  {'iterations': 2}
    8  BM_ReadColumn/-1/20         772.878 MiB/sec  776.716 MiB/sec     0.497  {'iterations': 5}
   17  BM_ReadColumn/25/25         277.705 MiB/sec  278.957 MiB/sec     0.451  {'iterations': 2}
   23  BM_ReadColumn/-1/1            1.472 GiB/sec    1.478 GiB/sec     0.414  {'iterations': 13}
   30  BM_ReadColumn/1/1           235.809 MiB/sec  236.730 MiB/sec     0.391  {'iterations': 3}
   45  BM_WriteColumn               45.067 MiB/sec   45.232 MiB/sec     0.366  {'iterations': 1}
   41  BM_ReadColumn/45/25         241.545 MiB/sec  242.427 MiB/sec     0.365  {'iterations': 2}
   40  BM_ReadColumn/-1/10         710.485 MiB/sec  712.181 MiB/sec     0.239  {'iterations': 5}
   22  BM_ReadColumn/-1/0          211.939 MiB/sec  212.433 MiB/sec     0.233  {'iterations': 15}
   39  BM_ReadColumn/-1/50         629.632 MiB/sec  630.955 MiB/sec     0.210  {'iterations': 9}
   15  BM_ReadColumn/1/20          145.490 MiB/sec  145.701 MiB/sec     0.145  {'iterations': 10}
    9  BM_ReadColumn/-1/1          162.373 MiB/sec  162.497 MiB/sec     0.077  {'iterations': 4}
   11  BM_ReadColumn/-1/0          257.190 MiB/sec  257.278 MiB/sec     0.034  {'iterations': 3}
   13  BM_WriteColumn               32.378 MiB/sec   32.386 MiB/sec     0.025  {'iterations': 1}
   37  BM_ReadColumn/99/50         238.918 MiB/sec  238.975 MiB/sec     0.024  {'iterations': 3}
   20  BM_WriteColumn               54.200 MiB/sec   54.209 MiB/sec     0.017  {'iterations': 1}
   19  BM_ReadColumn/1/1           369.522 MiB/sec  369.199 MiB/sec    -0.087  {'iterations': 2}
    0  BM_ReadColumn/-1/50           1.144 GiB/sec    1.143 GiB/sec    -0.090  {'iterations': 10}
   34  BM_ReadColumn/35/10         262.271 MiB/sec  261.890 MiB/sec    -0.145  {'iterations': 2}
   21  BM_ReadColumn/50/1          242.510 MiB/sec  242.067 MiB/sec    -0.183  {'iterations': 2}
    5  BM_ReadColumn/50/50         152.344 MiB/sec  152.000 MiB/sec    -0.225  {'iterations': 2}
    1  BM_ReadColumn/10/10         179.063 MiB/sec  178.583 MiB/sec    -0.268  {'iterations': 2}
   28  BM_WriteColumn               64.162 MiB/sec   63.990 MiB/sec    -0.269  {'iterations': 1}
   38  BM_WriteColumn               69.641 MiB/sec   69.446 MiB/sec    -0.280  {'iterations': 1}
   18  BM_WriteColumn               26.643 MiB/sec   26.557 MiB/sec    -0.324  {'iterations': 2}
   16  BM_ReadColumn/-1/0            2.291 GiB/sec    2.281 GiB/sec    -0.397  {'iterations': 20}
   43  BM_WriteColumn               75.453 MiB/sec   75.050 MiB/sec    -0.534  {'iterations': 1}
   33  BM_ReadColumn/5/10          115.534 MiB/sec  114.574 MiB/sec    -0.832  {'iterations': 3}
   14  BM_ReadColumn/50/50         242.071 MiB/sec  240.005 MiB/sec    -0.854  {'iterations': 2}
    4  BM_ReadMultipleRowGroups    826.026 MiB/sec  818.709 MiB/sec    -0.886  {'iterations': 5}
   24  BM_ReadColumn/-1/10         386.383 MiB/sec  382.213 MiB/sec    -1.079  {'iterations': 6}
   12  BM_ReadColumn/-1/0          410.877 MiB/sec  404.700 MiB/sec    -1.503  {'iterations': 3}
   25  BM_ReadColumn/10/5          289.612 MiB/sec  284.777 MiB/sec    -1.670  {'iterations': 2}
    2  BM_ReadColumn/25/10         277.045 MiB/sec  271.925 MiB/sec    -1.848  {'iterations': 2}
   26  BM_ReadColumn/5/5           311.137 MiB/sec  295.823 MiB/sec    -4.922  {'iterations': 2}
   42  BM_ReadColumn/99/0          371.803 MiB/sec  347.872 MiB/sec    -6.436  {'iterations': 2}
   27  BM_ReadColumn/99/50         381.029 MiB/sec  351.329 MiB/sec    -7.795  {'iterations': 3}
   ```
   
   Here's clang-11
   
   ```
   $ archery benchmark diff --cc=clang-11 --cxx=clang++-11 emkornfield/ARROW-8504 master --suite-filter=parquet-arrow
       benchmark  baseline  contender  change %
   ```

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-648454941


   The bug seems to be in the BitRunReader
   
   
![image](https://user-images.githubusercontent.com/329591/85470398-95a9fa00-b574-11ea-99c4-3f06db4a0179.png)
   







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-648453423


   @emkornfield I'm sort of at a dead end here; hopefully the above gives you
   some clues about where there might be a problem.







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-648451899


   There seems to be a situation where the bit run has more values than are
   needed to fulfill the call to `GetSpaced`.
   
   
![image](https://user-images.githubusercontent.com/329591/85469465-50d19380-b573-11ea-8129-14da0a68dc5a.png)
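
One way to picture the failure mode described here: if a run handed back by the reader is longer than the number of slots the caller still needs, the consumer has to clamp the run before using it. The sketch below is illustrative only. `BitRun` and `ConsumeRuns` are hypothetical names, not the Arrow or Parquet API, and the function only counts slots instead of decoding values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical run descriptor: `length` consecutive bits that are all set
// (valid) or all unset (null). Not the actual Arrow type.
struct BitRun {
  int64_t length;
  bool set;
};

// Walks the runs until exactly `batch_size` slots are covered, clamping any
// run that is longer than what is still needed. Returns the number of slots
// covered and stores the number of valid (set) slots in `*valid_count`.
int64_t ConsumeRuns(const std::vector<BitRun>& runs, int64_t batch_size,
                    int64_t* valid_count) {
  assert(batch_size >= 0);
  int64_t remaining = batch_size;
  int64_t valid = 0;
  for (const BitRun& run : runs) {
    if (remaining == 0) break;
    // The clamp: never take more of a run than the caller asked for.
    const int64_t take = std::min(run.length, remaining);
    if (run.set) valid += take;
    remaining -= take;
  }
  *valid_count = valid;
  return batch_size - remaining;
}
```

With runs of 3 valid, 2 null, and 10 valid slots and a batch size of 8, the last run is cut to 3 slots, so 8 slots are covered and 6 of them are valid. Without the clamp the loop would account for 15 slots, which is the "decoded more values than asked" symptom.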
   







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-648446373


   I'm able to reproduce the error in VS and set breakpoints. I got far enough to
   see that `GetBatchWithDictSpaced` has decoded more values than it was asked to.
   
   
![image](https://user-images.githubusercontent.com/329591/85467944-6e9df900-b571-11ea-82f8-5f7efade8f2e.png)
   







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-648230676


   I'm not sure what the MSVC failure is about but I'll debug locally







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-22 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-647699240


   Here's the same with clang; it seems to be even less of an issue there:
   
   ```
       benchmark                   baseline        contender  change %  regression
   32  BM_ReadColumn/1/1      1.656 GiB/sec    2.689 GiB/sec    62.384  False
   15  BM_ReadColumn/99/0     2.861 GiB/sec    4.184 GiB/sec    46.245  False
   21  BM_ReadColumn/99/50    1.890 GiB/sec    2.744 GiB/sec    45.220  False
   20  BM_ReadColumn/99/50    2.898 GiB/sec    4.199 GiB/sec    44.891  False
    6  BM_ReadColumn/50/1     1.530 GiB/sec    2.210 GiB/sec    44.402  False
   30  BM_ReadColumn/50/50    1.586 GiB/sec    2.233 GiB/sec    40.787  False
   28  BM_ReadColumn/50/0   881.102 MiB/sec    1.196 GiB/sec    39.025  False
   22  BM_ReadColumn/45/25    1.598 GiB/sec    2.221 GiB/sec    38.951  False
   14  BM_ReadColumn/1/1      2.967 GiB/sec    4.122 GiB/sec    38.919  False
   29  BM_ReadColumn/50/50  887.206 MiB/sec    1.203 GiB/sec    38.854  False
   35  BM_ReadColumn/99/0     1.928 GiB/sec    2.661 GiB/sec    38.045  False
   17  BM_ReadColumn/35/10    1.656 GiB/sec    2.248 GiB/sec    35.689  False
    0  BM_ReadColumn/-1/50    2.765 GiB/sec    3.750 GiB/sec    35.628  False
   26  BM_ReadColumn/30/10    1.692 GiB/sec    2.221 GiB/sec    31.244  False
   19  BM_ReadColumn/75/1     1.712 GiB/sec    2.217 GiB/sec    29.505  False
    2  BM_ReadColumn/25/5   991.022 MiB/sec    1.229 GiB/sec    27.031  False
   31  BM_ReadColumn/25/10    1.744 GiB/sec    2.164 GiB/sec    24.091  False
   16  BM_ReadColumn/-1/50    5.892 GiB/sec    7.252 GiB/sec    23.084  False
   24  BM_ReadColumn/10/10    1.003 GiB/sec    1.225 GiB/sec    22.073  False
   23  BM_ReadColumn/25/25    1.790 GiB/sec    2.159 GiB/sec    20.659  False
   10  BM_ReadColumn/5/5      1.995 GiB/sec    2.371 GiB/sec    18.853  False
    5  BM_ReadColumn/10/5     1.796 GiB/sec    2.122 GiB/sec    18.128  False
    3  BM_ReadColumn/-1/20    3.488 GiB/sec    4.062 GiB/sec    16.453  False
    1  BM_ReadColumn/-1/10    2.740 GiB/sec    3.055 GiB/sec    11.497  False
   11  BM_ReadColumn/-1/10    1.362 GiB/sec    1.486 GiB/sec     9.086  False
    9  BM_ReadColumn/-1/0    12.984 GiB/sec   13.310 GiB/sec     2.509  False
   13  BM_ReadColumn/5/10   515.781 MiB/sec  523.838 MiB/sec     1.562  False
    7  BM_ReadColumn/-1/1     4.505 GiB/sec    4.559 GiB/sec     1.198  False
   34  BM_ReadColumn/1/20   400.254 MiB/sec  403.780 MiB/sec     0.881  False
   25  BM_ReadColumn/-1/0   870.109 MiB/sec  875.827 MiB/sec     0.657  False
   18  BM_ReadColumn/-1/1     8.237 GiB/sec    8.045 GiB/sec    -2.331  False
   27  BM_ReadColumn/-1/1   816.651 MiB/sec  793.763 MiB/sec    -2.803  False
    8  BM_ReadColumn/10/50    1.649 GiB/sec    1.587 GiB/sec    -3.792  False
   12  BM_ReadColumn/-1/0     3.044 GiB/sec    1.432 GiB/sec   -52.958  True
   33  BM_ReadColumn/-1/0     3.161 GiB/sec    1.479 GiB/sec   -53.209  True
    4  BM_ReadColumn/-1/0     1.863 GiB/sec  754.619 MiB/sec   -60.448  True
   ```
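
As a quick aid for reading the `change %` column, here is a minimal sketch that assumes the column is the relative difference of contender over baseline. That assumption reproduces the rows in the table up to rounding of the printed throughputs, but it is not taken from archery's source, and `ChangePercent` is a hypothetical helper name.

```cpp
#include <cassert>

// Relative change of `contender` versus `baseline`, in percent.
// Both arguments must be in the same unit (for example GiB/sec);
// convert MiB/sec to GiB/sec by dividing by 1024 before calling.
double ChangePercent(double baseline, double contender) {
  assert(baseline > 0.0);
  return (contender - baseline) / baseline * 100.0;
}
```

For example, the first row (1.656 GiB/sec to 2.689 GiB/sec) comes out at about 62.38, against the reported 62.384, and the 3.044 GiB/sec to 1.432 GiB/sec row comes out at about -52.96, against the reported -52.958.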







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-22 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-647694302


   Here's what I see with gcc-8:
   
   ```
       benchmark                    baseline         contender  change %  regression
    1  BM_ReadColumn/1/1       1.336 GiB/sec     1.880 GiB/sec    40.701  False
   17  BM_ReadColumn/1/1       2.224 GiB/sec     2.926 GiB/sec    31.567  False
   14  BM_ReadColumn/99/0      1.599 GiB/sec     2.001 GiB/sec    25.128  False
   23  BM_ReadColumn/99/50     1.601 GiB/sec     1.990 GiB/sec    24.300  False
   21  BM_ReadColumn/50/50  1002.325 MiB/sec     1.206 GiB/sec    23.171  False
   12  BM_ReadColumn/50/50   619.919 MiB/sec   758.845 MiB/sec    22.410  False
   33  BM_ReadColumn/45/25  1016.908 MiB/sec     1.212 GiB/sec    22.095  False
   30  BM_ReadColumn/50/0    620.068 MiB/sec   757.008 MiB/sec    22.085  False
   34  BM_ReadColumn/5/5       1.586 GiB/sec     1.935 GiB/sec    21.981  False
   27  BM_ReadColumn/99/0      2.539 GiB/sec     3.081 GiB/sec    21.308  False
   13  BM_ReadColumn/10/5      1.361 GiB/sec     1.644 GiB/sec    20.791  False
   18  BM_ReadColumn/50/1      1.002 GiB/sec     1.204 GiB/sec    20.174  False
   26  BM_ReadColumn/75/1      1.182 GiB/sec     1.419 GiB/sec    20.005  False
   15  BM_ReadColumn/99/50     2.532 GiB/sec     3.036 GiB/sec    19.890  False
   11  BM_ReadColumn/30/10     1.123 GiB/sec     1.322 GiB/sec    17.754  False
   35  BM_ReadColumn/25/10     1.187 GiB/sec     1.396 GiB/sec    17.558  False
   25  BM_ReadColumn/25/5    741.612 MiB/sec   870.530 MiB/sec    17.384  False
    3  BM_ReadColumn/10/10   859.032 MiB/sec  1005.688 MiB/sec    17.072  False
   31  BM_ReadColumn/35/10     1.095 GiB/sec     1.254 GiB/sec    14.528  False
   20  BM_ReadColumn/-1/1    712.571 MiB/sec   808.480 MiB/sec    13.460  False
    5  BM_ReadColumn/25/25     1.244 GiB/sec     1.399 GiB/sec    12.397  False
    9  BM_ReadColumn/5/10    460.741 MiB/sec   484.654 MiB/sec     5.190  False
    7  BM_ReadColumn/1/20    397.103 MiB/sec   412.517 MiB/sec     3.882  False
   29  BM_ReadColumn/-1/0    984.767 MiB/sec   994.483 MiB/sec     0.987  False
   19  BM_ReadColumn/-1/0     13.609 GiB/sec    13.738 GiB/sec     0.944  False
   28  BM_ReadColumn/-1/1      4.910 GiB/sec     4.702 GiB/sec    -4.236  False
    8  BM_ReadColumn/-1/1      8.631 GiB/sec     8.000 GiB/sec    -7.310  True
    6  BM_ReadColumn/-1/10     2.872 GiB/sec     2.329 GiB/sec   -18.912  True
   32  BM_ReadColumn/-1/10     1.457 GiB/sec     1.099 GiB/sec   -24.615  True
   22  BM_ReadColumn/10/50     1.433 GiB/sec   964.501 MiB/sec   -34.267  True
   24  BM_ReadColumn/-1/0      2.325 GiB/sec     1.400 GiB/sec   -39.804  True
    2  BM_ReadColumn/-1/0      2.412 GiB/sec     1.422 GiB/sec   -41.059  True
   10  BM_ReadColumn/-1/0      1.454 GiB/sec   864.019 MiB/sec   -41.959  True
   16  BM_ReadColumn/-1/20     3.826 GiB/sec     2.184 GiB/sec   -42.899  True
    0  BM_ReadColumn/-1/50     4.669 GiB/sec     1.865 GiB/sec   -60.055  True
    4  BM_ReadColumn/-1/50     3.184 GiB/sec   862.581 MiB/sec   -73.546  True
   ```
   
   I agree that the alternating NA/not-NA case is pretty pathological, so IMHO this
   performance regression isn't a big deal.
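
To make the "alternating NA/not-NA" case concrete, here is a minimal sketch of splitting a validity bitmap into runs. It is a naive bit-at-a-time loop, not the `BitRunReader` from this PR; `BitRun` and `ComputeBitRuns` are hypothetical names, and the only Arrow convention assumed is least-significant-bit-first ordering within each byte.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical run descriptor: `length` consecutive bits with the same value.
struct BitRun {
  int64_t length;
  bool set;
};

// Splits the first `num_bits` bits of `bitmap` into maximal runs of equal
// bits. Bits are read LSB-first within each byte.
std::vector<BitRun> ComputeBitRuns(const std::vector<uint8_t>& bitmap,
                                   int64_t num_bits) {
  assert(num_bits >= 0 &&
         num_bits <= static_cast<int64_t>(bitmap.size()) * 8);
  std::vector<BitRun> runs;
  for (int64_t i = 0; i < num_bits; ++i) {
    const bool bit = ((bitmap[i / 8] >> (i % 8)) & 1) != 0;
    if (!runs.empty() && runs.back().set == bit) {
      ++runs.back().length;  // Extend the current run.
    } else {
      runs.push_back({1, bit});  // The value flipped: start a new run.
    }
  }
  return runs;
}
```

An alternating bitmap such as `{0x55, 0x55}` yields 16 runs of length 1 for 16 bits, so any per-run overhead is paid on every value. A bitmap such as `{0xFF, 0x0F}` yields only two runs (12 set bits, then 4 unset bits), which is where run-based reading can pay off.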







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-22 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-647690792


   I rebased. I'm going to look at the benchmarks locally and then probably
   merge this so that further performance work (including comparing with
   `BitBlockCounter`) can be pursued as a follow-up.







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-22 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-647674585


   @emkornfield I'm sorry that I've been neglecting this PR. I will try to 
rebase this and investigate the perf questions a little bit







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-05 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-639566354


   @emkornfield I can take a look at this once 
https://github.com/apache/arrow/pull/7356 is merged which adds a single-word 
function to `BitBlockCounter`







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-04 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-639184070


   I could try to adapt the Parquet code to use BitBlockCounter and see what 
the benchmarks look like?







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-04 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-638884684


   My anecdotal experience suggests that 1% is a higher-than-average rate of 
nulls, but I don't know how accurate that is. The popcount strategy could be 
adapted to do 64 bits at a time (instead of the 256 that's implemented now) 
which would mean a higher incidence of no-nulls blocks in the 1% case. 
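
A rough way to quantify this, assuming nulls are independent with a fixed rate (real data is often more clustered, so treat it as a back-of-the-envelope estimate): the chance that a block of n values contains no nulls is (1 - p)^n. The sketch below computes that and shows a popcount-style classification of a single 64-bit validity word. `NoNullBlockProbability`, `BlockKind`, and `ClassifyWord` are hypothetical names, not the `BitBlockCounter` API.

```cpp
#include <bitset>
#include <cassert>
#include <cmath>
#include <cstdint>

// Probability that a block of `block_bits` values has no nulls, assuming
// each value is null independently with probability `null_rate`.
double NoNullBlockProbability(double null_rate, int block_bits) {
  assert(null_rate >= 0.0 && null_rate <= 1.0);
  return std::pow(1.0 - null_rate, block_bits);
}

enum class BlockKind { kAllValid, kAllNull, kMixed };

// Classifies one 64-bit word of a validity bitmap by its popcount, so the
// all-valid and all-null cases can take a fast path without per-bit checks.
BlockKind ClassifyWord(uint64_t validity_word) {
  const auto set_bits = std::bitset<64>(validity_word).count();
  if (set_bits == 64u) return BlockKind::kAllValid;
  if (set_bits == 0u) return BlockKind::kAllNull;
  return BlockKind::kMixed;
}
```

Under that independence assumption, a 1% null rate gives a no-null probability of roughly 0.53 for 64-bit blocks versus roughly 0.08 for 256-bit blocks, which is the "higher incidence of no-nulls blocks" effect mentioned above.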







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-03 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-638534173


   OK I adapted the benchmark here to use the `BitmapScanner` from #7346 
   
   https://github.com/wesm/arrow/tree/bit-runner
   
   ```
   ---------------------------------------------------------
   Benchmark                       Time       CPU Iterations
   ---------------------------------------------------------
   BitRunReader/-1              9890 ns   9890 ns      70683  49.3705MB/s
   BitRunReader/0                108 ns    108 ns    6250693  4.423GB/s
   BitRunReader/10              2101 ns   2101 ns     334686  232.36MB/s
   BitRunReader/25              4072 ns   4072 ns     173114  119.915MB/s
   BitRunReader/50              5221 ns   5221 ns     133040  93.5178MB/s
   BitRunReader/60              5042 ns   5042 ns     138099  96.8386MB/s
   BitRunReader/75              3933 ns   3933 ns     179857  124.152MB/s
   BitRunReader/99               291 ns    291 ns    2412105  1.6376GB/s
   BitRunReaderWithScanner/-1     47 ns     47 ns   15059881  10.2331GB/s
   BitRunReaderWithScanner/0      46 ns     46 ns   15078363  10.2704GB/s
   BitRunReaderWithScanner/10     47 ns     47 ns   15118172  10.2299GB/s
   BitRunReaderWithScanner/25     47 ns     47 ns   15033144  10.2528GB/s
   BitRunReaderWithScanner/50     47 ns     47 ns   14947443  10.1964GB/s
   BitRunReaderWithScanner/60     47 ns     47 ns   14668505  10.0837GB/s
   BitRunReaderWithScanner/75     46 ns     46 ns   15045334  10.2918GB/s
   BitRunReaderWithScanner/99     46 ns     46 ns   14961067  10.2813GB/s
   BitRunReaderScalar/-1       13089 ns  13088 ns      51449  37.3063MB/s
   BitRunReaderScalar/0         3844 ns   3844 ns     176221  127.024MB/s
   BitRunReaderScalar/10        6621 ns   6621 ns     104648  73.7517MB/s
   BitRunReaderScalar/25       12397 ns  12397 ns      55998  39.388MB/s
   BitRunReaderScalar/50       17099 ns  17099 ns      41378  28.556MB/s
   BitRunReaderScalar/60       16606 ns  16606 ns      42580  29.4046MB/s
   BitRunReaderScalar/75       11431 ns  11431 ns      61744  42.7165MB/s
   BitRunReaderScalar/99        4265 ns   4265 ns     167402  114.484MB/s
   ```







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-03 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-638530324


   From a brief look, this is a lot more complicated than #7346. Does it have
   additional features? I can try to set up benchmarks to compare the two
   performance-wise.







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-03 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-638528872


   I need to take a closer look to compare the API of this versus what is in 
#7346, but for the specific use case in #7346, that code is going to be much 
faster. It would be nice to have some apples-to-apples benchmarks







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-05-26 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-634107353











[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-05-21 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-632463807


   Cool, I will try to look at this tomorrow and kick the tires a bit with 
benchmarks







[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-05-10 Thread GitBox


wesm commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-626408844


   A fair chunk of the RLE-related code originally came out of Impala, so it might not
   be a bad idea to peek at what's in apache/impala to see whether it has been worked
   on perf-wise since the beginning of 2016. In any case, we wouldn't want to take
   on a change that causes a perf regression in cases with many short
   null/non-null runs.


