[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-17 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-645600060 +1. Thanks all for the comments. This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-17 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-645569875 @ursabot benchmark --benchmark-filter=Filter 04006ff

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-17 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-645556193 These "readability" improvements made performance worse, so I'll revert them.

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-17 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-645526968 @ursabot benchmark --benchmark-filter=Filter 04006ff

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-17 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-645521577 Something weird with the commit history, I'm not sure those benchmarks are right. I'll rebase things again and rerun

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-17 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-645498297 I think I improved some of the readability problems and addressed the other comments. I'd like to merge this soon once CI is green --

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-17 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-645497918 @ursabot benchmark --benchmark-filter=Filter c4f425768

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-16 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-645004792 I'll have to deal with the string optimization in a follow up PR, so I'm going to leave this for review as is. It would be good to get this merged sooner rather than later

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-16 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-644920072 True. I think for binary-based types we need to implement bulk-block-appends. It's beyond the scope of this PR -- I will take a brief look to see if there's anything dumb (like messi

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-16 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-644913406 The string perf regressions are mostly for the cases where 99.9% of the values are selected. I'll take a closer look at this to see what can be done. The varbinary case is so importa

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-16 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-644892503 @ursabot benchmark --benchmark-filter=Filter 66df3d0

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-16 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-644892130 @buildbot benchmark --help

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-16 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-644881357 I found some issues in the Python benchmarks I posted before. Here's the updated setup and current numbers setup (I was including the cost of converting NumPy booleans to Arrow

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-16 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-644870737 I implemented some other optimizations, especially for the case where neither values nor filter contain nulls. I'm working on updated benchmarks

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-16 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-644742275 The RTools 4.0 build is spurious. This is ready for review.

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-15 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-644513681 To show some simple before-and-after perf numbers in Python, this example has a high selectivity (all but one value selected) and low selectivity filter (only 1% of value
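
The comment above is cut off in this archive, so the original benchmark code is not available here. As a rough illustration only of what "high selectivity" and "low selectivity" filters mean, the sketch below builds both kinds of boolean mask and times a filter over them. It is an assumption-laden stand-in: it uses plain Python lists and `itertools.compress` rather than the Arrow Filter kernel, and the names `make_mask`, `apply_filter`, the array size, and the seed are all hypothetical choices made for this example.

```python
import random
import timeit
from itertools import compress


def make_mask(n, fraction_selected, seed=0):
    """Build a boolean mask where each slot is True with the given probability."""
    rng = random.Random(seed)
    return [rng.random() < fraction_selected for _ in range(n)]


def apply_filter(values, mask):
    """Keep values[i] where mask[i] is True (stand-in for a real filter kernel)."""
    return list(compress(values, mask))


n = 100_000  # hypothetical size, not taken from the original comment
values = list(range(n))

# High selectivity: all but one value selected.
high_mask = [True] * n
high_mask[n // 2] = False

# Low selectivity: roughly 1% of values selected.
low_mask = make_mask(n, 0.01)

high_result = apply_filter(values, high_mask)
low_result = apply_filter(values, low_mask)

for name, mask in [("high selectivity", high_mask), ("low selectivity", low_mask)]:
    # The mask is built outside the timed call, so only filtering is measured.
    seconds = timeit.timeit(lambda: apply_filter(values, mask), number=5)
    print(f"{name}: kept {sum(mask)} of {n}, {seconds / 5:.6f} s per run")
```

Building the masks before the timed call keeps setup cost out of the measurement; timings from this sketch say nothing about Arrow's actual performance.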

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-15 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-644509797 Here are benchmark runs on my machine * BEFORE: https://gist.github.com/wesm/857a3179e7dbc928d3325b1e7f687086 * AFTER: https://gist.github.com/wesm/ad07cec1613b6327926dfe1d95e7