[GitHub] [arrow] houqp commented on pull request #7466: ARROW-9158: [Rust][Datafusion] projection physical plan compilation should preserve nullability

2020-06-16 Thread GitBox


houqp commented on pull request #7466:
URL: https://github.com/apache/arrow/pull/7466#issuecomment-645187978


   Since this PR added a test that creates mutable execution context, it needs 
to be rebased and updated after https://github.com/apache/arrow/pull/7464 get 
merged.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7466: ARROW-9158: [Rust][Datafusion] projection physical plan compilation should preserve nullability

2020-06-16 Thread GitBox


github-actions[bot] commented on pull request #7466:
URL: https://github.com/apache/arrow/pull/7466#issuecomment-645185474


   https://issues.apache.org/jira/browse/ARROW-9158



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] houqp opened a new pull request #7466: ARROW-9158: [Rust][Datafusion] projection physical plan compilation should preserve nullability

2020-06-16 Thread GitBox


houqp opened a new pull request #7466:
URL: https://github.com/apache/arrow/pull/7466


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7465: PARQUET-1877: [C++] Reconcile thrift limits

2020-06-16 Thread GitBox


github-actions[bot] commented on pull request #7465:
URL: https://github.com/apache/arrow/pull/7465#issuecomment-645152229


   https://issues.apache.org/jira/browse/PARQUET-1877



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield edited a comment on pull request #7465: PARQUET-1877: [C++] Reconcile thrift limits

2020-06-16 Thread GitBox


emkornfield edited a comment on pull request #7465:
URL: https://github.com/apache/arrow/pull/7465#issuecomment-645150494


   CC @wesm @pitrou I would assume 1MM elements is still sufficient for any 
reasonable parquet file, but let me know if you think differently.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #7465: PARQUET-1877: [C++] Reconcile thrift limits

2020-06-16 Thread GitBox


emkornfield commented on pull request #7465:
URL: https://github.com/apache/arrow/pull/7465#issuecomment-645150494


   CC @wesm @pitrou 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield opened a new pull request #7465: PARQUET-1877: [C++] Reconcile thrift limits

2020-06-16 Thread GitBox


emkornfield opened a new pull request #7465:
URL: https://github.com/apache/arrow/pull/7465


   Sets container size limit to have an upper bound memory footprint at the 
same order of magnitude as string size limits.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7464: ARROW-9157: [Rust][Datafusion] create_physical_plan should take self as immutable reference

2020-06-16 Thread GitBox


github-actions[bot] commented on pull request #7464:
URL: https://github.com/apache/arrow/pull/7464#issuecomment-645143002


   https://issues.apache.org/jira/browse/ARROW-9157



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] dhirschfeld commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


dhirschfeld commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645141127


   > Not to beat a dead horse about ARROW-9155
   
   The bot is fine - I guess it links to whatever JIRA is listed in the title. 
That doesn't help if someone mentions a JIRA in a comment. If the autolink was 
configured for this repo that reference would have *automatically* been 
converted to [`ARROW-9155`](https://issues.apache.org/jira/browse/ARROW-9155).
   
   It's just a *very* minor thing, but it does help the DX.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] houqp opened a new pull request #7464: ARROW-9157: [Rust][Datafusion] create_physical_plan should take self as immutable reference

2020-06-16 Thread GitBox


houqp opened a new pull request #7464:
URL: https://github.com/apache/arrow/pull/7464


   Since it's not mutating self, mutable reference is not necessary.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ursabot commented on pull request #7314: ARROW-8996: [C++] runtime support for aggregate sum dense kernel

2020-06-16 Thread GitBox


ursabot commented on pull request #7314:
URL: https://github.com/apache/arrow/pull/7314#issuecomment-645135876


   [AMD64 Ubuntu 18.04 C++ Benchmark 
(#112762)](https://ci.ursalabs.org/#builders/73/builds/79) builder has been 
succeeded.
   
   Revision: 525caea882fe49c0248932fff77df6bcd3f2f477
   
   ```diff
 ===  =  ===  
 benchmarkbaseline   contenderchange
 ===  =  ===  
 SumKernelFloat/32768/1   5.958 GiB/sec  10.208 GiB/sec   71.337%
 SumKernelInt64/32768/1   7.938 GiB/sec  10.783 GiB/sec   35.851%
 SumKernelInt32/32768/0   4.958 GiB/sec  10.832 GiB/sec   118.486%
 SumKernelInt64/32768/0   9.704 GiB/sec  14.764 GiB/sec   52.146%
 SumKernelFloat/32768/2   2.389 GiB/sec  5.767 GiB/sec141.401%
   - SumKernelInt16/32768/2   2.953 GiB/sec  1.696 GiB/sec-42.565%
 SumKernelDouble/32768/0  9.462 GiB/sec  13.540 GiB/sec   43.102%
 SumKernelInt32/32768/2   5.074 GiB/sec  6.403 GiB/sec26.201%
   - SumKernelInt16/32768/1   4.520 GiB/sec  1.693 GiB/sec-62.552%
 SumKernelFloat/32768/100 4.644 GiB/sec  5.807 GiB/sec25.051%
 SumKernelInt64/32768/1   9.778 GiB/sec  14.784 GiB/sec   51.203%
 SumKernelDouble/32768/1  7.595 GiB/sec  10.131 GiB/sec   33.400%
 SumKernelFloat/32768/1   3.920 GiB/sec  5.938 GiB/sec51.490%
 SumKernelInt64/32768/10  8.461 GiB/sec  10.548 GiB/sec   24.655%
   - SumKernelInt8/32768/10   1.680 GiB/sec  674.552 MiB/sec  -60.783%
 SumKernelDouble/32768/10 7.748 GiB/sec  9.813 GiB/sec26.645%
   - SumKernelInt8/32768/11.643 GiB/sec  678.806 MiB/sec  -59.643%
 SumKernelInt8/32768/01.745 GiB/sec  4.709 GiB/sec169.817%
 SumKernelInt32/32768/1   4.959 GiB/sec  10.884 GiB/sec   119.497%
 SumKernelInt64/32768/100 9.294 GiB/sec  10.545 GiB/sec   13.466%
   - SumKernelInt8/32768/100  2.242 GiB/sec  674.145 MiB/sec  -70.638%
 SumKernelInt64/32768/2   7.674 GiB/sec  10.553 GiB/sec   37.529%
 SumKernelInt16/32768/0   2.850 GiB/sec  7.599 GiB/sec166.670%
 SumKernelInt32/32768/10  5.754 GiB/sec  6.423 GiB/sec11.627%
 SumKernelInt32/32768/1   5.213 GiB/sec  6.577 GiB/sec26.172%
 SumKernelDouble/32768/1008.731 GiB/sec  9.829 GiB/sec12.578%
 SumKernelFloat/32768/0   5.960 GiB/sec  10.156 GiB/sec   70.394%
   - SumKernelInt16/32768/100 4.099 GiB/sec  1.694 GiB/sec-58.674%
   - SumKernelInt8/32768/12.580 GiB/sec  674.639 MiB/sec  -74.465%
 SumKernelDouble/32768/2  6.864 GiB/sec  9.912 GiB/sec44.416%
 SumKernelInt32/32768/100 6.731 GiB/sec  6.439 GiB/sec-4.334%
   - SumKernelInt8/32768/21.604 GiB/sec  674.265 MiB/sec  -58.959%
   - SumKernelInt16/32768/10  3.491 GiB/sec  1.694 GiB/sec-51.473%
 SumKernelFloat/32768/10  3.924 GiB/sec  5.794 GiB/sec47.673%
 SumKernelDouble/32768/1  9.531 GiB/sec  13.404 GiB/sec   40.636%
   - SumKernelInt16/32768/1   3.021 GiB/sec  1.710 GiB/sec-43.392%
 ===  =  ===  
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jianxind commented on pull request #7314: ARROW-8996: [C++] runtime support for aggregate sum dense kernel

2020-06-16 Thread GitBox


jianxind commented on pull request #7314:
URL: https://github.com/apache/arrow/pull/7314#issuecomment-645130206


   @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


wesm commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645129920


   Well my theory about greater/less didn't hold. The other relevant change was 
moving things into the anonymous namespace. It's possible that anonymous 
namespaces impact inlining somehow 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ursabot commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


ursabot commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645120596


   [AMD64 Ubuntu 18.04 C++ Benchmark 
(#112729)](https://ci.ursalabs.org/#builders/73/builds/78) builder has been 
succeeded.
   
   Revision: 74caaae25e3bd95d57f3f6d9b835c2610639ab41
   
   ```diff
   ==  
==  
 benchmark baselinecontender
   change
   ==  
==  
 GreaterArrayScalarString/32768/2  130.267m items/sec  224.418m 
items/sec  72.276%
 GreaterArrayScalarString/32768/10 261.021m items/sec  498.491m 
items/sec  90.977%
   - GreaterArrayArrayInt64/32768/2361.493m items/sec  335.383m 
items/sec  -7.223%
 GreaterArrayScalarInt64/32768/10  565.966m items/sec  690.931m 
items/sec  22.080%
 GreaterArrayScalarInt64/32768/100 555.331m items/sec  668.238m 
items/sec  20.331%
   - GreaterArrayArrayInt64/32768/1364.755m items/sec  338.493m 
items/sec  -7.200%
   - GreaterArrayArrayInt64/32768/1362.806m items/sec  336.071m 
items/sec  -7.369%
 GreaterArrayScalarInt64/32768/1   573.242m items/sec  704.449m 
items/sec  22.889%
 GreaterArrayArrayString/32768/1   54.921m items/sec   73.848m 
items/sec   34.463%
 GreaterArrayArrayString/32768/2   52.950m items/sec   92.643m 
items/sec   74.964%
 GreaterArrayScalarString/32768/1  361.437m items/sec  653.553m 
items/sec  80.821%
 GreaterArrayScalarInt64/32768/0   555.044m items/sec  701.556m 
items/sec  26.396%
 GreaterArrayScalarInt64/32768/1   575.762m items/sec  682.164m 
items/sec  18.480%
   - GreaterArrayArrayInt64/32768/100  361.216m items/sec  338.015m 
items/sec  -6.423%
 GreaterArrayScalarString/32768/0  339.144m items/sec  707.998m 
items/sec  108.760%
 GreaterArrayArrayString/32768/1   127.597m items/sec  193.087m 
items/sec  51.326%
 GreaterArrayArrayString/32768/10  50.585m items/sec   69.612m 
items/sec   37.613%
 GreaterArrayArrayString/32768/100 54.563m items/sec   73.054m 
items/sec   33.889%
 GreaterArrayScalarInt64/32768/2   558.941m items/sec  697.482m 
items/sec  24.786%
 GreaterArrayScalarString/32768/100331.214m items/sec  669.210m 
items/sec  102.048%
   - GreaterArrayArrayInt64/32768/10   361.791m items/sec  336.834m 
items/sec  -6.898%
 GreaterArrayArrayString/32768/0   54.614m items/sec   73.456m 
items/sec   34.499%
 GreaterArrayScalarString/32768/1  342.276m items/sec  705.190m 
items/sec  106.030%
   - GreaterArrayArrayInt64/32768/0363.455m items/sec  336.738m 
items/sec  -7.351%
   ==  
==  
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7463: ARROW-9145: [C++] Implement BooleanArray::true_count and false_count, add Python bindings

2020-06-16 Thread GitBox


github-actions[bot] commented on pull request #7463:
URL: https://github.com/apache/arrow/pull/7463#issuecomment-645116913


   https://issues.apache.org/jira/browse/ARROW-9145



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


wesm commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645116862


   Well we have these bot comments, is it not sufficient? 
https://github.com/apache/arrow/pull/7461#issuecomment-645086851



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7462: ARROW-7068: [C++] Add ListArray::offsets and LargeListArray::offsets returning boxed version of offsets as Int32Array/Int64Array

2020-06-16 Thread GitBox


wesm commented on pull request #7462:
URL: https://github.com/apache/arrow/pull/7462#issuecomment-645116463


   I'm a bit stumped on the MinGW failure
   
   ```
   [ 64%] Linking CXX executable ../../release/arrow-array-test.exe
   
C:/msys64/mingw32/bin/../lib/gcc/i686-w64-mingw32/10.1.0/../../../../i686-w64-mingw32/bin/ld.exe:
 
CMakeFiles/arrow-array-test.dir/objects.a(unity_0_cxx.cxx.obj):unity_0_cxx.cxx:(.text$_ZN5arrow13TestListArrayINS_8ListTypeEE22ValidateBasicListArrayEPKNS_9ListArrayERKSt6vectorIsSaIsEERKS6_IhSaIhEE[__ZN5arrow13TestListArrayINS_8ListTypeEE22ValidateBasicListArrayEPKNS_9ListArrayERKSt6vectorIsSaIsEERKS6_IhSaIhEE]+0x803):
 undefined reference to 
`_imp___ZN5arrow6detail9CTypeImplINS_9Int32TypeENS_11IntegerTypeELNS_4Type4typeE7EiE7type_idE'
   
C:/msys64/mingw32/bin/../lib/gcc/i686-w64-mingw32/10.1.0/../../../../i686-w64-mingw32/bin/ld.exe:
 
CMakeFiles/arrow-array-test.dir/objects.a(unity_0_cxx.cxx.obj):unity_0_cxx.cxx:(.text$_ZN5arrow13TestListArrayINS_13LargeListTypeEE22ValidateBasicListArrayEPKNS_14LargeListArrayERKSt6vectorIsSaIsEERKS6_IhSaIhEE[__ZN5arrow13TestListArrayINS_13LargeListTypeEE22ValidateBasicListArrayEPKNS_14LargeListArrayERKSt6vectorIsSaIsEERKS6_IhSaIhEE]+0x83c):
 undefined reference to 
`_imp___ZN5arrow6detail9CTypeImplINS_9Int64TypeENS_11IntegerTypeELNS_4Type4typeE9ExE7type_idE'
   collect2.exe: error: ld returned 1 exit status
   make[2]: *** [src/arrow/CMakeFiles/arrow-array-test.dir/build.make:118: 
release/arrow-array-test.exe] Error 1
   make[1]: *** [CMakeFiles/Makefile2:1680: 
src/arrow/CMakeFiles/arrow-array-test.dir/all] Error 2
   make[1]: *** Waiting for unfinished jobs
   ```
   
   I'm reimplementing the methods in a way that hopefully won't trigger this 
mess.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] dhirschfeld commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


dhirschfeld commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645115775


   To help follow along it would be handy if references to JIRA could be 
autolinked:
   
https://help.github.com/en/github/administering-a-repository/configuring-autolinks-to-reference-external-resources
   
   That would save the lurker from having to search for it or the poster having 
to bother about pasting an actual URL.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


wesm commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645113493


   Not to beat a dead horse about ARROW-9155, but the turnaround time for 
simple benchmarks isn't great



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


wesm commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645113285


   @ursabot benchmark --benchmark-filter=Greater 18e559b



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


wesm commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645111941


   Ah! It's because Greater is not implemented using Less. Let me switch things 
around



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm edited a comment on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


wesm edited a comment on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645111941


   Ah! It's because Greater is now implemented using Less. Let me switch things 
around so things are based on Greater/GreaterEqual instead



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ursabot commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


ursabot commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645111207


   [AMD64 Ubuntu 18.04 C++ Benchmark 
(#112703)](https://ci.ursalabs.org/#builders/73/builds/77) builder has been 
succeeded.
   
   Revision: 301ffa539e634f2c464ca072cd5c543f1407f1f7
   
   ```diff
   ==  
==  
 benchmark baselinecontender
   change
   ==  
==  
 GreaterArrayArrayString/32768/2   53.694m items/sec   93.892m 
items/sec   74.864%
 GreaterArrayScalarInt64/32768/2   616.025m items/sec  681.971m 
items/sec  10.705%
 GreaterArrayScalarString/32768/2  132.267m items/sec  216.365m 
items/sec  63.583%
   - GreaterArrayArrayInt64/32768/1365.572m items/sec  339.713m 
items/sec  -7.074%
 GreaterArrayArrayString/32768/100 53.635m items/sec   74.222m 
items/sec   38.384%
   - GreaterArrayArrayInt64/32768/100  365.263m items/sec  339.865m 
items/sec  -6.953%
 GreaterArrayScalarString/32768/1  341.705m items/sec  547.337m 
items/sec  60.178%
   - GreaterArrayArrayInt64/32768/1365.608m items/sec  340.598m 
items/sec  -6.841%
 GreaterArrayArrayString/32768/0   55.046m items/sec   74.164m 
items/sec   34.733%
 GreaterArrayScalarInt64/32768/1   551.861m items/sec  679.554m 
items/sec  23.139%
 GreaterArrayScalarString/32768/1  360.301m items/sec  724.313m 
items/sec  101.030%
 GreaterArrayScalarInt64/32768/10  618.148m items/sec  669.015m 
items/sec  8.229%
   - GreaterArrayArrayInt64/32768/2365.552m items/sec  340.107m 
items/sec  -6.961%
 GreaterArrayScalarString/32768/100332.135m items/sec  529.815m 
items/sec  59.518%
   - GreaterArrayArrayInt64/32768/0367.549m items/sec  341.912m 
items/sec  -6.975%
   - GreaterArrayArrayInt64/32768/10   365.734m items/sec  339.651m 
items/sec  -7.131%
 GreaterArrayScalarInt64/32768/100 604.015m items/sec  667.332m 
items/sec  10.483%
 GreaterArrayScalarString/32768/10 258.597m items/sec  411.304m 
items/sec  59.052%
 GreaterArrayScalarInt64/32768/0   622.436m items/sec  671.837m 
items/sec  7.937%
 GreaterArrayArrayString/32768/1   130.420m items/sec  196.548m 
items/sec  50.704%
 GreaterArrayScalarInt64/32768/1   622.602m items/sec  699.200m 
items/sec  12.303%
 GreaterArrayArrayString/32768/10  51.206m items/sec   70.119m 
items/sec   36.935%
 GreaterArrayScalarString/32768/0  337.556m items/sec  549.035m 
items/sec  62.650%
 GreaterArrayArrayString/32768/1   54.218m items/sec   74.885m 
items/sec   38.117%
   ==  
==  
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7463: ARROW-9145: [C++] Implement BooleanArray::true_count and false_count, add Python bindings

2020-06-16 Thread GitBox


wesm commented on pull request #7463:
URL: https://github.com/apache/arrow/pull/7463#issuecomment-645110679


   FWIW `BooleanArray::true_count()` should be what's used for the 
`sum(boolean)` kernel



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm opened a new pull request #7463: ARROW-9145: [C++] Implement BooleanArray::true_count and false_count, add Python bindings

2020-06-16 Thread GitBox


wesm opened a new pull request #7463:
URL: https://github.com/apache/arrow/pull/7463


   This seemed like a reasonable place to put this, and it seems like it may 
come in handy. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


wesm commented on a change in pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#discussion_r441240680



##
File path: cpp/src/arrow/compute/kernels/codegen_internal.h
##
@@ -121,18 +123,34 @@ struct ArrayIterator> {
 
 template 
 struct ArrayIterator> {
-  int64_t position = 0;
-  typename TypeTraits::ArrayType arr;
-  explicit ArrayIterator(const ArrayData& data) : arr(data.Copy()) {}
-  util::string_view operator()() { return arr.GetView(position++); }
+  using offset_type = typename Type::offset_type;
+  const ArrayData& arr;
+  const offset_type* offsets;
+  offset_type cur_offset;
+  const char* data;
+  int64_t position;
+  explicit ArrayIterator(const ArrayData& arr)
+  : arr(arr),
+offsets(reinterpret_cast(arr.buffers[1]->data()) +
+arr.offset),
+cur_offset(offsets[0]),
+data(reinterpret_cast(arr.buffers[2]->data())),
+position(0) {}
+
+  util::string_view operator()() {
+offset_type next_offset = offsets[position++ + 1];
+auto result = util::string_view(data + cur_offset, next_offset - 
cur_offset);
+cur_offset = next_offset;
+return result;
+  }

Review comment:
   @pitrou anecdotally it appears this is meaningfully faster than the 
prior version which used `GetView` 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


wesm commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645104527


   @ursabot benchmark --benchmark-filter=Greater 18e559b



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on pull request #7287: ARROW-8771: [C++] Add boost/process library to build support

2020-06-16 Thread GitBox


liyafan82 commented on pull request #7287:
URL: https://github.com/apache/arrow/pull/7287#issuecomment-645103962


   > @liyafan82 I rebuilt the boost bundle and uploaded to bintray. Can you 
re-run whichever tests you have that failed because of this before and see if 
they work now? If they're good now, then we can merge this.
   
   @nealrichardson Thanks for your effort, I will re-run the build today. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


wesm commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645102793


   There weren't any C++ unit tests for comparisons of primitive types so I 
addressed that, and also added comparisons for Time and Duration types (which 
on account of this patch are basically "free")



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7462: ARROW-7068: [C++] Add ListArray::offsets and LargeListArray::offsets returning boxed version of offsets as Int32Array/Int64Array

2020-06-16 Thread GitBox


github-actions[bot] commented on pull request #7462:
URL: https://github.com/apache/arrow/pull/7462#issuecomment-645098523


   https://issues.apache.org/jira/browse/ARROW-7068



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm opened a new pull request #7462: ARROW-7068: [C++] Add ListArray::offsets and LargeListArray::offsets returning boxed version of offsets as Int32Array/Int64Array

2020-06-16 Thread GitBox


wesm opened a new pull request #7462:
URL: https://github.com/apache/arrow/pull/7462


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ursabot commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


ursabot commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645087761


   [AMD64 Ubuntu 18.04 C++ Benchmark 
(#112653)](https://ci.ursalabs.org/#builders/73/builds/76) builder has been 
succeeded.
   
   Revision: 53671af32c338fcca1edc40732c4c5fd1ad7585e
   
   ```diff
   ==  
==  
 benchmark baselinecontender
   change
   ==  
==  
 GreaterArrayArrayString/32768/2   53.657m items/sec   94.257m 
items/sec   75.665%
   - GreaterArrayArrayInt64/32768/2363.591m items/sec  337.022m 
items/sec  -7.307%
 GreaterArrayScalarInt64/32768/100 598.659m items/sec  701.841m 
items/sec  17.236%
 GreaterArrayScalarString/32768/0  341.707m items/sec  463.243m 
items/sec  35.567%
 GreaterArrayArrayString/32768/100 54.363m items/sec   75.156m 
items/sec   38.249%
   - GreaterArrayArrayInt64/32768/1364.615m items/sec  337.861m 
items/sec  -7.338%
   - GreaterArrayArrayInt64/32768/0365.779m items/sec  338.697m 
items/sec  -7.404%
 GreaterArrayArrayString/32768/1   54.734m items/sec   75.788m 
items/sec   38.465%
 GreaterArrayScalarString/32768/1  364.271m items/sec  569.761m 
items/sec  56.411%
 GreaterArrayScalarInt64/32768/2   596.674m items/sec  681.189m 
items/sec  14.164%
   - GreaterArrayArrayInt64/32768/1364.433m items/sec  338.039m 
items/sec  -7.242%
 GreaterArrayArrayString/32768/1   127.275m items/sec  195.144m 
items/sec  53.325%
 GreaterArrayScalarInt64/32768/10  614.042m items/sec  697.835m 
items/sec  13.646%
 GreaterArrayScalarInt64/32768/1   613.585m items/sec  692.598m 
items/sec  12.877%
   - GreaterArrayArrayInt64/32768/100  360.385m items/sec  337.343m 
items/sec  -6.394%
 GreaterArrayScalarString/32768/100331.392m items/sec  447.519m 
items/sec  35.042%
 GreaterArrayScalarInt64/32768/0   624.860m items/sec  705.244m 
items/sec  12.864%
   - GreaterArrayArrayInt64/32768/10   363.073m items/sec  337.156m 
items/sec  -7.138%
 GreaterArrayScalarInt64/32768/1   608.171m items/sec  678.010m 
items/sec  11.483%
 GreaterArrayScalarString/32768/10 256.765m items/sec  352.403m 
items/sec  37.247%
 GreaterArrayScalarString/32768/1  335.759m items/sec  461.788m 
items/sec  37.535%
 GreaterArrayArrayString/32768/0   54.130m items/sec   75.873m 
items/sec   40.168%
 GreaterArrayScalarString/32768/2  130.823m items/sec  194.535m 
items/sec  48.701%
 GreaterArrayArrayString/32768/10  51.190m items/sec   71.220m 
items/sec   39.127%
   ==  
==  
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


github-actions[bot] commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645086851


   https://issues.apache.org/jira/browse/ARROW-8969



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou closed pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-16 Thread GitBox


kou closed pull request #7430:
URL: https://github.com/apache/arrow/pull/7430


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-16 Thread GitBox


kou commented on pull request #7430:
URL: https://github.com/apache/arrow/pull/7430#issuecomment-645084134


   > Done.
   
   Thanks!
   Now, I'm a member of https://bintray.com/ursalabs .
   (Please ignore my meaningless join request...)
   
   > We could I guess, though I don't think I have sufficient permissions on 
the org to do that.
   
   You can join https://bintray.com/apache/arrow/ by requesting it to INFRA 
like https://issues.apache.org/jira/browse/INFRA-18698 .



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


wesm commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645083255


   @ursabot benchmark --benchmark-filter=Greater 18e559b



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


wesm commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645083142


   @ursabot benchmark --benchmark_filter=Greater 18e559b



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ursabot commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


ursabot commented on pull request #7461:
URL: https://github.com/apache/arrow/pull/7461#issuecomment-645083146


   ```
   no such option: --benchmark_filter
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm opened a new pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox


wesm opened a new pull request #7461:
URL: https://github.com/apache/arrow/pull/7461


   With clang-8 on Linux this unit is now 654KB down from 1257KB. 
   
   A few strategies:
   
   * Use same binary code for Less/Greater and LessEqual/GreaterEqual with 
arguments flipped
   * Reuse kernels for primitive types represented as integers
   * Reuse kernels for binary/string and largebinary/largestring
   
   On the latter matter, I had to rewrite some of the codegen_internal.h 
routines where they assumed they would receive the specific logical type as the 
compile-time parameter (possibly causing a conflict when trying to put a 
BinaryArray box around a StringArray's ArrayData). This makes kernel execution 
against string data faster in the places where `ArrayIterator` is used. Will 
run benchmarks
   
   As an aside, in general these kernel code generators will likely have lower 
and lower level code over time.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7460: ARROW-9154: [Developer] Use GitHub issue templates better

2020-06-16 Thread GitBox


github-actions[bot] commented on pull request #7460:
URL: https://github.com/apache/arrow/pull/7460#issuecomment-645058491


   https://issues.apache.org/jira/browse/ARROW-9154



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #7460: ARROW-9154: [Developer] Use GitHub issue templates better

2020-06-16 Thread GitBox


nealrichardson opened a new pull request #7460:
URL: https://github.com/apache/arrow/pull/7460


   To check it out on my fork, go to 
https://github.com/nealrichardson/arrow/issues and click New Issue



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7459: ARROW-6800: [C++] Support building libraries targeting C++14 or higher, disable GNU CXX extensions

2020-06-16 Thread GitBox


github-actions[bot] commented on pull request #7459:
URL: https://github.com/apache/arrow/pull/7459#issuecomment-645049497


   https://issues.apache.org/jira/browse/ARROW-6800



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7459: ARROW-6800: [C++] Support building libraries targeting C++14 or higher, disable GNU CXX extensions

2020-06-16 Thread GitBox


wesm commented on pull request #7459:
URL: https://github.com/apache/arrow/pull/7459#issuecomment-645049112


   OK I guess I'm a bit out of my depth on the CMAKE_CXX_EXTENSIONS issue. If 
anyone has ideas about what to do let me know



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7459: ARROW-6800: [C++] Support building libraries targeting C++14 or higher, disable GNU CXX extensions

2020-06-16 Thread GitBox


wesm commented on pull request #7459:
URL: https://github.com/apache/arrow/pull/7459#issuecomment-645047309


   Well this is weird
   
   ```
   CMake Error in src/plasma/CMakeLists.txt:
 Target "plasma-external-store-tests" requires the language dialect "CXXOFF"
 , but CMake does not know the compile flags to use to enable it.
   CMake Error in src/plasma/CMakeLists.txt:
 Target "plasma-client-tests" requires the language dialect "CXXOFF" , but
 CMake does not know the compile flags to use to enable it.
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7459: ARROW-6800: [C++] Support building libraries targeting C++14 or higher, disable GNU CXX extensions

2020-06-16 Thread GitBox


wesm commented on pull request #7459:
URL: https://github.com/apache/arrow/pull/7459#issuecomment-645047223


   @kou @brills @kszucs could use your advice on possible concerns regarding 
the -std=gnu++11 to -std=c++11 change when using gcc



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm opened a new pull request #7459: ARROW-6800: [C++] Support building libraries targeting C++14 or higher, disable GNU CXX extensions

2020-06-16 Thread GitBox


wesm opened a new pull request #7459:
URL: https://github.com/apache/arrow/pull/7459


   This seemed pretty simple. The C++ standard targeted by default is still 
C++11 but some users might want to target a higher standard (e.g. to get C++17 
`std::string_view`). I also disabled use of `-std=gnu++11`. Not sure what are 
the implications of this but the test suite seemed fine locally.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7455: ARROW-9031: [R] Implement conversion from Type::UINT64 to R vector

2020-06-16 Thread GitBox


nealrichardson closed pull request #7455:
URL: https://github.com/apache/arrow/pull/7455


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7455: ARROW-9031: [R] Implement conversion from Type::UINT64 to R vector

2020-06-16 Thread GitBox


nealrichardson commented on pull request #7455:
URL: https://github.com/apache/arrow/pull/7455#issuecomment-645045336


   Deferring to https://jira.apache.org/jira/browse/ARROW-9083 questions of 
whether there are better ways to convert. This at least no longer errors and 
gives something usable.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson edited a comment on pull request #7287: ARROW-8771: [C++] Add boost/process library to build support

2020-06-16 Thread GitBox


nealrichardson edited a comment on pull request #7287:
URL: https://github.com/apache/arrow/pull/7287#issuecomment-645042129


   @liyafan82 I rebuilt the boost bundle and uploaded to bintray. Can you 
re-run whichever tests you have that failed because of this before and see if 
they work now? If they're good now, then we can merge this.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on a change in pull request #7287: ARROW-8771: [C++] Add boost/process library to build support

2020-06-16 Thread GitBox


nealrichardson commented on a change in pull request #7287:
URL: https://github.com/apache/arrow/pull/7287#discussion_r441176530



##
File path: cpp/build-support/trim-boost.sh
##
@@ -32,12 +32,12 @@ set -eu
 : 
${BOOST_URL:=https://dl.bintray.com/boostorg/release/${BOOST_VERSION}/source/${BOOST_FILE}.tar.gz}
 
 # Arrow tests require these
-BOOST_LIBS="system.hpp filesystem.hpp"
+BOOST_LIBS="process.hpp filesystem.hpp"

Review comment:
   I wasn't sure if it was safe to remove system.hpp because it is named 
explicitly in cmake and don't have time to track this down right now, so 
leaving a note to future self.
   
   ```suggestion
   # TODO: is system still required? We declare 
`--with-libraries=filesystem,regex,system`
   # in cmake, but there is no #include  in our source anymore
   BOOST_LIBS="system.hpp process.hpp filesystem.hpp"
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7287: ARROW-8771: [C++] Add boost/process library to build support

2020-06-16 Thread GitBox


nealrichardson commented on pull request #7287:
URL: https://github.com/apache/arrow/pull/7287#issuecomment-645042129


   @liyafan82 I rebuilt the boost bundle and uploaded to bintray. Can you 
re-run whichever tests you have that failed because of this before and see if 
they work now?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on a change in pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-16 Thread GitBox


kou commented on a change in pull request #7436:
URL: https://github.com/apache/arrow/pull/7436#discussion_r441174856



##
File path: python/manylinux1/scripts/build_boost.sh
##
@@ -16,12 +16,12 @@
 # specific language governing permissions and limitations
 # under the License.
 
-BOOST_VERSION=1.68.0
+BOOST_VERSION=1.73.0
 BOOST_VERSION_UNDERSCORE=${BOOST_VERSION//\./_}
 NCORES=$(($(grep -c ^processor /proc/cpuinfo) + 1))
 
-curl -sL 
https://sourceforge.net/projects/boost/files/boost/${BOOST_VERSION}/boost_${BOOST_VERSION_UNDERSCORE}.tar.gz
 -o /boost_${BOOST_VERSION_UNDERSCORE}.tar.gz
-tar xf boost_${BOOST_VERSION_UNDERSCORE}.tar.gz
+curl -sL 
https://dl.bintray.com/boostorg/release/${BOOST_VERSION}/source/boost_${BOOST_VERSION_UNDERSCORE}.tar.bz2
 -o /boost_${BOOST_VERSION_UNDERSCORE}.tar.bz2

Review comment:
   The Bintray limitation is bandwidth: 
https://github.com/boostorg/boost/issues/299#issuecomment-605462600
   
   > The problem is not the reliability of the servers. The problem is that we 
have a large, but limited amount of (donated) bandwidth (about 30 TB/month). 
Someone could set up a set of mirrors, but who is going to provide (pay for) 
the service?
   
   It may be better that we mirror Boost at our Bintray and use our mirror.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-16 Thread GitBox


nealrichardson commented on pull request #7430:
URL: https://github.com/apache/arrow/pull/7430#issuecomment-645041382


   Uploaded and re-triggered the failing build for confirmation that it's 
working now. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-16 Thread GitBox


nealrichardson commented on pull request #7430:
URL: https://github.com/apache/arrow/pull/7430#issuecomment-645041249


   @github-actions crossbow submit test-r-rstudio-r-base-3.6-bionic



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7458: ARROW-9122: [C++] Properly handle sliced arrays in ascii_lower, ascii_upper kernels

2020-06-16 Thread GitBox


github-actions[bot] commented on pull request #7458:
URL: https://github.com/apache/arrow/pull/7458#issuecomment-645039862


   https://issues.apache.org/jira/browse/ARROW-9122



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7445: ARROW-8583: [C++][Doc] Undocumented parameter in Dataset namespace

2020-06-16 Thread GitBox


wesm closed pull request #7445:
URL: https://github.com/apache/arrow/pull/7445


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm opened a new pull request #7458: ARROW-9122: [C++] Properly handle sliced arrays in ascii_lower, ascii_upper kernels

2020-06-16 Thread GitBox


wesm opened a new pull request #7458:
URL: https://github.com/apache/arrow/pull/7458


   We must both take into account the referenced range of the data buffer and 
shift the value offsets if the array slice offset is nonzero. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm edited a comment on pull request #7315: ARROW-7605: [C++] Bundle jemalloc into static libarrow

2020-06-16 Thread GitBox


wesm edited a comment on pull request #7315:
URL: https://github.com/apache/arrow/pull/7315#issuecomment-645038858


   Will this approach work on Windows? This doesn't look like it's going to 
work for all our dependencies that might be built in bundled mode. I still 
think the static lib splicing with `ar` (and equivalents on macOS and Windows) 
is the most viable solution. Do others have opinions?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7315: ARROW-7605: [C++] Bundle jemalloc into static libarrow

2020-06-16 Thread GitBox


wesm commented on pull request #7315:
URL: https://github.com/apache/arrow/pull/7315#issuecomment-645038858


   Will this approach work on Windows? This doesn't look like it's going to 
work for all our dependencies that might be built in bundled mode. I still 
think the static lib splicing with AR is the most viable solution. Do others 
have opinions?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7435: ARROW-8779: [R] Implement conversion to List

2020-06-16 Thread GitBox


wesm closed pull request #7435:
URL: https://github.com/apache/arrow/pull/7435


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7435: ARROW-8779: [R] Implement conversion to List

2020-06-16 Thread GitBox


wesm commented on pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#issuecomment-645037889


   thanks @romainfrancois!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7435: ARROW-8779: [R] Implement conversion to List

2020-06-16 Thread GitBox


wesm commented on a change in pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#discussion_r441170772



##
File path: r/src/array_from_vector.cpp
##
@@ -201,6 +202,67 @@ struct VectorToArrayConverter {
 return Status::OK();
   }
 
+  template 
+  arrow::enable_if_t::value, Status> Visit(const T& type) {
+using BuilderType = typename TypeTraits::BuilderType;
+ARROW_RETURN_IF(!Rf_inherits(x, "data.frame"),
+Status::RError("Expecting a data frame"));
+
+auto* struct_builder = checked_cast(builder);
+
+int64_t n = vctrs::short_vec_size(x);

Review comment:
   It seems in R it's possible to have data-frame-like tibble columns that 
are malformed viewed as Arrow-like (as evidenced by the unit test)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-16 Thread GitBox


wesm closed pull request #7410:
URL: https://github.com/apache/arrow/pull/7410


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-16 Thread GitBox


wesm commented on pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#issuecomment-645034827


   Merging



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #6512: ARROW-8430: [CI] Configure self-hosted runners for Github Actions

2020-06-16 Thread GitBox


kszucs commented on pull request #6512:
URL: https://github.com/apache/arrow/pull/6512#issuecomment-645027704


   Tried but it didn’t work. I can further investigate it, but passing “-j” 
fixed the resource issues for now.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on a change in pull request #6512: ARROW-8430: [CI] Configure self-hosted runners for Github Actions

2020-06-16 Thread GitBox


kou commented on a change in pull request #6512:
URL: https://github.com/apache/arrow/pull/6512#discussion_r441156336



##
File path: ci/docker/ubuntu-14.04-cpp.dockerfile
##
@@ -68,7 +68,7 @@ ENV ARROW_BUILD_TESTS=ON \
 ARROW_WITH_BROTLI=ON \
 ARROW_WITH_BZ2=ON \
 ARROW_WITH_LZ4=ON \
-ARROW_WITH_OPENSSL=OFF \
+PARQUET_REQUIRE_ENCRYPTION=OFF \

Review comment:
   Could you sort in alphabetical order?

##
File path: ci/scripts/cpp_build.sh
##
@@ -127,7 +127,11 @@ cmake -G "${CMAKE_GENERATOR:-Ninja}" \
   ${CMAKE_ARGS} \
   ${source_dir}
 
-time cmake --build . --target install
+if [ ! -z "${CPP_MAKE_PARALLELISM}" ]; then
+  time cmake --build . --target install -- -j${CPP_MAKE_PARALLELISM}

Review comment:
   Does `CMAKE_BUILD_PARALLEL_LEVEL` not work with Ubuntu 20.04?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-16 Thread GitBox


wesm commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-645023981


   I went ahead and asked https://github.com/ufal/unilib/issues/2



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-16 Thread GitBox


nealrichardson commented on pull request #7430:
URL: https://github.com/apache/arrow/pull/7430#issuecomment-645021672


   > OK. Could you update the bundle and then re-run the build?
   
   Yes, will do
   
   > And could you add me ( https://bintray.com/kou ) to 
https://bintray.com/ursalabs 's members?
   > I may want to upload something to there in the future.
   
   Done.
   
   > Or should we create https://bintray.com/apache/arrow/boost and migrate 
https://bintray.com/ursalabs/arrow-boost/arrow-boost to 
https://bintray.com/apache/arrow/boost ?
   
   We could I guess, though I don't think I have sufficient permissions on the 
org to do that.  
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-16 Thread GitBox


wesm commented on a change in pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#discussion_r441146931



##
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##
@@ -30,6 +31,21 @@ namespace internal {
 
 namespace {
 
+// Code units in the range [a-z] can only be an encoding of an ascii
+// character/codepoint, not the 2nd, 3rd or 4th code unit (byte) of an 
different
+// codepoint. This guaranteed by non-overlap design of the unicode standard. 
(see
+// section 2.5 of Unicode Standard Core Specification v13.0)
+
+uint8_t ascii_tolower(uint8_t utf8_code_unit) {

Review comment:
   I think you will want this to be `static inline` (not sure all compilers 
will inline it otherwise)

##
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##
@@ -159,11 +282,23 @@ void MakeUnaryStringBatchKernel(std::string name, 
ArrayKernelExec exec,
   DCHECK_OK(registry->AddFunction(std::move(func)));
 }
 
+template  typename Transformer>

Review comment:
   I think some compilers demand that "class" be used for the last 
"typename"





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #6729: ARROW-8229: [Java] Move ArrowBuf into the Arrow package

2020-06-16 Thread GitBox


kou commented on pull request #6729:
URL: https://github.com/apache/arrow/pull/6729#issuecomment-645016478


   Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-16 Thread GitBox


kou commented on pull request #7430:
URL: https://github.com/apache/arrow/pull/7430#issuecomment-645015432


   OK. Could you update the bundle and then re-run the build?
   
   And could you add me ( https://bintray.com/kou ) to 
https://bintray.com/ursalabs 's members?
   I may want to upload something to there in the future.
   Or should we create https://bintray.com/apache/arrow/boost and migrate 
https://bintray.com/ursalabs/arrow-boost/arrow-boost to 
https://bintray.com/apache/arrow/boost ?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-16 Thread GitBox


wesm commented on pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#issuecomment-645011421


   @bkietz if you want to disable the ascii_* tests that are failing please go 
ahead



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-16 Thread GitBox


wesm commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r441139760



##
File path: cpp/src/arrow/compute/kernels/scalar_validity.cc
##
@@ -0,0 +1,107 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/compute/kernels/common.h"
+
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+
+namespace arrow {
+
+using internal::CopyBitmap;
+using internal::InvertBitmap;
+
+namespace compute {
+namespace {
+
+struct IsValidOperator {
+  static void Call(KernelContext* ctx, const Scalar& in, Scalar* out) {
+checked_cast(out)->value = in.is_valid;
+  }
+
+  static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
+DCHECK_EQ(out->offset, 0);
+DCHECK_LE(out->length, arr.length);
+if (arr.buffers[0] != nullptr) {
+  out->buffers[1] = arr.offset == 0
+? arr.buffers[0]
+: SliceBuffer(arr.buffers[0], arr.offset / 8, 
arr.length / 8);
+  out->offset = arr.offset % 8;

Review comment:
   I don't know off the top of my head what will be the implications of 
modifying the offset of the output value, but I think it should be okay, and we 
can always fix it later if it becomes an issue





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm edited a comment on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-16 Thread GitBox


wesm edited a comment on pull request #7442:
URL: https://github.com/apache/arrow/pull/7442#issuecomment-645004792


   I'll have to deal with the string optimization in a follow up PR, so I'm 
going to leave this for review as is. It would be good to get this merged 
sooner rather than later.
   
   EDIT: opened https://issues.apache.org/jira/browse/ARROW-9152



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-16 Thread GitBox


wesm commented on pull request #7442:
URL: https://github.com/apache/arrow/pull/7442#issuecomment-645004792


   I'll have to deal with the string optimization in a follow up PR, so I'm 
going to leave this for review as is. It would be good to get this merged 
sooner rather than later



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7441: ARROW-3446: [R] Document mapping of Arrow <-> R types

2020-06-16 Thread GitBox


nealrichardson closed pull request #7441:
URL: https://github.com/apache/arrow/pull/7441


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7457: ARROW-9151: [R][CI] Fix Rtools 4.0 build: pacman sync

2020-06-16 Thread GitBox


nealrichardson closed pull request #7457:
URL: https://github.com/apache/arrow/pull/7457


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] bkietz commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles

2020-06-16 Thread GitBox


bkietz commented on pull request #7156:
URL: https://github.com/apache/arrow/pull/7156#issuecomment-644973548


   Okay, I'll start trimming



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] bkietz commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-16 Thread GitBox


bkietz commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r441096806



##
File path: cpp/src/arrow/compute/kernels/scalar_validity.cc
##
@@ -0,0 +1,110 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/compute/kernels/common.h"
+
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+
+namespace arrow {
+
+using internal::CopyBitmap;
+using internal::InvertBitmap;
+
+namespace compute {
+namespace {
+
+struct IsValidOperator {
+  static void Call(KernelContext* ctx, const Scalar& in, Scalar* out) {
+checked_cast(out)->value = in.is_valid;
+  }
+
+  static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
+if (arr.buffers[0] != nullptr && out->offset == arr.offset &&
+out->length == arr.length) {

Review comment:
   PTAL





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7457: ARROW-9151: [R][CI] Fix Rtools 4.0 build: pacman sync

2020-06-16 Thread GitBox


github-actions[bot] commented on pull request #7457:
URL: https://github.com/apache/arrow/pull/7457#issuecomment-644969431


   https://issues.apache.org/jira/browse/ARROW-9151



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] fsaintjacques commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles

2020-06-16 Thread GitBox


fsaintjacques commented on pull request #7156:
URL: https://github.com/apache/arrow/pull/7156#issuecomment-644969405


   Yes, we can already create FileFragment from any 
[FileSource](https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/file_base.h#L153-L157).
 You make a valid point that, usually, this is meant for a single buffer.
   
   If this is only scoped to FileSource and the python bindings, not touching 
any Factory, then this is fine as is.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7453: ARROW-9141: [R] Update cross-package documentation links

2020-06-16 Thread GitBox


nealrichardson closed pull request #7453:
URL: https://github.com/apache/arrow/pull/7453


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles

2020-06-16 Thread GitBox


jorisvandenbossche commented on pull request #7156:
URL: https://github.com/apache/arrow/pull/7156#issuecomment-644966741


   Note that it is not *only* for testing. We for sure use it for testing in 
pyarrow, but in pandas 1.0.4, we accidentally broke reading parquet files from 
file-like objects, and we directly got a some bug reports about it. So actual 
users also do that, to a certain extent.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles

2020-06-16 Thread GitBox


jorisvandenbossche edited a comment on pull request #7156:
URL: https://github.com/apache/arrow/pull/7156#issuecomment-644966084


   Taking a step back: wouldn't it be possible to eg "just" allow to create a 
Fragment from a buffer instead from a file?
   
   In practice, I think we only need to support dealing with buffers when there 
is a *single* buffer (so not like paths, where you can have multiple paths or a 
directory etc). And then do we need discovery at all? If we can construct a 
Fragment backed by a buffer instead of a file path, then you can create a 
Dataset from that, either with the physical schema of the fragment (no 
unification is needed if there is only one) or either with a user-specified 
schema. 
   And in such a case, the factory can focus on file paths only.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7451: ARROW-8769: [C++][R] Add convenience accessor for StructScalar fields

2020-06-16 Thread GitBox


nealrichardson closed pull request #7451:
URL: https://github.com/apache/arrow/pull/7451


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7451: ARROW-8769: [C++][R] Add convenience accessor for StructScalar fields

2020-06-16 Thread GitBox


nealrichardson commented on pull request #7451:
URL: https://github.com/apache/arrow/pull/7451#issuecomment-644966570


   Rtools build failure is ARROW-9151. I'll merge.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #7457: ARROW-9151: [R][CI] Fix Rtools 4.0 build: pacman sync

2020-06-16 Thread GitBox


nealrichardson opened a new pull request #7457:
URL: https://github.com/apache/arrow/pull/7457


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles

2020-06-16 Thread GitBox


jorisvandenbossche commented on pull request #7156:
URL: https://github.com/apache/arrow/pull/7156#issuecomment-644966084


   Taking a step back: wouldn't it be possible to eg "just" allow to create a 
Fragment from a buffer instead from a file?
   
   In practice, I think we only need to support dealing with buffers when there 
is a *single* buffer (so not like paths, where you can have multiple paths or a 
directory etc). And then do we need discovery at all? If we can construct a 
Fragment backed by a buffer instead of a file path, then you can create a 
Dataset from that, either with the physical schema of the fragment (no 
unification is needed if there is only one) or either with a user-specified 
schema.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] fsaintjacques commented on a change in pull request #7451: ARROW-8769: [C++][R] Add convenience accessor for StructScalar fields

2020-06-16 Thread GitBox


fsaintjacques commented on a change in pull request #7451:
URL: https://github.com/apache/arrow/pull/7451#discussion_r441071413



##
File path: cpp/src/arrow/scalar_test.cc
##
@@ -433,4 +433,24 @@ TYPED_TEST(TestNumericScalar, Cast) {
   }
 }
 
+TEST(TestStructScalar, FieldAccess) {
+  StructScalar abc({MakeScalar(true), MakeNullScalar(int32()), 
MakeScalar("hello")},
+   struct_({
+   field("a", boolean()),
+   field("b", int32()),
+   field("b", utf8()),
+   }));
+
+  ASSERT_OK_AND_ASSIGN(auto a, abc.field("a"));
+  AssertScalarsEqual(*a, *abc.value[0]);
+
+  ASSERT_RAISES(Invalid, abc.field("b").status());
+
+  ASSERT_OK_AND_ASSIGN(auto b, abc.field(1));
+  AssertScalarsEqual(*b, *abc.value[1]);
+
+  ASSERT_RAISES(Invalid, abc.field(5).status());
+  ASSERT_RAISES(Invalid, abc.field("c").status());

Review comment:
   Add a nested failure (not implemented).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on a change in pull request #7441: ARROW-3446: [R] Document mapping of Arrow <-> R types

2020-06-16 Thread GitBox


nealrichardson commented on a change in pull request #7441:
URL: https://github.com/apache/arrow/pull/7441#discussion_r441079053



##
File path: r/src/array_to_vector.cpp
##
@@ -418,6 +418,7 @@ class Converter_Struct : public Converter {
   std::vector> converters;
 };
 
+// Shouldn't this cast before dividing? Otherwise we're doing integer division

Review comment:
   I made the change and tests pass, so let's go with it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] fsaintjacques commented on a change in pull request #7441: ARROW-3446: [R] Document mapping of Arrow <-> R types

2020-06-16 Thread GitBox


fsaintjacques commented on a change in pull request #7441:
URL: https://github.com/apache/arrow/pull/7441#discussion_r440846641



##
File path: r/vignettes/arrow.Rmd
##
@@ -86,7 +88,73 @@ to other applications and services that use Arrow. One 
example is Spark: the
 move data to and from Spark, yielding [significant performance
 gains](http://arrow.apache.org/blog/2019/01/25/r-spark-improvements/).
 
-# Class structure and package conventions
+# Internals
+
+## Mapping of R <--> Arrow types
+
+Arrow has a rich data type system that includes direct parallels with R's data 
types and much more.
+
+In the tables, entries with a `-` are not currently implemented.
+
+### R to Arrow
+
+| R type   | Arrow type |
+|--||
+| logical  | boolean|
+| integer  | int32  |
+| double ("numeric")   | float64|
+| character| utf8   |
+| factor   | dictionary |
+| raw  | uint8  |
+| Date | date32 |
+| POSIXct  | timestamp  |
+| POSIXlt  | -  |
+| data.frame   | struct |
+| list^+^  | list   |
+| bit64::integer64 | int64  |
+| difftime | time32 |
+| vctrs::vctrs_unspecified | null   |
+
+^+^: Only lists where all elements are the same type are able to be translated 
to Arrow list type (which is a "list of" some type).
+
+### Arrow to R
+
+| Arrow type| R type   |
+|---|--|
+| boolean   | logical  |
+| int8  | integer  |
+| int16 | integer  |
+| int32 | integer  |
+| int64 | bit64::integer64 |
+| uint8 | integer  |
+| uint16| integer  |
+| uint32| double   |

Review comment:
   I'm curious about the double, is it a copy-paste error, or does it 
really cast to double.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on a change in pull request #7441: ARROW-3446: [R] Document mapping of Arrow <-> R types

2020-06-16 Thread GitBox


nealrichardson commented on a change in pull request #7441:
URL: https://github.com/apache/arrow/pull/7441#discussion_r441074458



##
File path: r/vignettes/arrow.Rmd
##
@@ -86,7 +88,73 @@ to other applications and services that use Arrow. One 
example is Spark: the
 move data to and from Spark, yielding [significant performance
 gains](http://arrow.apache.org/blog/2019/01/25/r-spark-improvements/).
 
-# Class structure and package conventions
+# Internals
+
+## Mapping of R <--> Arrow types
+
+Arrow has a rich data type system that includes direct parallels with R's data 
types and much more.
+
+In the tables, entries with a `-` are not currently implemented.
+
+### R to Arrow
+
+| R type   | Arrow type |
+|--||
+| logical  | boolean|
+| integer  | int32  |
+| double ("numeric")   | float64|
+| character| utf8   |
+| factor   | dictionary |
+| raw  | uint8  |
+| Date | date32 |
+| POSIXct  | timestamp  |
+| POSIXlt  | -  |
+| data.frame   | struct |
+| list^+^  | list   |
+| bit64::integer64 | int64  |
+| difftime | time32 |
+| vctrs::vctrs_unspecified | null   |
+
+^+^: Only lists where all elements are the same type are able to be translated 
to Arrow list type (which is a "list of" some type).
+
+### Arrow to R
+
+| Arrow type| R type   |
+|---|--|
+| boolean   | logical  |
+| int8  | integer  |
+| int16 | integer  |
+| int32 | integer  |
+| int64 | bit64::integer64 |
+| uint8 | integer  |
+| uint16| integer  |
+| uint32| double   |

Review comment:
   Expanded https://jira.apache.org/jira/browse/ARROW-9083 to cover 
uint{32,64}. I'm going to proceed with this PR as is and will happily revise 
the docs if we change the implementation. 
   
   Thanks for the discussion--shining a light on things like this is a big 
reason I picked up the documentation task.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] maartenbreddels commented on pull request #7452: ARROW-8961: [C++] Add utf8proc library to toolchain

2020-06-16 Thread GitBox


maartenbreddels commented on pull request #7452:
URL: https://github.com/apache/arrow/pull/7452#issuecomment-644951837


   @xhochy do you want me to rebase https://github.com/apache/arrow/pull/7449 
on this? So we can see if it's all working?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] maartenbreddels commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-16 Thread GitBox


maartenbreddels commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-644949978


   It's not *that* slow, it was 40% of Vaex' performance (single threaded), so 
I think there is a bit more to be gained still. But I have added an 
optimization that tries ASCII conversion first. This gives it a 7x (compared to 
Vaex) to 10x speedup (in the benchmarks).
   
   Before:
   ```
   Utf8Lower   193873803 ns193823124 ns3 
bytes_per_second=102.387M/s items_per_second=5.40996M/s
   Utf8Upper   197154929 ns197093083 ns4 
bytes_per_second=100.688M/s items_per_second=5.32021M/s
   ```
   
   After:
   ```
   Utf8Lower19508443 ns 19493652 ns   36 
bytes_per_second=1018.02M/s items_per_second=53.7906M/s
   Utf8Upper19846885 ns 19832066 ns   35 
bytes_per_second=1000.65M/s items_per_second=52.8728M/s
   ```
   
   There is one loose end, the growth of the string can cause a utf8 array to 
be promoted to a large_utf8. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7453: ARROW-9141: [R] Update cross-package documentation links

2020-06-16 Thread GitBox


github-actions[bot] commented on pull request #7453:
URL: https://github.com/apache/arrow/pull/7453#issuecomment-644946345


   Revision: bdaec94d2b1c88216a4b9e0395c98438b5cf5c5e
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-325](https://github.com/ursa-labs/crossbow/branches/all?query=actions-325)
   
   |Task|Status|
   ||--|
   |test-r-linux-as-cran|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-325-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-325-github-test-r-linux-as-cran)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7453: ARROW-9141: [R] Update cross-package documentation links

2020-06-16 Thread GitBox


nealrichardson commented on pull request #7453:
URL: https://github.com/apache/arrow/pull/7453#issuecomment-644945052


   @github-actions crossbow submit *as-cran*



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] bkietz commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-16 Thread GitBox


bkietz commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r441019388



##
File path: cpp/src/arrow/compute/kernels/scalar_validity.cc
##
@@ -0,0 +1,110 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/compute/kernels/common.h"
+
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+
+namespace arrow {
+
+using internal::CopyBitmap;
+using internal::InvertBitmap;
+
+namespace compute {
+namespace {
+
+struct IsValidOperator {
+  static void Call(KernelContext* ctx, const Scalar& in, Scalar* out) {
+checked_cast(out)->value = in.is_valid;
+  }
+
+  static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
+if (arr.buffers[0] != nullptr && out->offset == arr.offset &&
+out->length == arr.length) {

Review comment:
   As it happens this fails for sliced input. I'll restore the checks. For 
your interest, one approach i tried for `is_valid` was a `MetaFunction` which 
invoked a no-op ScalarFunction with INTERSECTION null handling then yanked the 
null bitmap from that. Unfortunately INTERSECTION doesn't currently support the 
zero copy case and I wasn't sure the approach would be acceptable but it did 
avoid repetition of null bitmap handling logic. What do you think?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] fsaintjacques edited a comment on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles

2020-06-16 Thread GitBox


fsaintjacques edited a comment on pull request #7156:
URL: https://github.com/apache/arrow/pull/7156#issuecomment-644939437


   I feel 0.5 on this PR in general, the functionality it adds is initially for 
testing and it introduces debt. I'm not keen on the change on FileSystemFactory 
since this class is used for 3 purposes, and only one of them is used by Native 
handles:
   
   - [x] Unify schema
   - [ ] Crawl a file systems to explore (requires a filesystem instance)
   - [ ] Discover partition information (requires a path)
   
   This patch retrofits FileSystemFactory for no other purpose than making 
buffer sources accessible in python while ignoring the actual discovery. In 
reality, this class deals heavily and almost exclusively with paths.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7455: ARROW-9031: [R] Implement conversion from Type::UINT64 to R vector

2020-06-16 Thread GitBox


nealrichardson commented on pull request #7455:
URL: https://github.com/apache/arrow/pull/7455#issuecomment-644940220


   > Why does `int64` use `Converter_Int64` but the other types are handled 
differently?
   
   Historical reasons I suppose. See also this discussion: 
https://github.com/apache/arrow/pull/7441#discussion_r440846641



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] fsaintjacques commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles

2020-06-16 Thread GitBox


fsaintjacques commented on pull request #7156:
URL: https://github.com/apache/arrow/pull/7156#issuecomment-644939437


   I feel 0.5 on this PR in general, the functionality it adds is initially for 
testing and it introduces debt. I'm not keen on the change on FileSystemFactory 
since this class is used for 3 purposes, and only of of them is used by Native 
handles:
   
   - [x] Unify schema
   - [ ] Crawl a file systems to explore (requires a filesystem instance)
   - [ ] Discover partition information (requires a path)
   
   This patch retrofits FileSystemFactory for no other purpose than making 
buffer sources accessible in python while ignoring the actual discovery. In 
reality, this class deals heavily and almost exclusively with paths.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




  1   2   3   >