[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-06-07 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-640250180


   Closing the PR again. Please open JIRA issues or review other PRs to provide 
more feedback. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-30 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-636402690


   I'll leave this PR open until next week sometime to allow more time for 
comments



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-24 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-633240348


   The PR has been merged, but the code review feature is still available. 
Please continue to leave comments and I will address them



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-24 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-633231974


   +1



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-24 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-633229487


   @jorisvandenbossche I fixed the segfault you hit and now have changed the 
implementation of `Array.filter` to be
   
   ```
   pc = _pc()
   options = pc.FilterOptions(null_selection_behavior)
   return pc.call_function('filter', [self, mask], options)
   ```
   
   I also rebased -- I am going to merge this when the build is passing so we 
do not continue to stack changes that need to be reviewed on this PR. I welcome 
further code reviews on this PR or follow up JIRAs



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-24 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-633218315


   @jorisvandenbossche I'll take a look at that segfault
   
   > The goal is to eventually have them all go through _pc.call_function()?
   
   Yes. `arrow::compute::CallFunction` is general enough to deal with non-value 
(aka not Array or Scalar) arguments so we should be able to define table-valued 
functions that dispatch array kernels to a table or record batch's columns
   
   > pc.call_function() doesn't yet have a way to pass options. That's also 
something to be added in the future?
   
   Yes, I'll open a JIRA about this



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-23 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-633154141


   I'd like to work on a couple follow-up PRs tomorrow, so in order to unblock 
these PRs so that they can be reviewed, this PR will have to be merged first. 
So I would like to do that tomorrow absent problems that need to be fixed. 
   
   Upon merging this PR, I would immediately reopen it so that others can 
continue to use the GitHub code review feature to leave comments in mass, and I 
will certainly respond to those comments and address them or open follow up 
JIRA issues so they are tracked. Please also see the "build out" umbrella JIRA 
where I will track the follow on work building out kernels functionality in the 
project
   
   https://issues.apache.org/jira/browse/ARROW-8894



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-23 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-633051554


   The build is passing (the Travis thing is a flake on ARM architecture), 
thanks @kou for your help with that! I think we should merge this today and I 
will continue to address comments in this PR in follow up patches. I'll wait 
until later today but it would be good if others could chime in to make sure 
this plan is agreeable. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632975883


   I see the problem, I am fixing



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632975274


   Ok I will take a look



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632947370


   Note that `kHashSeed` is declared as `uint64_t` so that must be changed also
   
   ```
   static constexpr uint64_t kHashSeed = 0;
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632947188


   @kou that's fine, please go ahead and apply it



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632945792


   Perfect thank you :pray:. I'm done hacking on this for now, I'll keep 
investigating test failures 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632945052


   I'm just fixing a failing unit test now. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632944653


   @xhochy @pitrou @kszucs @jorisvandenbossche  I just added Python bindings 
for this new functionality including generic argument packing and function 
dispatching. The new implementation of `pyarrow.compute.sum` for example is now
   
   ```
   def sum(array):
   """
   Sum the values in a numerical (chunked) array.
   
   Parameters
   --
   array : pyarrow.Array or pyarrow.ChunkedArray
   
   Returns
   ---
   sum : pyarrow.Scalar
   """
   return call_function('sum', [array])
   ```
   
   So we're going to be able to go around the Python codebase and delete a 
_ton_ of code. 
   
   I haven't written unit tests for this part (let's do this later), but I 
added wrappers for the registry, functions, and kernels:
   
   ```
   In [1]: import pyarrow.compute as pc 
 
   
   In [2]: reg = pc.function_registry() 
 
   
   In [3]: reg.list_functions() 
 
   Out[3]: 
   ['add',
'and',
'and_kleene',
'count',
'dict_encode',
'equals',
'filter',
'greater',
'greater_equal',
'invert',
'isin',
'less',
'less_equal',
'match',
'mean',
'minmax',
'not_equals',
'or',
'or_kleene',
'partition_indices',
'sort_indices',
'sum',
'take',
'unique',
'value_counts',
'xor']
   
   In [4]: reg.get_function('equals')   
 
   Out[4]: 
   arrow.compute.Function
   kind: scalar
   num_kernels: 21
   
   In [5]: reg.get_function('equals').list_kernels()
 
   Out[5]: 
   [ScalarKernel<(any[bool], any[bool]) -> bool>,
ScalarKernel<(any[uint8], any[uint8]) -> bool>,
ScalarKernel<(any[uint16], any[uint16]) -> bool>,
ScalarKernel<(any[uint32], any[uint32]) -> bool>,
ScalarKernel<(any[uint64], any[uint64]) -> bool>,
ScalarKernel<(any[int8], any[int8]) -> bool>,
ScalarKernel<(any[int16], any[int16]) -> bool>,
ScalarKernel<(any[int32], any[int32]) -> bool>,
ScalarKernel<(any[int64], any[int64]) -> bool>,
ScalarKernel<(any[float], any[float]) -> bool>,
ScalarKernel<(any[double], any[double]) -> bool>,
ScalarKernel<(any[binary], any[binary]) -> bool>,
ScalarKernel<(any[string], any[string]) -> bool>,
ScalarKernel<(any[large_binary], any[large_binary]) -> bool>,
ScalarKernel<(any[large_string], any[large_string]) -> bool>,
ScalarKernel<(any[date32[day]], any[date32[day]]) -> bool>,
ScalarKernel<(any[date64[ms]], any[date64[ms]]) -> bool>,
ScalarKernel<(any[timestamp(s)], any[timestamp(s)]) -> bool>,
ScalarKernel<(any[timestamp(ms)], any[timestamp(ms)]) -> bool>,
ScalarKernel<(any[timestamp(us)], any[timestamp(us)]) -> bool>,
ScalarKernel<(any[timestamp(ns)], any[timestamp(ns)]) -> bool>]
   ```
   
   Needless to say very excited about all of this. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632937311


   Done. It looks like GitHub is having some problems today but will keep an 
eye on the builds



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632936766


   OK, I have it fixed locally. I'm finishing some Python stuff and then I'll 
push



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632936071


   Thank you, I will fix. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632929465


   @kou done, thank you for catching this, I think it's better to retain this 
functionality. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632927540


   The IsIn/Match changes don't look too hard, should be able to push them in 
the next half hour or less



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632925674


   I'm taking a look now, I'll report back in a half hour or so



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632925336


   @kou yes, that was an API change. I can try to restore this functionality 
though?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632919447


   Awesome, thanks @kou!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632918704


   The R test suite is fixed -- this also turned up some missing test coverage 
for things that R was depending on 
https://issues.apache.org/jira/browse/ARROW-8895.
   
   I'm going to grind through the pending comments and add the simple Python 
bindings I talked about above. I think beyond that (and fixing remaining CI 
issues) the GLib bindings are the main things left. @kou please let me know if 
I can be of assistance



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632910238


   This problem of kernel dispatch with timestamps that may or may not have 
time zones actually brought out a limitation with input type checking. I'm 
introducing a simple interface for user-defined type-checking rules. I think 
this should give us sufficient flexibility for the time being, and if not we 
can always improve the interfaces. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-22 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632824231


   I just rebased and pushed MSVC build fixes (that works for me at least 
locally on VS 2017). I'm going to fix the R failure next and then address the 
other accumulated comments above.
   
   As far as timeline for this, I would ideally like to have this merged on 
Sunday at the latest. I was planning to take next week mostly off of work so 
this would unblock PRs into arrow/compute
   
   I also would like in the next couple days to write a small follow up patch 
or two to provide a template for scalar string kernels to help unblock 
ARROW-555. I will also start creating an umbrella JIRA issue for a more general 
kernels library buildout (a plan for going from our ~20 kernels to 100+)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-21 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632460179


   I got sucked into some other things today, I'll focus on getting the CI 
passing tomorrow so we can hopefully go into the weekend with a green build



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-21 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632452421


   @emkornfield @brills no Python APIs are changed (not sure how much of the 
C++ API TFX uses)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-21 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632430558


   I rebased without pushing any new changes



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-21 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-632167817


   @nealrichardson yeah, I will fix



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

2020-05-20 Thread GitBox


wesm commented on pull request #7240:
URL: https://github.com/apache/arrow/pull/7240#issuecomment-631878670


   For code reviewers:
   
   * The code diff is a bit deceiving because many kernel files were renamed to 
make their taxonomy more clear
   * I suggest focusing your energies on the files in the arrow/compute top 
level directory. This is almost entirely new code, what was there before is 
almost all gone
   
   I am aware that there are some obviously messy things that will need to be 
cleaned up a bit, but we have to start somewhere. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org