[GitHub] [arrow] rj-atw commented on pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support

2020-07-14 Thread GitBox


rj-atw commented on pull request #7767:
URL: https://github.com/apache/arrow/pull/7767#issuecomment-658516845


   > 
   > 
   > @rj-atw I would like to see us get wasm support but I think we need to add 
CI for this. Without CI all those `cfg` attributes are destined to go stale...
   
   @paddyhoran I love to add the need CI. Do we have any documentation around 
this projects CI (especially how to setup env locally)?  



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rj-atw commented on a change in pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support

2020-07-14 Thread GitBox


rj-atw commented on a change in pull request #7767:
URL: https://github.com/apache/arrow/pull/7767#discussion_r454760959



##
File path: rust/arrow/src/util/bit_util.rs
##
@@ -366,13 +371,13 @@ mod tests {
 assert_eq!(ceil(8, 8), 1);
 assert_eq!(ceil(9, 8), 2);
 assert_eq!(ceil(9, 9), 1);
-assert_eq!(ceil(100, 10), 10);
-assert_eq!(ceil(10, 100), 1);
-assert_eq!(ceil(100, 10), 10);
+assert_eq!(ceil(1000, 10), 100);

Review comment:
   I will double check these changes. I made them in response to a test 
that was failing; however, I found it suspect at the time as well.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] paddyhoran commented on pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support

2020-07-14 Thread GitBox


paddyhoran commented on pull request #7767:
URL: https://github.com/apache/arrow/pull/7767#issuecomment-658515334


   Also, I assigned the JIRA to you and changed your permissions so you can 
self-assign in future



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] paddyhoran commented on pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support

2020-07-14 Thread GitBox


paddyhoran commented on pull request #7767:
URL: https://github.com/apache/arrow/pull/7767#issuecomment-658514635


   @rj-atw I would like to see us get wasm support but I think we need to add 
CI for this.  Without CI all those `cfg` attributes are destined to go stale...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] paddyhoran commented on a change in pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support

2020-07-14 Thread GitBox


paddyhoran commented on a change in pull request #7767:
URL: https://github.com/apache/arrow/pull/7767#discussion_r454755766



##
File path: rust/arrow/src/util/bit_util.rs
##
@@ -366,13 +371,13 @@ mod tests {
 assert_eq!(ceil(8, 8), 1);
 assert_eq!(ceil(9, 8), 2);
 assert_eq!(ceil(9, 9), 1);
-assert_eq!(ceil(100, 10), 10);
-assert_eq!(ceil(10, 100), 1);
-assert_eq!(ceil(100, 10), 10);
+assert_eq!(ceil(1000, 10), 100);

Review comment:
   Why the changes here?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


wesm closed pull request #7757:
URL: https://github.com/apache/arrow/pull/7757


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

2020-07-14 Thread GitBox


nealrichardson commented on pull request #7545:
URL: https://github.com/apache/arrow/pull/7545#issuecomment-658507782


   > I think the rationale is that the memory and performance savings related 
to materializing the partition columns are mos significant with string data. So 
it's definitely beneficial to return them as dictionary types.
   
   Right, my understanding from Joris's last comment was that this was already 
converting strings to dictionaries, which seems like a reasonable (though not 
mandatory) choice, and that the hangup was whether it was essential to also do 
that for ints.
   
   I guess the other workaround if people aren't happy with the choice here is 
to set `use_legacy_dataset = True`, so I agree that it's not the end of the 
world if the choice we make about dictionaries today turns out not to be 
optimal. But we should merge this so that the default is to use the datasets 
API so that we can learn where exactly we were mistaken.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support

2020-07-14 Thread GitBox


github-actions[bot] commented on pull request #7767:
URL: https://github.com/apache/arrow/pull/7767#issuecomment-658481405


   https://issues.apache.org/jira/browse/ARROW-9453



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rj-atw opened a new pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support

2020-07-14 Thread GitBox


rj-atw opened a new pull request #7767:
URL: https://github.com/apache/arrow/pull/7767


   Adding compilation flag to allow rust library to compile against Wasm32 
target



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm edited a comment on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

2020-07-14 Thread GitBox


wesm edited a comment on pull request #7545:
URL: https://github.com/apache/arrow/pull/7545#issuecomment-658465856


   I think the rationale is that the memory and performance savings related to 
materializing the partition columns are mos significant with string data. So 
it's definitely beneficial to return them as dictionary types.
   
   IMHO if there is a change from dictionary/dense required post-1.0.0 it is 
not the end of the world, so I'm OK either with merging this as is or changing 
all partition types to be dictionary. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


wesm commented on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658468591


   OK, writing is disabled but old files can still be read
   
   ```
   n [2]: pq.write_table(table, 'not_allowed.parquet.lz4', compression='lz4')   

 
   ---
   OSError   Traceback (most recent call last)
in 
   > 1 pq.write_table(table, 'not_allowed.parquet.lz4', compression='lz4')
   
   ~/code/arrow/python/pyarrow/parquet.py in write_table(table, where, 
row_group_size, version, use_dictionary, compression, write_statistics, 
use_deprecated_int96_timestamps, coerce_timestamps, allow_truncated_timestamps, 
data_page_size, flavor, filesystem, compression_level, use_byte_stream_split, 
data_page_version, **kwargs)
  1632 data_page_version=data_page_version,
  1633 **kwargs) as writer:
   -> 1634 writer.write_table(table, row_group_size=row_group_size)
  1635 except Exception:
  1636 if _is_path_like(where):
   
   ~/code/arrow/python/pyarrow/parquet.py in write_table(self, table, 
row_group_size)
   586 raise ValueError(msg)
   587 
   --> 588 self.writer.write_table(table, row_group_size=row_group_size)
   589 
   590 def close(self):
   
   ~/code/arrow/python/pyarrow/_parquet.pyx in 
pyarrow._parquet.ParquetWriter.write_table()
  1406 
  1407 with nogil:
   -> 1408 check_status(self.writer.get()
  1409  .WriteTable(deref(ctable), 
c_row_group_size))
  1410 
   
   ~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
97 raise IOError(errno, message)
98 else:
   ---> 99 raise IOError(message)
   100 elif status.IsOutOfMemory():
   101 raise ArrowMemoryError(message)
   
   OSError: Per ARROW-9424, writing files with LZ4 compression has been 
disabled until implementation issues have been resolved. It is recommended to 
read any existing files and rewrite them using a different compression.
   In ../src/parquet/arrow/writer.cc, line 684, code: 
WriteColumnChunk(table.column(i), offset, size)
   
   In [3]: pq.read_table('example.parquet.lz4').to_pandas() 

  
   Out[3]: 
  f0
   0   1
   1   2
   2   3
   3   4
   4   5
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

2020-07-14 Thread GitBox


wesm commented on pull request #7545:
URL: https://github.com/apache/arrow/pull/7545#issuecomment-658465856


   I think the rationale is that the memory and performance savings related to 
materializing the partition columns are mostly related to string data. So it's 
definitely beneficial to return them as dictionary types.
   
   IMHO if there is a change from dictionary/dense required post-1.0.0 it is 
not the end of the world, so I'm OK either with merging this as is or changing 
all partition types to be dictionary. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7765: ARROW-9399: [C++] Add forward compatibility test to detect and raise error for future MetadataVersion

2020-07-14 Thread GitBox


wesm closed pull request #7765:
URL: https://github.com/apache/arrow/pull/7765


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7765: ARROW-9399: [C++] Add forward compatibility test to detect and raise error for future MetadataVersion

2020-07-14 Thread GitBox


wesm commented on pull request #7765:
URL: https://github.com/apache/arrow/pull/7765#issuecomment-658464870


   +1. The Ursabot failures must be addressed separately by setting the 
environment variable on those external build configurations



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7743: ARROW-9452: [Rust] [DataFusion] Optimize ParquetScanExec

2020-07-14 Thread GitBox


wesm closed pull request #7743:
URL: https://github.com/apache/arrow/pull/7743


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove commented on pull request #7743: ARROW-9452: [Rust] [DataFusion] Optimize ParquetScanExec

2020-07-14 Thread GitBox


andygrove commented on pull request #7743:
URL: https://github.com/apache/arrow/pull/7743#issuecomment-658462539


   If nobody objects, I'd quite like this to make it in for the 1.0.0 release.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7766: ARROW-9473: [Doc] Polishing for 1.0

2020-07-14 Thread GitBox


nealrichardson closed pull request #7766:
URL: https://github.com/apache/arrow/pull/7766


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

2020-07-14 Thread GitBox


nealrichardson commented on pull request #7545:
URL: https://github.com/apache/arrow/pull/7545#issuecomment-658440946


   Short version: I agree with Wes: do what you think best but let's merge 
something.
   
   Longer thoughts: 
   
   If I understand correctly, the difference is whether or not partition fields 
that are integers are encoded as dictionaries in the old function and they come 
out as ints in the new? Semantically, they should behave the same though right? 
   
   It is possible to cast an integer column to dictionary, or at least I recall 
seeing a dictionary encode function. So if someone felt strongly about it, they 
could make their partition fields be dictionaries after the fact.
   
   I caveat this by confessing that I've never personally used this function, 
but on face value I don't see why partition fields have to be dictionary type, 
particularly if we're only talking about integers. So if the main reason for 
adding something to force them to be dictionaries is to conform exactly with an 
old API, I'm not sure it's worth it. Maybe that's the wrong call, but since 
there is a workaround, why not try it without coercing everything to dictionary 
and see what reaction we get? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm edited a comment on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


wesm edited a comment on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658438428


   I don't recall but that may have been the case. Either way it's a giant mess 
since many people use pyarrow to write Parquet files to be consumed by 
JVM-based systems. I think we can infer that LZ4 is not often used from the 
fact that we haven't had more bug reports about it.
   
   Note that we can provide backward compatibility if needed for existing 
LZ4-compressed files by looking at the version number in the file footer



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


pitrou commented on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658438215


   In any case, the format used by Hadoop is neither of both, it's LZ4_RAW with 
a custom header...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails

2020-07-14 Thread GitBox


nealrichardson closed pull request #7706:
URL: https://github.com/apache/arrow/pull/7706


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


wesm commented on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658438428


   I don't recall but that may have been the case. Either way it's a giant mess 
since many people use pyarrow to write Parquet files to be consumed by 
JVM-based systems. I think we can infer that LZ4 is not often used from the 
fact that we haven't had more bug reports about it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


pitrou commented on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658437867


   Wasn't it deliberate? IIRC we didn't want to break compatibility with 
existing files.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


wesm commented on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658437573


   @pitrou @xhochy It seems that despite adding the LZ4_FRAME format we've been 
continuing to use LZ4_RAW for Parquet files. Unfortunate that this hasn't seen 
more compatibility testing. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7763: ARROW-9472: [R] Provide configurable MetadataVersion in IPC API and environment variable to set default to V4 when needed

2020-07-14 Thread GitBox


nealrichardson closed pull request #7763:
URL: https://github.com/apache/arrow/pull/7763


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


wesm commented on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658431456


   No problem, I can take it from here. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] patrickpai commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


patrickpai commented on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658430846


   @wesm I think it's best if I get help on this. I'm totally new to the python 
codebase and wasn't expecting to finish today. I can let you take over.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7766: ARROW-9473: [Doc] Polishing for 1.0

2020-07-14 Thread GitBox


github-actions[bot] commented on pull request #7766:
URL: https://github.com/apache/arrow/pull/7766#issuecomment-658430115


   https://issues.apache.org/jira/browse/ARROW-9473



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


wesm commented on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658429763


   @patrickpai do you anticipate to complete this today? We are hoping to cut a 
release candidate tomorrow during the workday Central Europe Time so I can help 
finish this if needed



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm edited a comment on pull request #7765: ARROW-9399: [C++] Add forward compatibility test to detect and raise error for future MetadataVersion

2020-07-14 Thread GitBox


wesm edited a comment on pull request #7765:
URL: https://github.com/apache/arrow/pull/7765#issuecomment-658428851


   @kszucs `ARROW_TEST_DATA` is not being set in some of the Ursabot builders



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7765: ARROW-9399: [C++] Add forward compatibility test to detect and raise error for future MetadataVersion

2020-07-14 Thread GitBox


wesm commented on pull request #7765:
URL: https://github.com/apache/arrow/pull/7765#issuecomment-658428851


   @kszucs `ARROW_TEST_DATA` is not being set in the Ursabot builders



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7764: ARROW-9390: [C++][Followup] Add underscores to is* string functions

2020-07-14 Thread GitBox


wesm closed pull request #7764:
URL: https://github.com/apache/arrow/pull/7764


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails

2020-07-14 Thread GitBox


github-actions[bot] commented on pull request #7706:
URL: https://github.com/apache/arrow/pull/7706#issuecomment-658428103


   Revision: 98340934b34cd4cc6e49a6f25518d51accd55f11
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-421](https://github.com/ursa-labs/crossbow/branches/all?query=actions-421)
   
   |Task|Status|
   ||--|
   |test-conda-r-4.0|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-421-github-test-conda-r-4.0)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-421-github-test-conda-r-4.0)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails

2020-07-14 Thread GitBox


nealrichardson commented on pull request #7706:
URL: https://github.com/apache/arrow/pull/7706#issuecomment-658427378


   @github-actions crossbow submit test-conda-r-4.0



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #7766: ARROW-9473: [Doc] Polishing for 1.0

2020-07-14 Thread GitBox


nealrichardson opened a new pull request #7766:
URL: https://github.com/apache/arrow/pull/7766


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7765: ARROW-9399: [C++] Add forward compatibility test to detect and raise error for future MetadataVersion

2020-07-14 Thread GitBox


github-actions[bot] commented on pull request #7765:
URL: https://github.com/apache/arrow/pull/7765#issuecomment-658424275


   https://issues.apache.org/jira/browse/ARROW-9399



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm opened a new pull request #7765: ARROW-9399: [C++] Add forward compatibility test to detect and raise error for future MetadataVersion

2020-07-14 Thread GitBox


wesm opened a new pull request #7765:
URL: https://github.com/apache/arrow/pull/7765


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on a change in pull request #7764: ARROW-9390: [C++][Followup] Add underscores to is* string functions

2020-07-14 Thread GitBox


nealrichardson commented on a change in pull request #7764:
URL: https://github.com/apache/arrow/pull/7764#discussion_r454644099



##
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##
@@ -944,33 +944,33 @@ void RegisterScalarStringAscii(FunctionRegistry* 
registry) {
   MakeUnaryStringBatchKernel("ascii_upper", registry);
   MakeUnaryStringBatchKernel("ascii_lower", registry);
 
-  AddUnaryStringPredicate("string_isascii", registry);
-
-  AddUnaryStringPredicate("ascii_isalnum", registry);
-  AddUnaryStringPredicate("ascii_isalpha", registry);
-  AddUnaryStringPredicate("ascii_isdecimal", registry);
-  // no isdigic for ascii, since it is the same as isdecimal
-  AddUnaryStringPredicate("ascii_islower", registry);
-  // no isnumeric for ascii, since it is the same as isdecimal
-  AddUnaryStringPredicate("ascii_isprintable", registry);
-  AddUnaryStringPredicate("ascii_isspace", registry);
-  AddUnaryStringPredicate("ascii_istitle", registry);
-  AddUnaryStringPredicate("ascii_isupper", registry);
+  AddUnaryStringPredicate("string_is_ascii", registry);
+
+  AddUnaryStringPredicate("ascii_is_alnum", registry);
+  AddUnaryStringPredicate("ascii_is_alpha", registry);
+  AddUnaryStringPredicate("ascii_is_decimal", registry);
+  // no is_digic for ascii, since it is the same as is_decimal

Review comment:
   is_digic?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7746: ARROW-9438: [CI] Add spark patch to compile with recent Arrow Java changes

2020-07-14 Thread GitBox


wesm closed pull request #7746:
URL: https://github.com/apache/arrow/pull/7746


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7764: ARROW-9390: [C++][Followup] Add underscores to is* string functions

2020-07-14 Thread GitBox


github-actions[bot] commented on pull request #7764:
URL: https://github.com/apache/arrow/pull/7764#issuecomment-658412404


   https://issues.apache.org/jira/browse/ARROW-9390



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7764: ARROW-9390: [C++][Followup] Add underscores to is* string functions

2020-07-14 Thread GitBox


wesm commented on pull request #7764:
URL: https://github.com/apache/arrow/pull/7764#issuecomment-658410462


   cc @nealrichardson 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7272: ARROW-8314: [Python] Add a Table.select method to select a subset of columns

2020-07-14 Thread GitBox


wesm commented on pull request #7272:
URL: https://github.com/apache/arrow/pull/7272#issuecomment-658410644


   +1



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7272: ARROW-8314: [Python] Add a Table.select method to select a subset of columns

2020-07-14 Thread GitBox


wesm closed pull request #7272:
URL: https://github.com/apache/arrow/pull/7272


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] BryanCutler commented on pull request #7746: ARROW-9438: [CI] Add spark patch to compile with recent Arrow Java changes

2020-07-14 Thread GitBox


BryanCutler commented on pull request #7746:
URL: https://github.com/apache/arrow/pull/7746#issuecomment-658411259


   Thanks @kszucs for fixing up the Docker file!
   
   > I should have been aware of the effects ARROW-9300 was goign to have 
downstream. Anything I can assist with @BryanCutler ?
   
   No worries @rymurr ! The java changes were pretty minimal and I think that's 
all working well now. 
   
   > AFAICT this can be merged? The timestamp related problems will require 
follow up work
   
   The current timezone failure is the result of 
https://github.com/apache/arrow/pull/7604 and was expected here. Since this 
improves a little to only 1 known failure, I think this is ok to be merged. I 
will open another PR for the remaining issue when I'm able to get to it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm opened a new pull request #7764: ARROW-9390: [C++][Followup] Add underscores to is* string functions

2020-07-14 Thread GitBox


wesm opened a new pull request #7764:
URL: https://github.com/apache/arrow/pull/7764


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7734: ARROW-8480: [Rust] Use NonNull well aligned pointer as Unique reference

2020-07-14 Thread GitBox


wesm closed pull request #7734:
URL: https://github.com/apache/arrow/pull/7734


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails

2020-07-14 Thread GitBox


github-actions[bot] commented on pull request #7706:
URL: https://github.com/apache/arrow/pull/7706#issuecomment-658400582


   Revision: b1f53bb0a92a11eb280ff515438882b0decbf172
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-420](https://github.com/ursa-labs/crossbow/branches/all?query=actions-420)
   
   |Task|Status|
   ||--|
   |test-conda-r-4.0|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-420-github-test-conda-r-4.0)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-420-github-test-conda-r-4.0)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7741: ARROW-9449: [R] Strip arrow.so

2020-07-14 Thread GitBox


nealrichardson closed pull request #7741:
URL: https://github.com/apache/arrow/pull/7741


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7762: ARROW-8650: [Rust] [Website] Add documentation to Arrow website

2020-07-14 Thread GitBox


nealrichardson closed pull request #7762:
URL: https://github.com/apache/arrow/pull/7762


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

2020-07-14 Thread GitBox


jorisvandenbossche commented on pull request #7545:
URL: https://github.com/apache/arrow/pull/7545#issuecomment-658396423


   Note that the new option with the datasets API only dictionary encodes 
string partition fields, and not integer partition field. So it would still not 
keep exactly the same behaviour as we had before ..



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails

2020-07-14 Thread GitBox


nealrichardson commented on pull request #7706:
URL: https://github.com/apache/arrow/pull/7706#issuecomment-658396393


   @github-actions crossbow submit test-conda-r-4.0



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7746: ARROW-9438: [CI] Add spark patch to compile with recent Arrow Java changes

2020-07-14 Thread GitBox


wesm commented on pull request #7746:
URL: https://github.com/apache/arrow/pull/7746#issuecomment-658395334


   AFAICT this can be merged? The timestamp related problems will require 
follow up work



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7740: ARROW-9447 [Rust][DataFusion] Made ScalarUDF (Send + Sync)

2020-07-14 Thread GitBox


wesm closed pull request #7740:
URL: https://github.com/apache/arrow/pull/7740


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7756: ARROW-9458: [Python] Release GIL in ScanTask.execute

2020-07-14 Thread GitBox


wesm closed pull request #7756:
URL: https://github.com/apache/arrow/pull/7756


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7761: ARROW-9470: [CI][Java] Run Maven in parallel

2020-07-14 Thread GitBox


wesm closed pull request #7761:
URL: https://github.com/apache/arrow/pull/7761


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


wesm commented on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658393753


   Ah I see that you're adding Python changes.  I fixed the lint problems here 
so be sure to rebase your changes



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7741: ARROW-9449: [R] Strip arrow.so

2020-07-14 Thread GitBox


github-actions[bot] commented on pull request #7741:
URL: https://github.com/apache/arrow/pull/7741#issuecomment-658393103


   Revision: 343e87d2022a4c43b2e894a2b3d24a61f29f477e
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-419](https://github.com/ursa-labs/crossbow/branches/all?query=actions-419)
   
   |Task|Status|
   ||--|
   
|test-r-rocker-r-base-latest|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-419-azure-test-r-rocker-r-base-latest)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-419-azure-test-r-rocker-r-base-latest)|
   
|test-r-rstudio-r-base-3.6-opensuse42|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-419-azure-test-r-rstudio-r-base-3.6-opensuse42)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-419-azure-test-r-rstudio-r-base-3.6-opensuse42)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7741: ARROW-9449: [R] Strip arrow.so

2020-07-14 Thread GitBox


nealrichardson commented on pull request #7741:
URL: https://github.com/apache/arrow/pull/7741#issuecomment-658392235


   @github-actions crossbow submit test-r-rstudio-r-base-3.6-opensuse42 
test-r-rocker-r-base-latest



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7741: ARROW-9449: [R] Strip arrow.so

2020-07-14 Thread GitBox


nealrichardson commented on pull request #7741:
URL: https://github.com/apache/arrow/pull/7741#issuecomment-658392032


   The two crossbow failures (other than conda-r, which we're ignoring) appear 
to be flakes, but I'll re-run them to be extra sure. 
   
   Re-review appreciated on the latest change. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] bkietz commented on a change in pull request #7748: ARROW-9388: [C++] Division kernels

2020-07-14 Thread GitBox


bkietz commented on a change in pull request #7748:
URL: https://github.com/apache/arrow/pull/7748#discussion_r454610972



##
File path: cpp/src/arrow/compute/kernels/codegen_internal.h
##
@@ -675,11 +704,80 @@ struct ScalarBinary {
   }
 };
 
+// An alternative to ScalarBinary that Applies a scalar operation on only the
+// not-null values of arrays.
+template 
+struct ScalarBinaryNotNull {
+  using OUT = typename GetOutputType::T;
+
+  static void ArrayArray(KernelContext* ctx, const ArrayData& arg0, const 
ArrayData& arg1,
+ Datum* out) {
+ArrayIterator arg0_it(arg0);
+ArrayIterator arg1_it(arg1);
+OutputAdapter::WriteOnlyValid(
+ctx, out, [&]() -> OUT { return Op::template Call(ctx, arg0_it(), 
arg1_it()); },
+[&]() {
+  ARROW_UNUSED(arg0_it());
+  ARROW_UNUSED(arg1_it());
+});
+  }
+
+  static void ArrayScalar(KernelContext* ctx, const ArrayData& arg0, const 
Scalar& arg1,
+  Datum* out) {
+ArrayIterator arg0_it(arg0);
+auto arg1_val = UnboxScalar::Unbox(arg1);
+OutputAdapter::WriteOnlyValid(
+ctx, out, [&]() -> OUT { return Op::template Call(ctx, arg0_it(), 
arg1_val); },
+[&]() { ARROW_UNUSED(arg0_it()); });
+  }
+
+  static void ScalarArray(KernelContext* ctx, const Scalar& arg0, const 
ArrayData& arg1,
+  Datum* out) {
+auto arg0_val = UnboxScalar::Unbox(arg0);
+ArrayIterator arg1_it(arg1);
+OutputAdapter::WriteOnlyValid(
+ctx, out, [&]() -> OUT { return Op::template Call(ctx, arg0_val, 
arg1_it()); },
+[&]() { ARROW_UNUSED(arg1_it()); });
+  }
+
+  static void ScalarScalar(KernelContext* ctx, const Scalar& arg0, const 
Scalar& arg1,
+   Datum* out) {
+if (out->scalar()->is_valid) {
+  auto arg0_val = UnboxScalar::Unbox(arg0);
+  auto arg1_val = UnboxScalar::Unbox(arg1);
+  BoxScalar::Box(Op::template Call(ctx, arg0_val, arg1_val),
+  out->scalar().get());
+}
+  }
+
+  static void Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+if (batch[0].kind() == Datum::ARRAY) {
+  if (batch[1].kind() == Datum::ARRAY) {
+return ArrayArray(ctx, *batch[0].array(), *batch[1].array(), out);
+  } else {
+return ArrayScalar(ctx, *batch[0].array(), *batch[1].scalar(), out);
+  }
+} else {
+  if (batch[1].kind() == Datum::ARRAY) {
+// e.g. if we were doing scalar < array, we flip and do array >= scalar

Review comment:
   I think this comment is not applicable
   ```suggestion
   ```

##
File path: cpp/src/arrow/compute/api_scalar.h
##
@@ -130,6 +130,19 @@ Result Multiply(const Datum& left, const Datum& 
right,
ArithmeticOptions options = ArithmeticOptions(),
ExecContext* ctx = NULLPTR);
 
+/// \brief Divide two values. Array values must be the same length. If either
+/// factor is null the result will be null.

Review comment:
   Please describe division-by-zero behavior

##
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##
@@ -219,48 +219,77 @@ struct MultiplyChecked {
   }
 };
 
+struct Divide {
+  template 
+  static constexpr enable_if_floating_point Call(KernelContext*, T left, T 
right) {
+return left / right;
+  }
+
+  template 
+  static constexpr enable_if_integer Call(KernelContext*, T left, T right) {
+return left / right;
+  }
+};
+
+struct DivideChecked {
+  template 
+  static enable_if_integer Call(KernelContext* ctx, T left, T right) {
+if (right == 0) {
+  ctx->SetStatus(Status::Invalid("divide by 0"));
+  return right;
+}
+return Divide::Call(ctx, left, right);
+  }
+
+  template 
+  static constexpr enable_if_floating_point Call(KernelContext*, T left, T 
right) {
+return left * right;

Review comment:
   ```suggestion
   return left / right;
   ```

##
File path: cpp/src/arrow/compute/api_scalar.h
##
@@ -130,6 +130,19 @@ Result Multiply(const Datum& left, const Datum& 
right,
ArithmeticOptions options = ArithmeticOptions(),
ExecContext* ctx = NULLPTR);
 
+/// \brief Divide two values. Array values must be the same length. If either
+/// factor is null the result will be null.
+///
+/// \param[in] left the dividend
+/// \param[in] right the divisor
+/// \param[in] options arithmetic options (overflow handling), optional

Review comment:
   Division does not overflow unless the dividend is INT_MIN and the 
divisor is -1; IIUC `overflow_handling` is being used to indicate whether 
divide-by-zero should raise an error. If so please amend the comment





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the 

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7272: ARROW-8314: [Python] Add a Table.select method to select a subset of columns

2020-07-14 Thread GitBox


jorisvandenbossche commented on a change in pull request #7272:
URL: https://github.com/apache/arrow/pull/7272#discussion_r454607147



##
File path: cpp/src/arrow/table.cc
##
@@ -362,6 +362,23 @@ Result> Table::RenameColumns(
   return Table::Make(::arrow::schema(std::move(fields)), std::move(columns), 
num_rows());
 }
 
+Result> Table::SelectColumns(
+const std::vector& indices) const {
+  int n = static_cast(indices.size());
+
+  std::vector> columns(n);
+  std::vector> fields(n);
+  for (int i = 0; i < n; i++) {
+int pos = indices[i];

Review comment:
   Added a boundscheck





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #7763: ARROW-9472: [R] Provide configurable MetadataVersion in IPC API and environment variable to set default to V4 when needed

2020-07-14 Thread GitBox


nealrichardson opened a new pull request #7763:
URL: https://github.com/apache/arrow/pull/7763


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7763: ARROW-9472: [R] Provide configurable MetadataVersion in IPC API and environment variable to set default to V4 when needed

2020-07-14 Thread GitBox


github-actions[bot] commented on pull request #7763:
URL: https://github.com/apache/arrow/pull/7763#issuecomment-658377740


   https://issues.apache.org/jira/browse/ARROW-9472



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7762: ARROW-8650: [Rust] [Website] Add documentation to Arrow website

2020-07-14 Thread GitBox


github-actions[bot] commented on pull request #7762:
URL: https://github.com/apache/arrow/pull/7762#issuecomment-658349165


   https://issues.apache.org/jira/browse/ARROW-8650



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #7762: ARROW-8650: [Rust] [Website] Add documentation to Arrow website

2020-07-14 Thread GitBox


nealrichardson opened a new pull request #7762:
URL: https://github.com/apache/arrow/pull/7762


   Also adds the links for the other missing languages, as are being added to 
the main website menu



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7755: ARROW-9390: [C++][Doc] Review compute function names

2020-07-14 Thread GitBox


pitrou commented on a change in pull request #7755:
URL: https://github.com/apache/arrow/pull/7755#discussion_r454553900



##
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##
@@ -944,7 +944,7 @@ void RegisterScalarStringAscii(FunctionRegistry* registry) {
   MakeUnaryStringBatchKernel("ascii_upper", registry);
   MakeUnaryStringBatchKernel("ascii_lower", registry);
 
-  AddUnaryStringPredicate("binary_isascii", registry);
+  AddUnaryStringPredicate("string_isascii", registry);

Review comment:
   Also `ascii_is_alnum`, etc. ;-)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7755: ARROW-9390: [C++][Doc] Review compute function names

2020-07-14 Thread GitBox


wesm commented on a change in pull request #7755:
URL: https://github.com/apache/arrow/pull/7755#discussion_r454553258



##
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##
@@ -944,7 +944,7 @@ void RegisterScalarStringAscii(FunctionRegistry* registry) {
   MakeUnaryStringBatchKernel("ascii_upper", registry);
   MakeUnaryStringBatchKernel("ascii_lower", registry);
 
-  AddUnaryStringPredicate("binary_isascii", registry);
+  AddUnaryStringPredicate("string_isascii", registry);

Review comment:
   I can make this change today unless someone else wants to





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7741: ARROW-9449: [R] Strip arrow.so

2020-07-14 Thread GitBox


github-actions[bot] commented on pull request #7741:
URL: https://github.com/apache/arrow/pull/7741#issuecomment-658334397


   Revision: 343e87d2022a4c43b2e894a2b3d24a61f29f477e
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-418](https://github.com/ursa-labs/crossbow/branches/all?query=actions-418)
   
   |Task|Status|
   ||--|
   
|homebrew-r-autobrew|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-418-travis-homebrew-r-autobrew.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |test-conda-r-4.0|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-418-github-test-conda-r-4.0)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-418-github-test-conda-r-4.0)|
   |test-r-linux-as-cran|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-418-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-418-github-test-r-linux-as-cran)|
   
|test-r-rhub-ubuntu-gcc-release|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-r-rhub-ubuntu-gcc-release)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-r-rhub-ubuntu-gcc-release)|
   
|test-r-rocker-r-base-latest|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-r-rocker-r-base-latest)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-r-rocker-r-base-latest)|
   
|test-r-rstudio-r-base-3.6-bionic|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-r-rstudio-r-base-3.6-bionic)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-r-rstudio-r-base-3.6-bionic)|
   
|test-r-rstudio-r-base-3.6-centos6|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-r-rstudio-r-base-3.6-centos6)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-r-rstudio-r-base-3.6-centos6)|
   
|test-r-rstudio-r-base-3.6-centos8|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-r-rstudio-r-base-3.6-centos8)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-r-rstudio-r-base-3.6-centos8)|
   
|test-r-rstudio-r-base-3.6-opensuse15|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-r-rstudio-r-base-3.6-opensuse15)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-r-rstudio-r-base-3.6-opensuse15)|
   
|test-r-rstudio-r-base-3.6-opensuse42|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-r-rstudio-r-base-3.6-opensuse42)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-r-rstudio-r-base-3.6-opensuse42)|
   
|test-ubuntu-18.04-r-sanitizer|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-ubuntu-18.04-r-sanitizer)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-ubuntu-18.04-r-sanitizer)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7755: ARROW-9390: [C++][Doc] Review compute function names

2020-07-14 Thread GitBox


pitrou commented on a change in pull request #7755:
URL: https://github.com/apache/arrow/pull/7755#discussion_r454550954



##
File path: cpp/src/arrow/compute/kernels/scalar_string_benchmark.cc
##
@@ -61,9 +61,9 @@ static void IsAlphaNumericAscii(benchmark::State& state) {
   UnaryStringBenchmark(state, "ascii_isalnum");
 }
 
-static void BinaryContainsExact(benchmark::State& state) {
-  BinaryContainsExactOptions options("abac");
-  UnaryStringBenchmark(state, "binary_contains_exact", );
+static void MatchSubstring(benchmark::State& state) {
+  MatchSubstringOptions options("abac");
+  UnaryStringBenchmark(state, "match_substring", );

Review comment:
   I don't think so. Personally, I find prefixes a bit pointless or even 
distracting.
   (or, here, plain misleading, since "binary_contains_exact" didn't work on 
binary arrays, only string arrays...)
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7755: ARROW-9390: [C++][Doc] Review compute function names

2020-07-14 Thread GitBox


pitrou commented on a change in pull request #7755:
URL: https://github.com/apache/arrow/pull/7755#discussion_r454551146



##
File path: cpp/src/arrow/compute/kernels/scalar_string_benchmark.cc
##
@@ -61,9 +61,9 @@ static void IsAlphaNumericAscii(benchmark::State& state) {
   UnaryStringBenchmark(state, "ascii_isalnum");
 }
 
-static void BinaryContainsExact(benchmark::State& state) {
-  BinaryContainsExactOptions options("abac");
-  UnaryStringBenchmark(state, "binary_contains_exact", );
+static void MatchSubstring(benchmark::State& state) {
+  MatchSubstringOptions options("abac");
+  UnaryStringBenchmark(state, "match_substring", );

Review comment:
   Perhaps someone else wants to think about a convention?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove commented on pull request #7751: ARROW-9461: [Rust] Fixed error in reading Date32 and Date64.

2020-07-14 Thread GitBox


andygrove commented on pull request #7751:
URL: https://github.com/apache/arrow/pull/7751#issuecomment-658333275


   @jorgecarleitao Could you add a unit test?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7755: ARROW-9390: [C++][Doc] Review compute function names

2020-07-14 Thread GitBox


pitrou commented on a change in pull request #7755:
URL: https://github.com/apache/arrow/pull/7755#discussion_r454549799



##
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##
@@ -944,7 +944,7 @@ void RegisterScalarStringAscii(FunctionRegistry* registry) {
   MakeUnaryStringBatchKernel("ascii_upper", registry);
   MakeUnaryStringBatchKernel("ascii_lower", registry);
 
-  AddUnaryStringPredicate("binary_isascii", registry);
+  AddUnaryStringPredicate("string_isascii", registry);

Review comment:
   That's a good point. Should we make a followup PR?
   (if it's me, it's only tomorrow)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7741: ARROW-9449: [R] Strip arrow.so

2020-07-14 Thread GitBox


nealrichardson commented on pull request #7741:
URL: https://github.com/apache/arrow/pull/7741#issuecomment-658332593


   @github-actions crossbow submit -g r



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou closed pull request #7760: ARROW-9390: [Doc] Add missing file

2020-07-14 Thread GitBox


pitrou closed pull request #7760:
URL: https://github.com/apache/arrow/pull/7760


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou closed pull request #7753: ARROW-9385: [Python] Fix JPype tests and JVM buffer lifetime

2020-07-14 Thread GitBox


pitrou closed pull request #7753:
URL: https://github.com/apache/arrow/pull/7753


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7761: ARROW-9470: [CI][Java] Run Maven in parallel

2020-07-14 Thread GitBox


github-actions[bot] commented on pull request #7761:
URL: https://github.com/apache/arrow/pull/7761#issuecomment-658330967


   https://issues.apache.org/jira/browse/ARROW-9470



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7760: ARROW-9390: [Doc] Add missing file

2020-07-14 Thread GitBox


github-actions[bot] commented on pull request #7760:
URL: https://github.com/apache/arrow/pull/7760#issuecomment-658330966


   https://issues.apache.org/jira/browse/ARROW-9390



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7753: ARROW-9385: [Python] Fix JPype tests and JVM buffer lifetime

2020-07-14 Thread GitBox


pitrou commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658329952


   JPype build is green, AppVeyor is 
[green](https://ci.appveyor.com/project/pitrou/arrow). Will merge :-)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou opened a new pull request #7761: ARROW-9470: [CI][Java] Run Maven in parallel

2020-07-14 Thread GitBox


pitrou opened a new pull request #7761:
URL: https://github.com/apache/arrow/pull/7761


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou opened a new pull request #7760: ARROW-9390: [Doc] Add missing file

2020-07-14 Thread GitBox


pitrou opened a new pull request #7760:
URL: https://github.com/apache/arrow/pull/7760


   Followup from PR #7755: a new doc file wasn't checked in, and Sphinx only 
emits a warning.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

2020-07-14 Thread GitBox


github-actions[bot] commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658317095


   Revision: 4752024fc2fbcb5b8e058d9f9fa8d2864c5c74ba
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-417](https://github.com/ursa-labs/crossbow/branches/all?query=actions-417)
   
   |Task|Status|
   ||--|
   |test-conda-python-3.8-jpype|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-417-github-test-conda-python-3.8-jpype)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-417-github-test-conda-python-3.8-jpype)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

2020-07-14 Thread GitBox


pitrou commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658316233


   @github-actions crossbow submit test-conda-python-3.8-jpype



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7741: ARROW-9449: [R] Strip arrow.so

2020-07-14 Thread GitBox


nealrichardson commented on pull request #7741:
URL: https://github.com/apache/arrow/pull/7741#issuecomment-658310822


   On one of the linux images I've been testing locally, as well as on my mac, 
the R makevars added in that commit are setting `CXXFLAGS="-g -O2"`. Given 
that, it probably makes sense not to pass those flags to the libarrow build. 
But I still think we should strip because I can't control the fact that R is 
including that when it compiles the R bindings--that is, I could edit the 
makevars on my own system, but the package installation can't override whatever 
makevars R has on other people's systems.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou closed pull request #7759: ARROW-7831: [Java] Fix build error from #6402

2020-07-14 Thread GitBox


pitrou closed pull request #7759:
URL: https://github.com/apache/arrow/pull/7759


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] paddyhoran edited a comment on pull request #7748: ARROW-9388: [C++] Division kernels

2020-07-14 Thread GitBox


paddyhoran edited a comment on pull request #7748:
URL: https://github.com/apache/arrow/pull/7748#issuecomment-658304924


   We have this 
[implemented](https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/arithmetic.rs#L289)
 with 
[SIMD](https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/arithmetic.rs#L162)
 (which was tricky) on the Rust side.  We currently return a `Result` for 
division by zero.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] paddyhoran commented on pull request #7748: ARROW-9388: [C++] Division kernels

2020-07-14 Thread GitBox


paddyhoran commented on pull request #7748:
URL: https://github.com/apache/arrow/pull/7748#issuecomment-658304924


   We have this implemented with 
[SIMD](https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/arithmetic.rs#L162)
 (which was tricky) on the Rust side.  We currently return a `Result` for 
division by zero.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou edited a comment on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

2020-07-14 Thread GitBox


pitrou edited a comment on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658296432


   Rebased. There were some Java changes in-between, hopefully they won't 
interfere.
   
   Edit: of course they do...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7759: ARROW-7831: [Java] Fix build error from #6402

2020-07-14 Thread GitBox


github-actions[bot] commented on pull request #7759:
URL: https://github.com/apache/arrow/pull/7759#issuecomment-658304829


   https://issues.apache.org/jira/browse/ARROW-7831



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged

2020-07-14 Thread GitBox


pitrou commented on pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#issuecomment-658300664


   Ok, @lidavidm did it in https://github.com/apache/arrow/pull/7759



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] lidavidm opened a new pull request #7759: NOJIRA: [Java] Fix build error from #6402

2020-07-14 Thread GitBox


lidavidm opened a new pull request #7759:
URL: https://github.com/apache/arrow/pull/7759


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged

2020-07-14 Thread GitBox


pitrou commented on pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#issuecomment-658300014


   @rymurr Are you available to give this a quick fix?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou edited a comment on pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged

2020-07-14 Thread GitBox


pitrou edited a comment on pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#issuecomment-658299827


   This broke the Java builds:
   https://github.com/apache/arrow/actions/runs/168952655
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged

2020-07-14 Thread GitBox


pitrou commented on pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#issuecomment-658299827


   This broke the builds:
   https://github.com/apache/arrow/actions/runs/168952655
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on a change in pull request #7755: ARROW-9390: [C++][Doc] Review compute function names

2020-07-14 Thread GitBox


nealrichardson commented on a change in pull request #7755:
URL: https://github.com/apache/arrow/pull/7755#discussion_r454506911



##
File path: cpp/src/arrow/compute/kernels/scalar_string_benchmark.cc
##
@@ -61,9 +61,9 @@ static void IsAlphaNumericAscii(benchmark::State& state) {
   UnaryStringBenchmark(state, "ascii_isalnum");
 }
 
-static void BinaryContainsExact(benchmark::State& state) {
-  BinaryContainsExactOptions options("abac");
-  UnaryStringBenchmark(state, "binary_contains_exact", );
+static void MatchSubstring(benchmark::State& state) {
+  MatchSubstringOptions options("abac");
+  UnaryStringBenchmark(state, "match_substring", );

Review comment:
   Are there clear criteria for when the function gets a `binary_` or 
`string_` prefix and when it doesn't?
   
   Related, if we haven't documented the naming conventions (haven't seen it 
yet but I'll keep reading), we should so that future developers/selves know 
what to call new functions.

##
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##
@@ -944,7 +944,7 @@ void RegisterScalarStringAscii(FunctionRegistry* registry) {
   MakeUnaryStringBatchKernel("ascii_upper", registry);
   MakeUnaryStringBatchKernel("ascii_lower", registry);
 
-  AddUnaryStringPredicate("binary_isascii", registry);
+  AddUnaryStringPredicate("string_isascii", registry);

Review comment:
   Why isn't this `string_is_ascii`? It seems we prefer `is_x` to `isx` 
(cf. `isin` -> `is_in` in this PR).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

2020-07-14 Thread GitBox


pitrou commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658296432


   Rebased. There were some Java changes in-between, hopefully they won't 
interfere.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

2020-07-14 Thread GitBox


pitrou commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454500127



##
File path: ci/scripts/python_test.sh
##
@@ -29,4 +29,4 @@ export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${LD_LIBRARY_PATH}
 # Enable some checks inside Python itself
 export PYTHONDEVMODE=1
 
-pytest -r s --pyargs pyarrow
+pytest -r s --pyargs pyarrow.tests.test_jvm

Review comment:
   Yes :-)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

2020-07-14 Thread GitBox


github-actions[bot] commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658292321


   Revision: 28f6500515fee577dc17f969a819ef01830ffdcd
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-416](https://github.com/ursa-labs/crossbow/branches/all?query=actions-416)
   
   |Task|Status|
   ||--|
   |test-conda-python-3.8-jpype|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-416-github-test-conda-python-3.8-jpype)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-416-github-test-conda-python-3.8-jpype)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

2020-07-14 Thread GitBox


rymurr commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454498500



##
File path: ci/scripts/python_test.sh
##
@@ -29,4 +29,4 @@ export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${LD_LIBRARY_PATH}
 # Enable some checks inside Python itself
 export PYTHONDEVMODE=1
 
-pytest -r s --pyargs pyarrow
+pytest -r s --pyargs pyarrow.tests.test_jvm

Review comment:
   Did this sneak in on accident?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

2020-07-14 Thread GitBox


pitrou commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454494904



##
File path: python/pyarrow/jvm.py
##
@@ -28,24 +28,38 @@
 import pyarrow as pa
 
 
-def jvm_buffer(arrowbuf):
+class _JvmBufferNanny:
+"""
+An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying
+memory alive.
+"""
+
+def __init__(self, jvm_buf):
+self.jvm_buf = jvm_buf
+self.jvm_buf.retain()
+
+def __del__(self):
+self.jvm_buf.release()
+
+
+def jvm_buffer(jvm_buf):
 """
 Construct an Arrow buffer from org.apache.arrow.memory.ArrowBuf
 
 Parameters
 --
 
-arrowbuf: org.apache.arrow.memory.ArrowBuf
+jvm_buf: org.apache.arrow.memory.ArrowBuf
 Arrow Buffer representation on the JVM.
 
 Returns
 ---
 pyarrow.Buffer
 Python Buffer that references the JVM memory.
 """
-address = arrowbuf.memoryAddress()
-size = arrowbuf.capacity()
-return pa.foreign_buffer(address, size, base=arrowbuf)
+address = jvm_buf.memoryAddress()

Review comment:
   I'm trying to improve this, hold on.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

2020-07-14 Thread GitBox


wesm commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454493669



##
File path: python/pyarrow/jvm.py
##
@@ -28,24 +28,38 @@
 import pyarrow as pa
 
 
-def jvm_buffer(arrowbuf):
+class _JvmBufferNanny:
+"""
+An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying
+memory alive.
+"""
+
+def __init__(self, jvm_buf):
+self.jvm_buf = jvm_buf
+self.jvm_buf.retain()
+
+def __del__(self):
+self.jvm_buf.release()
+
+
+def jvm_buffer(jvm_buf):
 """
 Construct an Arrow buffer from org.apache.arrow.memory.ArrowBuf
 
 Parameters
 --
 
-arrowbuf: org.apache.arrow.memory.ArrowBuf
+jvm_buf: org.apache.arrow.memory.ArrowBuf
 Arrow Buffer representation on the JVM.
 
 Returns
 ---
 pyarrow.Buffer
 Python Buffer that references the JVM memory.
 """
-address = arrowbuf.memoryAddress()
-size = arrowbuf.capacity()
-return pa.foreign_buffer(address, size, base=arrowbuf)
+address = jvm_buf.memoryAddress()

Review comment:
   I'm mostly trying to understand where and when this buffer will be 
destroyed. In any case it's outside the scope of this PR so we can let someone 
more invested in this code deal with it later





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




  1   2   3   >