[GitHub] [arrow] rj-atw commented on pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support
rj-atw commented on pull request #7767: URL: https://github.com/apache/arrow/pull/7767#issuecomment-658516845 > > > @rj-atw I would like to see us get wasm support but I think we need to add CI for this. Without CI all those `cfg` attributes are destined to go stale... @paddyhoran I love to add the need CI. Do we have any documentation around this projects CI (especially how to setup env locally)? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] rj-atw commented on a change in pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support
rj-atw commented on a change in pull request #7767: URL: https://github.com/apache/arrow/pull/7767#discussion_r454760959 ## File path: rust/arrow/src/util/bit_util.rs ## @@ -366,13 +371,13 @@ mod tests { assert_eq!(ceil(8, 8), 1); assert_eq!(ceil(9, 8), 2); assert_eq!(ceil(9, 9), 1); -assert_eq!(ceil(100, 10), 10); -assert_eq!(ceil(10, 100), 1); -assert_eq!(ceil(100, 10), 10); +assert_eq!(ceil(1000, 10), 100); Review comment: I will double check these changes. I made them in response to a test that was failing; however, I found it suspect at the time as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] paddyhoran commented on pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support
paddyhoran commented on pull request #7767: URL: https://github.com/apache/arrow/pull/7767#issuecomment-658515334 Also, I assigned the JIRA to you and changed your permissions so you can self-assign in future This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] paddyhoran commented on pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support
paddyhoran commented on pull request #7767: URL: https://github.com/apache/arrow/pull/7767#issuecomment-658514635 @rj-atw I would like to see us get wasm support but I think we need to add CI for this. Without CI all those `cfg` attributes are destined to go stale... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] paddyhoran commented on a change in pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support
paddyhoran commented on a change in pull request #7767: URL: https://github.com/apache/arrow/pull/7767#discussion_r454755766 ## File path: rust/arrow/src/util/bit_util.rs ## @@ -366,13 +371,13 @@ mod tests { assert_eq!(ceil(8, 8), 1); assert_eq!(ceil(9, 8), 2); assert_eq!(ceil(9, 9), 1); -assert_eq!(ceil(100, 10), 10); -assert_eq!(ceil(10, 100), 1); -assert_eq!(ceil(100, 10), 10); +assert_eq!(ceil(1000, 10), 100); Review comment: Why the changes here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
wesm closed pull request #7757: URL: https://github.com/apache/arrow/pull/7757 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default
nealrichardson commented on pull request #7545: URL: https://github.com/apache/arrow/pull/7545#issuecomment-658507782 > I think the rationale is that the memory and performance savings related to materializing the partition columns are mos significant with string data. So it's definitely beneficial to return them as dictionary types. Right, my understanding from Joris's last comment was that this was already converting strings to dictionaries, which seems like a reasonable (though not mandatory) choice, and that the hangup was whether it was essential to also do that for ints. I guess the other workaround if people aren't happy with the choice here is to set `use_legacy_dataset = True`, so I agree that it's not the end of the world if the choice we make about dictionaries today turns out not to be optimal. But we should merge this so that the default is to use the datasets API so that we can learn where exactly we were mistaken. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support
github-actions[bot] commented on pull request #7767: URL: https://github.com/apache/arrow/pull/7767#issuecomment-658481405 https://issues.apache.org/jira/browse/ARROW-9453 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] rj-atw opened a new pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support
rj-atw opened a new pull request #7767: URL: https://github.com/apache/arrow/pull/7767 Adding compilation flag to allow rust library to compile against Wasm32 target This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm edited a comment on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default
wesm edited a comment on pull request #7545: URL: https://github.com/apache/arrow/pull/7545#issuecomment-658465856 I think the rationale is that the memory and performance savings related to materializing the partition columns are mos significant with string data. So it's definitely beneficial to return them as dictionary types. IMHO if there is a change from dictionary/dense required post-1.0.0 it is not the end of the world, so I'm OK either with merging this as is or changing all partition types to be dictionary. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
wesm commented on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658468591 OK, writing is disabled but old files can still be read ``` n [2]: pq.write_table(table, 'not_allowed.parquet.lz4', compression='lz4') --- OSError Traceback (most recent call last) in > 1 pq.write_table(table, 'not_allowed.parquet.lz4', compression='lz4') ~/code/arrow/python/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, write_statistics, use_deprecated_int96_timestamps, coerce_timestamps, allow_truncated_timestamps, data_page_size, flavor, filesystem, compression_level, use_byte_stream_split, data_page_version, **kwargs) 1632 data_page_version=data_page_version, 1633 **kwargs) as writer: -> 1634 writer.write_table(table, row_group_size=row_group_size) 1635 except Exception: 1636 if _is_path_like(where): ~/code/arrow/python/pyarrow/parquet.py in write_table(self, table, row_group_size) 586 raise ValueError(msg) 587 --> 588 self.writer.write_table(table, row_group_size=row_group_size) 589 590 def close(self): ~/code/arrow/python/pyarrow/_parquet.pyx in pyarrow._parquet.ParquetWriter.write_table() 1406 1407 with nogil: -> 1408 check_status(self.writer.get() 1409 .WriteTable(deref(ctable), c_row_group_size)) 1410 ~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status() 97 raise IOError(errno, message) 98 else: ---> 99 raise IOError(message) 100 elif status.IsOutOfMemory(): 101 raise ArrowMemoryError(message) OSError: Per ARROW-9424, writing files with LZ4 compression has been disabled until implementation issues have been resolved. It is recommended to read any existing files and rewrite them using a different compression. In ../src/parquet/arrow/writer.cc, line 684, code: WriteColumnChunk(table.column(i), offset, size) In [3]: pq.read_table('example.parquet.lz4').to_pandas() Out[3]: f0 0 1 1 2 2 3 3 4 4 5 ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default
wesm commented on pull request #7545: URL: https://github.com/apache/arrow/pull/7545#issuecomment-658465856 I think the rationale is that the memory and performance savings related to materializing the partition columns are mostly related to string data. So it's definitely beneficial to return them as dictionary types. IMHO if there is a change from dictionary/dense required post-1.0.0 it is not the end of the world, so I'm OK either with merging this as is or changing all partition types to be dictionary. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7765: ARROW-9399: [C++] Add forward compatibility test to detect and raise error for future MetadataVersion
wesm closed pull request #7765: URL: https://github.com/apache/arrow/pull/7765 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7765: ARROW-9399: [C++] Add forward compatibility test to detect and raise error for future MetadataVersion
wesm commented on pull request #7765: URL: https://github.com/apache/arrow/pull/7765#issuecomment-658464870 +1. The Ursabot failures must be addressed separately by setting the environment variable on those external build configurations This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7743: ARROW-9452: [Rust] [DataFusion] Optimize ParquetScanExec
wesm closed pull request #7743: URL: https://github.com/apache/arrow/pull/7743 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] andygrove commented on pull request #7743: ARROW-9452: [Rust] [DataFusion] Optimize ParquetScanExec
andygrove commented on pull request #7743: URL: https://github.com/apache/arrow/pull/7743#issuecomment-658462539 If nobody objects, I'd quite like this to make it in for the 1.0.0 release. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson closed pull request #7766: ARROW-9473: [Doc] Polishing for 1.0
nealrichardson closed pull request #7766: URL: https://github.com/apache/arrow/pull/7766 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default
nealrichardson commented on pull request #7545: URL: https://github.com/apache/arrow/pull/7545#issuecomment-658440946 Short version: I agree with Wes: do what you think best but let's merge something. Longer thoughts: If I understand correctly, the difference is whether or not partition fields that are integers are encoded as dictionaries in the old function and they come out as ints in the new? Semantically, they should behave the same though right? It is possible to cast an integer column to dictionary, or at least I recall seeing a dictionary encode function. So if someone felt strongly about it, they could make their partition fields be dictionaries after the fact. I caveat this by confessing that I've never personally used this function, but on face value I don't see why partition fields have to be dictionary type, particularly if we're only talking about integers. So if the main reason for adding something to force them to be dictionaries is to conform exactly with an old API, I'm not sure it's worth it. Maybe that's the wrong call, but since there is a workaround, why not try it without coercing everything to dictionary and see what reaction we get? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm edited a comment on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
wesm edited a comment on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658438428 I don't recall but that may have been the case. Either way it's a giant mess since many people use pyarrow to write Parquet files to be consumed by JVM-based systems. I think we can infer that LZ4 is not often used from the fact that we haven't had more bug reports about it. Note that we can provide backward compatibility if needed for existing LZ4-compressed files by looking at the version number in the file footer This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
pitrou commented on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658438215 In any case, the format used by Hadoop is neither of both, it's LZ4_RAW with a custom header... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson closed pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails
nealrichardson closed pull request #7706: URL: https://github.com/apache/arrow/pull/7706 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
wesm commented on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658438428 I don't recall but that may have been the case. Either way it's a giant mess since many people use pyarrow to write Parquet files to be consumed by JVM-based systems. I think we can infer that LZ4 is not often used from the fact that we haven't had more bug reports about it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
pitrou commented on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658437867 Wasn't it deliberate? IIRC we didn't want to break compatibility with existing files. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
wesm commented on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658437573 @pitrou @xhochy It seems that despite adding the LZ4_FRAME format we've been continuing to use LZ4_RAW for Parquet files. Unfortunate that this hasn't seen more compatibility testing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson closed pull request #7763: ARROW-9472: [R] Provide configurable MetadataVersion in IPC API and environment variable to set default to V4 when needed
nealrichardson closed pull request #7763: URL: https://github.com/apache/arrow/pull/7763 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
wesm commented on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658431456 No problem, I can take it from here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] patrickpai commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
patrickpai commented on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658430846 @wesm I think it's best if I get help on this. I'm totally new to the python codebase and wasn't expecting to finish today. I can let you take over. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7766: ARROW-9473: [Doc] Polishing for 1.0
github-actions[bot] commented on pull request #7766: URL: https://github.com/apache/arrow/pull/7766#issuecomment-658430115 https://issues.apache.org/jira/browse/ARROW-9473 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
wesm commented on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658429763 @patrickpai do you anticipate to complete this today? We are hoping to cut a release candidate tomorrow during the workday Central Europe Time so I can help finish this if needed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm edited a comment on pull request #7765: ARROW-9399: [C++] Add forward compatibility test to detect and raise error for future MetadataVersion
wesm edited a comment on pull request #7765: URL: https://github.com/apache/arrow/pull/7765#issuecomment-658428851 @kszucs `ARROW_TEST_DATA` is not being set in some of the Ursabot builders This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7765: ARROW-9399: [C++] Add forward compatibility test to detect and raise error for future MetadataVersion
wesm commented on pull request #7765: URL: https://github.com/apache/arrow/pull/7765#issuecomment-658428851 @kszucs `ARROW_TEST_DATA` is not being set in the Ursabot builders This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7764: ARROW-9390: [C++][Followup] Add underscores to is* string functions
wesm closed pull request #7764: URL: https://github.com/apache/arrow/pull/7764 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails
github-actions[bot] commented on pull request #7706: URL: https://github.com/apache/arrow/pull/7706#issuecomment-658428103 Revision: 98340934b34cd4cc6e49a6f25518d51accd55f11 Submitted crossbow builds: [ursa-labs/crossbow @ actions-421](https://github.com/ursa-labs/crossbow/branches/all?query=actions-421) |Task|Status| ||--| |test-conda-r-4.0|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-421-github-test-conda-r-4.0)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-421-github-test-conda-r-4.0)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails
nealrichardson commented on pull request #7706: URL: https://github.com/apache/arrow/pull/7706#issuecomment-658427378 @github-actions crossbow submit test-conda-r-4.0 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson opened a new pull request #7766: ARROW-9473: [Doc] Polishing for 1.0
nealrichardson opened a new pull request #7766: URL: https://github.com/apache/arrow/pull/7766 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7765: ARROW-9399: [C++] Add forward compatibility test to detect and raise error for future MetadataVersion
github-actions[bot] commented on pull request #7765: URL: https://github.com/apache/arrow/pull/7765#issuecomment-658424275 https://issues.apache.org/jira/browse/ARROW-9399 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm opened a new pull request #7765: ARROW-9399: [C++] Add forward compatibility test to detect and raise error for future MetadataVersion
wesm opened a new pull request #7765: URL: https://github.com/apache/arrow/pull/7765 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on a change in pull request #7764: ARROW-9390: [C++][Followup] Add underscores to is* string functions
nealrichardson commented on a change in pull request #7764: URL: https://github.com/apache/arrow/pull/7764#discussion_r454644099 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -944,33 +944,33 @@ void RegisterScalarStringAscii(FunctionRegistry* registry) { MakeUnaryStringBatchKernel("ascii_upper", registry); MakeUnaryStringBatchKernel("ascii_lower", registry); - AddUnaryStringPredicate("string_isascii", registry); - - AddUnaryStringPredicate("ascii_isalnum", registry); - AddUnaryStringPredicate("ascii_isalpha", registry); - AddUnaryStringPredicate("ascii_isdecimal", registry); - // no isdigic for ascii, since it is the same as isdecimal - AddUnaryStringPredicate("ascii_islower", registry); - // no isnumeric for ascii, since it is the same as isdecimal - AddUnaryStringPredicate("ascii_isprintable", registry); - AddUnaryStringPredicate("ascii_isspace", registry); - AddUnaryStringPredicate("ascii_istitle", registry); - AddUnaryStringPredicate("ascii_isupper", registry); + AddUnaryStringPredicate("string_is_ascii", registry); + + AddUnaryStringPredicate("ascii_is_alnum", registry); + AddUnaryStringPredicate("ascii_is_alpha", registry); + AddUnaryStringPredicate("ascii_is_decimal", registry); + // no is_digic for ascii, since it is the same as is_decimal Review comment: is_digic? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7746: ARROW-9438: [CI] Add spark patch to compile with recent Arrow Java changes
wesm closed pull request #7746: URL: https://github.com/apache/arrow/pull/7746 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7764: ARROW-9390: [C++][Followup] Add underscores to is* string functions
github-actions[bot] commented on pull request #7764: URL: https://github.com/apache/arrow/pull/7764#issuecomment-658412404 https://issues.apache.org/jira/browse/ARROW-9390 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7764: ARROW-9390: [C++][Followup] Add underscores to is* string functions
wesm commented on pull request #7764: URL: https://github.com/apache/arrow/pull/7764#issuecomment-658410462 cc @nealrichardson This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7272: ARROW-8314: [Python] Add a Table.select method to select a subset of columns
wesm commented on pull request #7272: URL: https://github.com/apache/arrow/pull/7272#issuecomment-658410644 +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7272: ARROW-8314: [Python] Add a Table.select method to select a subset of columns
wesm closed pull request #7272: URL: https://github.com/apache/arrow/pull/7272 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] BryanCutler commented on pull request #7746: ARROW-9438: [CI] Add spark patch to compile with recent Arrow Java changes
BryanCutler commented on pull request #7746: URL: https://github.com/apache/arrow/pull/7746#issuecomment-658411259 Thanks @kszucs for fixing up the Docker file! > I should have been aware of the effects ARROW-9300 was goign to have downstream. Anything I can assist with @BryanCutler ? No worries @rymurr ! The java changes were pretty minimal and I think that's all working well now. > AFAICT this can be merged? The timestamp related problems will require follow up work The current timezone failure is the result of https://github.com/apache/arrow/pull/7604 and was expected here. Since this improves a little to only 1 known failure, I think this is ok to be merged. I will open another PR for the remaining issue when I'm able to get to it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm opened a new pull request #7764: ARROW-9390: [C++][Followup] Add underscores to is* string functions
wesm opened a new pull request #7764: URL: https://github.com/apache/arrow/pull/7764 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7734: ARROW-8480: [Rust] Use NonNull well aligned pointer as Unique reference
wesm closed pull request #7734: URL: https://github.com/apache/arrow/pull/7734 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails
github-actions[bot] commented on pull request #7706: URL: https://github.com/apache/arrow/pull/7706#issuecomment-658400582 Revision: b1f53bb0a92a11eb280ff515438882b0decbf172 Submitted crossbow builds: [ursa-labs/crossbow @ actions-420](https://github.com/ursa-labs/crossbow/branches/all?query=actions-420) |Task|Status| ||--| |test-conda-r-4.0|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-420-github-test-conda-r-4.0)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-420-github-test-conda-r-4.0)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson closed pull request #7741: ARROW-9449: [R] Strip arrow.so
nealrichardson closed pull request #7741: URL: https://github.com/apache/arrow/pull/7741 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson closed pull request #7762: ARROW-8650: [Rust] [Website] Add documentation to Arrow website
nealrichardson closed pull request #7762: URL: https://github.com/apache/arrow/pull/7762 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default
jorisvandenbossche commented on pull request #7545: URL: https://github.com/apache/arrow/pull/7545#issuecomment-658396423 Note that the new option with the datasets API only dictionary encodes string partition fields, and not integer partition field. So it would still not keep exactly the same behaviour as we had before .. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails
nealrichardson commented on pull request #7706: URL: https://github.com/apache/arrow/pull/7706#issuecomment-658396393 @github-actions crossbow submit test-conda-r-4.0 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7746: ARROW-9438: [CI] Add spark patch to compile with recent Arrow Java changes
wesm commented on pull request #7746: URL: https://github.com/apache/arrow/pull/7746#issuecomment-658395334 AFAICT this can be merged? The timestamp related problems will require follow up work This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7740: ARROW-9447 [Rust][DataFusion] Made ScalarUDF (Send + Sync)
wesm closed pull request #7740: URL: https://github.com/apache/arrow/pull/7740 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7756: ARROW-9458: [Python] Release GIL in ScanTask.execute
wesm closed pull request #7756: URL: https://github.com/apache/arrow/pull/7756 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7761: ARROW-9470: [CI][Java] Run Maven in parallel
wesm closed pull request #7761: URL: https://github.com/apache/arrow/pull/7761 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
wesm commented on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658393753 Ah I see that you're adding Python changes. I fixed the lint problems here so be sure to rebase your changes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7741: ARROW-9449: [R] Strip arrow.so
github-actions[bot] commented on pull request #7741: URL: https://github.com/apache/arrow/pull/7741#issuecomment-658393103 Revision: 343e87d2022a4c43b2e894a2b3d24a61f29f477e Submitted crossbow builds: [ursa-labs/crossbow @ actions-419](https://github.com/ursa-labs/crossbow/branches/all?query=actions-419) |Task|Status| ||--| |test-r-rocker-r-base-latest|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-419-azure-test-r-rocker-r-base-latest)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-419-azure-test-r-rocker-r-base-latest)| |test-r-rstudio-r-base-3.6-opensuse42|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-419-azure-test-r-rstudio-r-base-3.6-opensuse42)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-419-azure-test-r-rstudio-r-base-3.6-opensuse42)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7741: ARROW-9449: [R] Strip arrow.so
nealrichardson commented on pull request #7741: URL: https://github.com/apache/arrow/pull/7741#issuecomment-658392235 @github-actions crossbow submit test-r-rstudio-r-base-3.6-opensuse42 test-r-rocker-r-base-latest This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7741: ARROW-9449: [R] Strip arrow.so
nealrichardson commented on pull request #7741: URL: https://github.com/apache/arrow/pull/7741#issuecomment-658392032 The two crossbow failures (other than conda-r, which we're ignoring) appear to be flakes, but I'll re-run them to be extra sure. Re-review appreciated on the latest change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] bkietz commented on a change in pull request #7748: ARROW-9388: [C++] Division kernels
bkietz commented on a change in pull request #7748: URL: https://github.com/apache/arrow/pull/7748#discussion_r454610972 ## File path: cpp/src/arrow/compute/kernels/codegen_internal.h ## @@ -675,11 +704,80 @@ struct ScalarBinary { } }; +// An alternative to ScalarBinary that Applies a scalar operation on only the +// not-null values of arrays. +template +struct ScalarBinaryNotNull { + using OUT = typename GetOutputType::T; + + static void ArrayArray(KernelContext* ctx, const ArrayData& arg0, const ArrayData& arg1, + Datum* out) { +ArrayIterator arg0_it(arg0); +ArrayIterator arg1_it(arg1); +OutputAdapter::WriteOnlyValid( +ctx, out, [&]() -> OUT { return Op::template Call(ctx, arg0_it(), arg1_it()); }, +[&]() { + ARROW_UNUSED(arg0_it()); + ARROW_UNUSED(arg1_it()); +}); + } + + static void ArrayScalar(KernelContext* ctx, const ArrayData& arg0, const Scalar& arg1, + Datum* out) { +ArrayIterator arg0_it(arg0); +auto arg1_val = UnboxScalar::Unbox(arg1); +OutputAdapter::WriteOnlyValid( +ctx, out, [&]() -> OUT { return Op::template Call(ctx, arg0_it(), arg1_val); }, +[&]() { ARROW_UNUSED(arg0_it()); }); + } + + static void ScalarArray(KernelContext* ctx, const Scalar& arg0, const ArrayData& arg1, + Datum* out) { +auto arg0_val = UnboxScalar::Unbox(arg0); +ArrayIterator arg1_it(arg1); +OutputAdapter::WriteOnlyValid( +ctx, out, [&]() -> OUT { return Op::template Call(ctx, arg0_val, arg1_it()); }, +[&]() { ARROW_UNUSED(arg1_it()); }); + } + + static void ScalarScalar(KernelContext* ctx, const Scalar& arg0, const Scalar& arg1, + Datum* out) { +if (out->scalar()->is_valid) { + auto arg0_val = UnboxScalar::Unbox(arg0); + auto arg1_val = UnboxScalar::Unbox(arg1); + BoxScalar::Box(Op::template Call(ctx, arg0_val, arg1_val), + out->scalar().get()); +} + } + + static void Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) { +if (batch[0].kind() == Datum::ARRAY) { + if (batch[1].kind() == Datum::ARRAY) { +return ArrayArray(ctx, *batch[0].array(), *batch[1].array(), out); + } else { +return ArrayScalar(ctx, *batch[0].array(), *batch[1].scalar(), out); + } +} else { + if (batch[1].kind() == Datum::ARRAY) { +// e.g. if we were doing scalar < array, we flip and do array >= scalar Review comment: I think this comment is not applicable ```suggestion ``` ## File path: cpp/src/arrow/compute/api_scalar.h ## @@ -130,6 +130,19 @@ Result Multiply(const Datum& left, const Datum& right, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = NULLPTR); +/// \brief Divide two values. Array values must be the same length. If either +/// factor is null the result will be null. Review comment: Please describe division-by-zero behavior ## File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc ## @@ -219,48 +219,77 @@ struct MultiplyChecked { } }; +struct Divide { + template + static constexpr enable_if_floating_point Call(KernelContext*, T left, T right) { +return left / right; + } + + template + static constexpr enable_if_integer Call(KernelContext*, T left, T right) { +return left / right; + } +}; + +struct DivideChecked { + template + static enable_if_integer Call(KernelContext* ctx, T left, T right) { +if (right == 0) { + ctx->SetStatus(Status::Invalid("divide by 0")); + return right; +} +return Divide::Call(ctx, left, right); + } + + template + static constexpr enable_if_floating_point Call(KernelContext*, T left, T right) { +return left * right; Review comment: ```suggestion return left / right; ``` ## File path: cpp/src/arrow/compute/api_scalar.h ## @@ -130,6 +130,19 @@ Result Multiply(const Datum& left, const Datum& right, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = NULLPTR); +/// \brief Divide two values. Array values must be the same length. If either +/// factor is null the result will be null. +/// +/// \param[in] left the dividend +/// \param[in] right the divisor +/// \param[in] options arithmetic options (overflow handling), optional Review comment: Division does not overflow unless the dividend is INT_MIN and the divisor is -1; IIUC `overflow_handling` is being used to indicate whether divide-by-zero should raise an error. If so please amend the comment This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the
[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7272: ARROW-8314: [Python] Add a Table.select method to select a subset of columns
jorisvandenbossche commented on a change in pull request #7272: URL: https://github.com/apache/arrow/pull/7272#discussion_r454607147 ## File path: cpp/src/arrow/table.cc ## @@ -362,6 +362,23 @@ Result> Table::RenameColumns( return Table::Make(::arrow::schema(std::move(fields)), std::move(columns), num_rows()); } +Result> Table::SelectColumns( +const std::vector& indices) const { + int n = static_cast(indices.size()); + + std::vector> columns(n); + std::vector> fields(n); + for (int i = 0; i < n; i++) { +int pos = indices[i]; Review comment: Added a boundscheck This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson opened a new pull request #7763: ARROW-9472: [R] Provide configurable MetadataVersion in IPC API and environment variable to set default to V4 when needed
nealrichardson opened a new pull request #7763: URL: https://github.com/apache/arrow/pull/7763 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7763: ARROW-9472: [R] Provide configurable MetadataVersion in IPC API and environment variable to set default to V4 when needed
github-actions[bot] commented on pull request #7763: URL: https://github.com/apache/arrow/pull/7763#issuecomment-658377740 https://issues.apache.org/jira/browse/ARROW-9472 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7762: ARROW-8650: [Rust] [Website] Add documentation to Arrow website
github-actions[bot] commented on pull request #7762: URL: https://github.com/apache/arrow/pull/7762#issuecomment-658349165 https://issues.apache.org/jira/browse/ARROW-8650 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson opened a new pull request #7762: ARROW-8650: [Rust] [Website] Add documentation to Arrow website
nealrichardson opened a new pull request #7762: URL: https://github.com/apache/arrow/pull/7762 Also adds the links for the other missing languages, as are being added to the main website menu This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7755: ARROW-9390: [C++][Doc] Review compute function names
pitrou commented on a change in pull request #7755: URL: https://github.com/apache/arrow/pull/7755#discussion_r454553900 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -944,7 +944,7 @@ void RegisterScalarStringAscii(FunctionRegistry* registry) { MakeUnaryStringBatchKernel("ascii_upper", registry); MakeUnaryStringBatchKernel("ascii_lower", registry); - AddUnaryStringPredicate("binary_isascii", registry); + AddUnaryStringPredicate("string_isascii", registry); Review comment: Also `ascii_is_alnum`, etc. ;-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on a change in pull request #7755: ARROW-9390: [C++][Doc] Review compute function names
wesm commented on a change in pull request #7755: URL: https://github.com/apache/arrow/pull/7755#discussion_r454553258 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -944,7 +944,7 @@ void RegisterScalarStringAscii(FunctionRegistry* registry) { MakeUnaryStringBatchKernel("ascii_upper", registry); MakeUnaryStringBatchKernel("ascii_lower", registry); - AddUnaryStringPredicate("binary_isascii", registry); + AddUnaryStringPredicate("string_isascii", registry); Review comment: I can make this change today unless someone else wants to This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7741: ARROW-9449: [R] Strip arrow.so
github-actions[bot] commented on pull request #7741: URL: https://github.com/apache/arrow/pull/7741#issuecomment-658334397 Revision: 343e87d2022a4c43b2e894a2b3d24a61f29f477e Submitted crossbow builds: [ursa-labs/crossbow @ actions-418](https://github.com/ursa-labs/crossbow/branches/all?query=actions-418) |Task|Status| ||--| |homebrew-r-autobrew|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-418-travis-homebrew-r-autobrew.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |test-conda-r-4.0|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-418-github-test-conda-r-4.0)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-418-github-test-conda-r-4.0)| |test-r-linux-as-cran|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-418-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-418-github-test-r-linux-as-cran)| |test-r-rhub-ubuntu-gcc-release|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-r-rhub-ubuntu-gcc-release)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-r-rhub-ubuntu-gcc-release)| |test-r-rocker-r-base-latest|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-r-rocker-r-base-latest)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-r-rocker-r-base-latest)| |test-r-rstudio-r-base-3.6-bionic|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-r-rstudio-r-base-3.6-bionic)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-r-rstudio-r-base-3.6-bionic)| |test-r-rstudio-r-base-3.6-centos6|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-r-rstudio-r-base-3.6-centos6)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-r-rstudio-r-base-3.6-centos6)| |test-r-rstudio-r-base-3.6-centos8|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-r-rstudio-r-base-3.6-centos8)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-r-rstudio-r-base-3.6-centos8)| |test-r-rstudio-r-base-3.6-opensuse15|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-r-rstudio-r-base-3.6-opensuse15)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-r-rstudio-r-base-3.6-opensuse15)| |test-r-rstudio-r-base-3.6-opensuse42|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-r-rstudio-r-base-3.6-opensuse42)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-r-rstudio-r-base-3.6-opensuse42)| |test-ubuntu-18.04-r-sanitizer|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-418-azure-test-ubuntu-18.04-r-sanitizer)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-418-azure-test-ubuntu-18.04-r-sanitizer)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7755: ARROW-9390: [C++][Doc] Review compute function names
pitrou commented on a change in pull request #7755: URL: https://github.com/apache/arrow/pull/7755#discussion_r454550954 ## File path: cpp/src/arrow/compute/kernels/scalar_string_benchmark.cc ## @@ -61,9 +61,9 @@ static void IsAlphaNumericAscii(benchmark::State& state) { UnaryStringBenchmark(state, "ascii_isalnum"); } -static void BinaryContainsExact(benchmark::State& state) { - BinaryContainsExactOptions options("abac"); - UnaryStringBenchmark(state, "binary_contains_exact", ); +static void MatchSubstring(benchmark::State& state) { + MatchSubstringOptions options("abac"); + UnaryStringBenchmark(state, "match_substring", ); Review comment: I don't think so. Personally, I find prefixes a bit pointless or even distracting. (or, here, plain misleading, since "binary_contains_exact" didn't work on binary arrays, only string arrays...) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7755: ARROW-9390: [C++][Doc] Review compute function names
pitrou commented on a change in pull request #7755: URL: https://github.com/apache/arrow/pull/7755#discussion_r454551146 ## File path: cpp/src/arrow/compute/kernels/scalar_string_benchmark.cc ## @@ -61,9 +61,9 @@ static void IsAlphaNumericAscii(benchmark::State& state) { UnaryStringBenchmark(state, "ascii_isalnum"); } -static void BinaryContainsExact(benchmark::State& state) { - BinaryContainsExactOptions options("abac"); - UnaryStringBenchmark(state, "binary_contains_exact", ); +static void MatchSubstring(benchmark::State& state) { + MatchSubstringOptions options("abac"); + UnaryStringBenchmark(state, "match_substring", ); Review comment: Perhaps someone else wants to think about a convention? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] andygrove commented on pull request #7751: ARROW-9461: [Rust] Fixed error in reading Date32 and Date64.
andygrove commented on pull request #7751: URL: https://github.com/apache/arrow/pull/7751#issuecomment-658333275 @jorgecarleitao Could you add a unit test? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7755: ARROW-9390: [C++][Doc] Review compute function names
pitrou commented on a change in pull request #7755: URL: https://github.com/apache/arrow/pull/7755#discussion_r454549799 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -944,7 +944,7 @@ void RegisterScalarStringAscii(FunctionRegistry* registry) { MakeUnaryStringBatchKernel("ascii_upper", registry); MakeUnaryStringBatchKernel("ascii_lower", registry); - AddUnaryStringPredicate("binary_isascii", registry); + AddUnaryStringPredicate("string_isascii", registry); Review comment: That's a good point. Should we make a followup PR? (if it's me, it's only tomorrow) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7741: ARROW-9449: [R] Strip arrow.so
nealrichardson commented on pull request #7741: URL: https://github.com/apache/arrow/pull/7741#issuecomment-658332593 @github-actions crossbow submit -g r This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou closed pull request #7760: ARROW-9390: [Doc] Add missing file
pitrou closed pull request #7760: URL: https://github.com/apache/arrow/pull/7760 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou closed pull request #7753: ARROW-9385: [Python] Fix JPype tests and JVM buffer lifetime
pitrou closed pull request #7753: URL: https://github.com/apache/arrow/pull/7753 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7761: ARROW-9470: [CI][Java] Run Maven in parallel
github-actions[bot] commented on pull request #7761: URL: https://github.com/apache/arrow/pull/7761#issuecomment-658330967 https://issues.apache.org/jira/browse/ARROW-9470 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7760: ARROW-9390: [Doc] Add missing file
github-actions[bot] commented on pull request #7760: URL: https://github.com/apache/arrow/pull/7760#issuecomment-658330966 https://issues.apache.org/jira/browse/ARROW-9390 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #7753: ARROW-9385: [Python] Fix JPype tests and JVM buffer lifetime
pitrou commented on pull request #7753: URL: https://github.com/apache/arrow/pull/7753#issuecomment-658329952 JPype build is green, AppVeyor is [green](https://ci.appveyor.com/project/pitrou/arrow). Will merge :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou opened a new pull request #7761: ARROW-9470: [CI][Java] Run Maven in parallel
pitrou opened a new pull request #7761: URL: https://github.com/apache/arrow/pull/7761 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou opened a new pull request #7760: ARROW-9390: [Doc] Add missing file
pitrou opened a new pull request #7760: URL: https://github.com/apache/arrow/pull/7760 Followup from PR #7755: a new doc file wasn't checked in, and Sphinx only emits a warning. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests
github-actions[bot] commented on pull request #7753: URL: https://github.com/apache/arrow/pull/7753#issuecomment-658317095 Revision: 4752024fc2fbcb5b8e058d9f9fa8d2864c5c74ba Submitted crossbow builds: [ursa-labs/crossbow @ actions-417](https://github.com/ursa-labs/crossbow/branches/all?query=actions-417) |Task|Status| ||--| |test-conda-python-3.8-jpype|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-417-github-test-conda-python-3.8-jpype)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-417-github-test-conda-python-3.8-jpype)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests
pitrou commented on pull request #7753: URL: https://github.com/apache/arrow/pull/7753#issuecomment-658316233 @github-actions crossbow submit test-conda-python-3.8-jpype This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7741: ARROW-9449: [R] Strip arrow.so
nealrichardson commented on pull request #7741: URL: https://github.com/apache/arrow/pull/7741#issuecomment-658310822 On one of the linux images I've been testing locally, as well as on my mac, the R makevars added in that commit are setting `CXXFLAGS="-g -O2"`. Given that, it probably makes sense not to pass those flags to the libarrow build. But I still think we should strip because I can't control the fact that R is including that when it compiles the R bindings--that is, I could edit the makevars on my own system, but the package installation can't override whatever makevars R has on other people's systems. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou closed pull request #7759: ARROW-7831: [Java] Fix build error from #6402
pitrou closed pull request #7759: URL: https://github.com/apache/arrow/pull/7759 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] paddyhoran edited a comment on pull request #7748: ARROW-9388: [C++] Division kernels
paddyhoran edited a comment on pull request #7748: URL: https://github.com/apache/arrow/pull/7748#issuecomment-658304924 We have this [implemented](https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/arithmetic.rs#L289) with [SIMD](https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/arithmetic.rs#L162) (which was tricky) on the Rust side. We currently return a `Result` for division by zero. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] paddyhoran commented on pull request #7748: ARROW-9388: [C++] Division kernels
paddyhoran commented on pull request #7748: URL: https://github.com/apache/arrow/pull/7748#issuecomment-658304924 We have this implemented with [SIMD](https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/arithmetic.rs#L162) (which was tricky) on the Rust side. We currently return a `Result` for division by zero. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou edited a comment on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests
pitrou edited a comment on pull request #7753: URL: https://github.com/apache/arrow/pull/7753#issuecomment-658296432 Rebased. There were some Java changes in-between, hopefully they won't interfere. Edit: of course they do... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7759: ARROW-7831: [Java] Fix build error from #6402
github-actions[bot] commented on pull request #7759: URL: https://github.com/apache/arrow/pull/7759#issuecomment-658304829 https://issues.apache.org/jira/browse/ARROW-7831 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged
pitrou commented on pull request #6402: URL: https://github.com/apache/arrow/pull/6402#issuecomment-658300664 Ok, @lidavidm did it in https://github.com/apache/arrow/pull/7759 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] lidavidm opened a new pull request #7759: NOJIRA: [Java] Fix build error from #6402
lidavidm opened a new pull request #7759: URL: https://github.com/apache/arrow/pull/7759 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged
pitrou commented on pull request #6402: URL: https://github.com/apache/arrow/pull/6402#issuecomment-658300014 @rymurr Are you available to give this a quick fix? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou edited a comment on pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged
pitrou edited a comment on pull request #6402: URL: https://github.com/apache/arrow/pull/6402#issuecomment-658299827 This broke the Java builds: https://github.com/apache/arrow/actions/runs/168952655 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged
pitrou commented on pull request #6402: URL: https://github.com/apache/arrow/pull/6402#issuecomment-658299827 This broke the builds: https://github.com/apache/arrow/actions/runs/168952655 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on a change in pull request #7755: ARROW-9390: [C++][Doc] Review compute function names
nealrichardson commented on a change in pull request #7755: URL: https://github.com/apache/arrow/pull/7755#discussion_r454506911 ## File path: cpp/src/arrow/compute/kernels/scalar_string_benchmark.cc ## @@ -61,9 +61,9 @@ static void IsAlphaNumericAscii(benchmark::State& state) { UnaryStringBenchmark(state, "ascii_isalnum"); } -static void BinaryContainsExact(benchmark::State& state) { - BinaryContainsExactOptions options("abac"); - UnaryStringBenchmark(state, "binary_contains_exact", ); +static void MatchSubstring(benchmark::State& state) { + MatchSubstringOptions options("abac"); + UnaryStringBenchmark(state, "match_substring", ); Review comment: Are there clear criteria for when the function gets a `binary_` or `string_` prefix and when it doesn't? Related, if we haven't documented the naming conventions (haven't seen it yet but I'll keep reading), we should so that future developers/selves know what to call new functions. ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -944,7 +944,7 @@ void RegisterScalarStringAscii(FunctionRegistry* registry) { MakeUnaryStringBatchKernel("ascii_upper", registry); MakeUnaryStringBatchKernel("ascii_lower", registry); - AddUnaryStringPredicate("binary_isascii", registry); + AddUnaryStringPredicate("string_isascii", registry); Review comment: Why isn't this `string_is_ascii`? It seems we prefer `is_x` to `isx` (cf. `isin` -> `is_in` in this PR). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests
pitrou commented on pull request #7753: URL: https://github.com/apache/arrow/pull/7753#issuecomment-658296432 Rebased. There were some Java changes in-between, hopefully they won't interfere. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests
pitrou commented on a change in pull request #7753: URL: https://github.com/apache/arrow/pull/7753#discussion_r454500127 ## File path: ci/scripts/python_test.sh ## @@ -29,4 +29,4 @@ export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${LD_LIBRARY_PATH} # Enable some checks inside Python itself export PYTHONDEVMODE=1 -pytest -r s --pyargs pyarrow +pytest -r s --pyargs pyarrow.tests.test_jvm Review comment: Yes :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests
github-actions[bot] commented on pull request #7753: URL: https://github.com/apache/arrow/pull/7753#issuecomment-658292321 Revision: 28f6500515fee577dc17f969a819ef01830ffdcd Submitted crossbow builds: [ursa-labs/crossbow @ actions-416](https://github.com/ursa-labs/crossbow/branches/all?query=actions-416) |Task|Status| ||--| |test-conda-python-3.8-jpype|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-416-github-test-conda-python-3.8-jpype)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-416-github-test-conda-python-3.8-jpype)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] rymurr commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests
rymurr commented on a change in pull request #7753: URL: https://github.com/apache/arrow/pull/7753#discussion_r454498500 ## File path: ci/scripts/python_test.sh ## @@ -29,4 +29,4 @@ export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${LD_LIBRARY_PATH} # Enable some checks inside Python itself export PYTHONDEVMODE=1 -pytest -r s --pyargs pyarrow +pytest -r s --pyargs pyarrow.tests.test_jvm Review comment: Did this sneak in on accident? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests
pitrou commented on a change in pull request #7753: URL: https://github.com/apache/arrow/pull/7753#discussion_r454494904 ## File path: python/pyarrow/jvm.py ## @@ -28,24 +28,38 @@ import pyarrow as pa -def jvm_buffer(arrowbuf): +class _JvmBufferNanny: +""" +An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying +memory alive. +""" + +def __init__(self, jvm_buf): +self.jvm_buf = jvm_buf +self.jvm_buf.retain() + +def __del__(self): +self.jvm_buf.release() + + +def jvm_buffer(jvm_buf): """ Construct an Arrow buffer from org.apache.arrow.memory.ArrowBuf Parameters -- -arrowbuf: org.apache.arrow.memory.ArrowBuf +jvm_buf: org.apache.arrow.memory.ArrowBuf Arrow Buffer representation on the JVM. Returns --- pyarrow.Buffer Python Buffer that references the JVM memory. """ -address = arrowbuf.memoryAddress() -size = arrowbuf.capacity() -return pa.foreign_buffer(address, size, base=arrowbuf) +address = jvm_buf.memoryAddress() Review comment: I'm trying to improve this, hold on. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests
wesm commented on a change in pull request #7753: URL: https://github.com/apache/arrow/pull/7753#discussion_r454493669 ## File path: python/pyarrow/jvm.py ## @@ -28,24 +28,38 @@ import pyarrow as pa -def jvm_buffer(arrowbuf): +class _JvmBufferNanny: +""" +An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying +memory alive. +""" + +def __init__(self, jvm_buf): +self.jvm_buf = jvm_buf +self.jvm_buf.retain() + +def __del__(self): +self.jvm_buf.release() + + +def jvm_buffer(jvm_buf): """ Construct an Arrow buffer from org.apache.arrow.memory.ArrowBuf Parameters -- -arrowbuf: org.apache.arrow.memory.ArrowBuf +jvm_buf: org.apache.arrow.memory.ArrowBuf Arrow Buffer representation on the JVM. Returns --- pyarrow.Buffer Python Buffer that references the JVM memory. """ -address = arrowbuf.memoryAddress() -size = arrowbuf.capacity() -return pa.foreign_buffer(address, size, base=arrowbuf) +address = jvm_buf.memoryAddress() Review comment: I'm mostly trying to understand where and when this buffer will be destroyed. In any case it's outside the scope of this PR so we can let someone more invested in this code deal with it later This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org