[GitHub] [arrow] emkornfield commented on pull request #7578: ARROW-9264: [C++][Parquet] Refactor and modernize schema conversion code

2020-06-28 Thread GitBox


emkornfield commented on pull request #7578:
URL: https://github.com/apache/arrow/pull/7578#issuecomment-650898648


   Resolved the conflict and redid the variable-removal cleanup.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7547: ARROW-8950: [C++] Avoid HEAD when possible in S3 filesystem

2020-06-28 Thread GitBox


wesm closed pull request #7547:
URL: https://github.com/apache/arrow/pull/7547


   







[GitHub] [arrow] wesm commented on pull request #7578: ARROW-9264: [C++][Parquet] Refactor and modernize schema conversion code

2020-06-28 Thread GitBox


wesm commented on pull request #7578:
URL: https://github.com/apache/arrow/pull/7578#issuecomment-650893956


   Great, I'll review tomorrow. The PR I just merged 
https://github.com/apache/arrow/commit/cca2db1cfb12bb12a4c018dcf60796352b27f1e2 
caused a small conflict so you may need to manually rebase that small change







[GitHub] [arrow] wesm closed pull request #7577: ARROW-8980: [Python] Ensure that ARROW:schema metadata key is scrubbed when converting Parquet schema back to Arrow schema

2020-06-28 Thread GitBox


wesm closed pull request #7577:
URL: https://github.com/apache/arrow/pull/7577


   







[GitHub] [arrow] wesm commented on pull request #7577: ARROW-8980: [Python] Ensure that ARROW:schema metadata key is scrubbed when converting Parquet schema back to Arrow schema

2020-06-28 Thread GitBox


wesm commented on pull request #7577:
URL: https://github.com/apache/arrow/pull/7577#issuecomment-650893647


   +1







[GitHub] [arrow] github-actions[bot] commented on pull request #7578: ARROW-9264: [C++][Parquet] Refactor and modernize schema conversion code

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7578:
URL: https://github.com/apache/arrow/pull/7578#issuecomment-650889205


   https://issues.apache.org/jira/browse/ARROW-9264







[GitHub] [arrow] emkornfield commented on pull request #7578: ARROW-9264: [C++][Parquet] Refactor and modernize schema conversion code

2020-06-28 Thread GitBox


emkornfield commented on pull request #7578:
URL: https://github.com/apache/arrow/pull/7578#issuecomment-650885965


   CC @wesm 







[GitHub] [arrow] emkornfield opened a new pull request #7578: ARROW-9264: [C++][Parquet] Refactor and modernize schema conversion code

2020-06-28 Thread GitBox


emkornfield opened a new pull request #7578:
URL: https://github.com/apache/arrow/pull/7578


   Split this off from ARROW-8493 because enough code moved around to
   warrant a review without other functional changes.
   
   - Rename max_ to current_ or remove it entirely
   - During SchemaField generation, remove one check for grouped
     child fields.
   - Add a field that will be required for generalized resolution
     (ARROW-8493)
   - Use Result<> instead of an output parameter for type conversions.
   - Move code from reader_internal to schema.cc to align function
     declarations with the header file.
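
   The "Result<> instead of output parameter" item can be illustrated with a
   minimal sketch. Everything below is an assumption made for illustration:
   the Result and Error types are simplified stand-ins rather than Arrow's
   own classes, and ConvertType, ConvertTypeOut and the integer type codes
   are hypothetical names that do not come from the PR.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical error payload; a stand-in, not Arrow's Status.
struct Error {
  std::string message;
};

// Simplified Result-like type (assumes T is default-constructible).
template <typename T>
class Result {
 public:
  Result(T value) : ok_(true), value_(std::move(value)) {}
  Result(Error error) : ok_(false), error_(std::move(error)) {}

  bool ok() const { return ok_; }

  // Only valid when ok() is true.
  const T& value() const {
    assert(ok_);
    return value_;
  }

  // Only valid when ok() is false.
  const std::string& error() const {
    assert(!ok_);
    return error_.message;
  }

 private:
  bool ok_;
  T value_;
  Error error_;
};

// Output-parameter style: the caller declares the result up front and
// has to remember to check the returned flag before using it.
bool ConvertTypeOut(int physical_type, std::string* out) {
  if (physical_type == 0) {
    *out = "int32";
    return true;
  }
  if (physical_type == 1) {
    *out = "int64";
    return true;
  }
  return false;
}

// Result style: the value and the failure travel in one return value.
Result<std::string> ConvertType(int physical_type) {
  if (physical_type == 0) return std::string("int32");
  if (physical_type == 1) return std::string("int64");
  return Error{"unsupported physical type"};
}
```

   With the second form a caller writes something like
   "auto r = ConvertType(code); if (!r.ok()) { ... }" and never sees a
   half-initialized output variable.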







[GitHub] [arrow] bkietz commented on pull request #7536: ARROW-8647: [C++][Python][Dataset] Allow partitioning fields to be inferred with dictionary type

2020-06-28 Thread GitBox


bkietz commented on pull request #7536:
URL: https://github.com/apache/arrow/pull/7536#issuecomment-650885378


   Int32 indices are now used whatever the dictionary size







[GitHub] [arrow] jianxind commented on pull request #7576: ARROW-9263: [C++] Promote compute aggregate benchmark size to 1M.

2020-06-28 Thread GitBox


jianxind commented on pull request #7576:
URL: https://github.com/apache/arrow/pull/7576#issuecomment-650877595


   > We should probably not share such parameters outside of the scope of a
   > single benchmark executable, otherwise such changes make the benchmark
   > results before/after not comparable
   
   Thanks, I now only update the size of compute-aggregate-benchmark to 1M.







[GitHub] [arrow] github-actions[bot] commented on pull request #7535: ARROW-9222: [Format][DONOTMERGE] Columnar.rst changes for removing validity bitmap from union types

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7535:
URL: https://github.com/apache/arrow/pull/7535#issuecomment-650876518


   https://issues.apache.org/jira/browse/ARROW-9222







[GitHub] [arrow] wesm closed pull request #7574: ARROW-9233: [C++] Add NullType code paths for is_valid, is_null kernels

2020-06-28 Thread GitBox


wesm closed pull request #7574:
URL: https://github.com/apache/arrow/pull/7574


   







[GitHub] [arrow] wesm commented on pull request #7574: ARROW-9233: [C++] Add NullType code paths for is_valid, is_null kernels

2020-06-28 Thread GitBox


wesm commented on pull request #7574:
URL: https://github.com/apache/arrow/pull/7574#issuecomment-650873889


   +1







[GitHub] [arrow] github-actions[bot] commented on pull request #7577: ARROW-8980: [Python] Ensure that ARROW:schema metadata key is scrubbed when converting Parquet schema back to Arrow schema

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7577:
URL: https://github.com/apache/arrow/pull/7577#issuecomment-650872224


   https://issues.apache.org/jira/browse/ARROW-8980







[GitHub] [arrow] wesm commented on pull request #7576: ARROW-9263: [C++] Promote RegressionSetArgs size to L2.

2020-06-28 Thread GitBox


wesm commented on pull request #7576:
URL: https://github.com/apache/arrow/pull/7576#issuecomment-650871322


   We should probably not share such parameters outside of the scope of a 
single benchmark executable, otherwise such changes make the benchmark results 
before/after not comparable







[GitHub] [arrow] wesm opened a new pull request #7577: ARROW-8980: [Python] Ensure that ARROW:schema metadata key is scrubbed when converting Parquet schema back to Arrow schema

2020-06-28 Thread GitBox


wesm opened a new pull request #7577:
URL: https://github.com/apache/arrow/pull/7577


   This was already being worked around in some unit tests... 







[GitHub] [arrow] wesm closed pull request #7575: ARROW-8671: [C++][FOLLOWUP] Fix ASAN/UBSAN bug found with IPC fuzz testing files

2020-06-28 Thread GitBox


wesm closed pull request #7575:
URL: https://github.com/apache/arrow/pull/7575


   







[GitHub] [arrow] wesm commented on pull request #7575: ARROW-8671: [C++][FOLLOWUP] Fix ASAN/UBSAN bug found with IPC fuzz testing files

2020-06-28 Thread GitBox


wesm commented on pull request #7575:
URL: https://github.com/apache/arrow/pull/7575#issuecomment-650870541


   +1, merging to fix master







[GitHub] [arrow] wesm commented on pull request #7557: ARROW-9251: [C++] Relocate integration testing JSON code implementation to src/arrow/testing

2020-06-28 Thread GitBox


wesm commented on pull request #7557:
URL: https://github.com/apache/arrow/pull/7557#issuecomment-650866076


   +1. The ASAN/UBSAN failure is fixed by 
https://github.com/apache/arrow/pull/7575







[GitHub] [arrow] wesm closed pull request #7557: ARROW-9251: [C++] Relocate integration testing JSON code implementation to src/arrow/testing

2020-06-28 Thread GitBox


wesm closed pull request #7557:
URL: https://github.com/apache/arrow/pull/7557


   







[GitHub] [arrow] github-actions[bot] commented on pull request #7574: ARROW-9233: [C++] Add NullType code paths for is_valid, is_null kernels

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7574:
URL: https://github.com/apache/arrow/pull/7574#issuecomment-650865623


   https://issues.apache.org/jira/browse/ARROW-9233







[GitHub] [arrow] github-actions[bot] commented on pull request #7575: ARROW-8671: [C++][FOLLOWUP] Fix ASAN/UBSAN bug found with IPC fuzz testing files

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7575:
URL: https://github.com/apache/arrow/pull/7575#issuecomment-650865625


   https://issues.apache.org/jira/browse/ARROW-8671







[GitHub] [arrow] jianxind closed pull request #7314: ARROW-8996: [C++] AVX2/AVX512 runtime support for aggregate sum kernel

2020-06-28 Thread GitBox


jianxind closed pull request #7314:
URL: https://github.com/apache/arrow/pull/7314


   







[GitHub] [arrow] github-actions[bot] commented on pull request #7576: ARROW-9263: [C++] Promote RegressionSetArgs size to L2.

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7576:
URL: https://github.com/apache/arrow/pull/7576#issuecomment-650865622


   https://issues.apache.org/jira/browse/ARROW-9263







[GitHub] [arrow] jianxind commented on pull request #7314: ARROW-8996: [C++] AVX2/AVX512 runtime support for aggregate sum kernel

2020-06-28 Thread GitBox


jianxind commented on pull request #7314:
URL: https://github.com/apache/arrow/pull/7314#issuecomment-650865605


   Closing this one; it can stay as a reference for the intrinsics-based
   version. I will try to work out a new approach without intrinsics.







[GitHub] [arrow] jianxind opened a new pull request #7576: ARROW-9263: [C++] Promote RegressionSetArgs size to L2.

2020-06-28 Thread GitBox


jianxind opened a new pull request #7576:
URL: https://github.com/apache/arrow/pull/7576


   With a 0.01% null probability, an L1-sized input (usually 32k) generates
   all true values for the double/float types, so set the size to L2.
   
   Signed-off-by: Frank Du 
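
   A rough calculation shows why so small an input can come out with no
   nulls at all. The sketch below assumes that "32k" means a 32 KiB buffer,
   that doubles are 8 bytes wide and floats 4, and that each value is null
   independently with probability 0.0001; the function names are
   hypothetical and do not come from the benchmark code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Number of fixed-width values that fit in a buffer of the given size.
std::size_t ValuesInBuffer(std::size_t buffer_bytes, std::size_t value_bytes) {
  assert(value_bytes > 0);
  return buffer_bytes / value_bytes;
}

// Expected number of nulls among n values, each null with probability p.
double ExpectedNulls(std::size_t n, double p) {
  return static_cast<double>(n) * p;
}

// Probability that none of n independently drawn values is null.
double ProbabilityNoNulls(std::size_t n, double p) {
  assert(p >= 0.0 && p <= 1.0);
  return std::pow(1.0 - p, static_cast<double>(n));
}
```

   Under these assumptions a 32 KiB buffer holds 4096 doubles, so fewer than
   one null is expected on average (about 0.41), and the chance of drawing no
   nulls at all is roughly 0.66. A larger input makes an all-valid array much
   less likely.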







[GitHub] [arrow] wesm opened a new pull request #7575: ARROW-8671: [C++][FOLLOWUP] Fix ASAN/UBSAN bug found with IPC fuzz testing files

2020-06-28 Thread GitBox


wesm opened a new pull request #7575:
URL: https://github.com/apache/arrow/pull/7575


   







[GitHub] [arrow] wesm opened a new pull request #7574: ARROW-9233: [C++] Add NullType code paths for is_valid, is_null kernels

2020-06-28 Thread GitBox


wesm opened a new pull request #7574:
URL: https://github.com/apache/arrow/pull/7574


   







[GitHub] [arrow] wesm commented on pull request #7571: ARROW-8671: [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata

2020-06-28 Thread GitBox


wesm commented on pull request #7571:
URL: https://github.com/apache/arrow/pull/7571#issuecomment-650857082


   This has an ASAN/UBSAN failure. I will fix within an hour







[GitHub] [arrow] kou closed pull request #7573: ARROW-9262: [Packaging][Linux][CI] Use Ubuntu 18.04 to build ARM64 packages on Travis CI

2020-06-28 Thread GitBox


kou closed pull request #7573:
URL: https://github.com/apache/arrow/pull/7573


   







[GitHub] [arrow] kou commented on pull request #7573: ARROW-9262: [Packaging][Linux][CI] Use Ubuntu 18.04 to build ARM64 packages on Travis CI

2020-06-28 Thread GitBox


kou commented on pull request #7573:
URL: https://github.com/apache/arrow/pull/7573#issuecomment-650837500


   +1







[GitHub] [arrow] nealrichardson commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-28 Thread GitBox


nealrichardson commented on a change in pull request #7514:
URL: https://github.com/apache/arrow/pull/7514#discussion_r446706140



##
File path: r/src/array_from_vector.cpp
##
@@ -918,6 +923,97 @@ class Time64Converter : public TimeConverter {
   }
 };
 
+template <typename Builder>
+class BinaryVectorConverter : public VectorConverter {
+ public:
+  ~BinaryVectorConverter() {}
+
+  Status Init(ArrayBuilder* builder) {
+    typed_builder_ = checked_cast<Builder*>(builder);
+    return Status::OK();
+  }
+
+  Status Ingest(SEXP obj) {
+    ARROW_RETURN_IF(TYPEOF(obj) != VECSXP, Status::RError("Expecting a list"));
+    R_xlen_t n = XLENGTH(obj);
+
+    // Reserve enough space before appending
+    int64_t size = 0;
+    for (R_xlen_t i = 0; i < n; i++) {
+      SEXP obj_i = VECTOR_ELT(obj, i);
+      if (!Rf_isNull(obj_i)) {
+        ARROW_RETURN_IF(TYPEOF(obj_i) != RAWSXP,
+                        Status::RError("Expecting a raw vector"));
+        size += XLENGTH(obj_i);
+      }
+    }
+    RETURN_NOT_OK(typed_builder_->Reserve(size));
+
+    // append
+    for (R_xlen_t i = 0; i < n; i++) {
+      SEXP obj_i = VECTOR_ELT(obj, i);
+      if (Rf_isNull(obj_i)) {
+        RETURN_NOT_OK(typed_builder_->AppendNull());
+      } else {
+        RETURN_NOT_OK(typed_builder_->Append(RAW(obj_i), XLENGTH(obj_i)));
+      }
+    }
+    return Status::OK();
+  }
+
+  Status GetResult(std::shared_ptr<arrow::Array>* result) {
+    return typed_builder_->Finish(result);
+  }
+
+ private:
+  Builder* typed_builder_;
+};
+
+template <typename Builder>
+class StringVectorConverter : public VectorConverter {
+ public:
+  ~StringVectorConverter() {}
+
+  Status Init(ArrayBuilder* builder) {
+    typed_builder_ = checked_cast<Builder*>(builder);
+    return Status::OK();
+  }
+
+  Status Ingest(SEXP obj) {
+    ARROW_RETURN_IF(TYPEOF(obj) != STRSXP,
+                    Status::RError("Expecting a character vector"));
+    R_xlen_t n = XLENGTH(obj);
+
+    // Reserve enough space before appending
+    int64_t size = 0;
+    for (R_xlen_t i = 0; i < n; i++) {
+      SEXP string_i = STRING_ELT(obj, i);
+      if (string_i != NA_STRING) {
+        size += XLENGTH(string_i);
+      }
+    }
+    RETURN_NOT_OK(typed_builder_->Reserve(size));
+
+    // append
+    for (R_xlen_t i = 0; i < n; i++) {
+      SEXP string_i = STRING_ELT(obj, i);
+      if (string_i == NA_STRING) {
+        RETURN_NOT_OK(typed_builder_->AppendNull());
+      } else {
+        RETURN_NOT_OK(typed_builder_->Append(CHAR(string_i), XLENGTH(string_i)));

Review comment:
   This new builder is breaking the (new) UTF-8 tests. The previous converter
   code is
   https://github.com/apache/arrow/blob/master/r/src/array_from_vector.cpp#L147-L171
   and it is apparently no longer being called.
   
   I wonder if this whole code block isn't possible right now as is. The
   "Reserve enough space before appending" block would also need to convert to
   UTF-8 in order to get the size right, and I wonder if converting/asserting
   everything to UTF-8 twice would outweigh the benefits of Reserving space. Or
   maybe we can take the size as is and overcommit bytes?
   
   Tangentially related, would a "reserve" check like this be the way to solve
   https://issues.apache.org/jira/browse/ARROW-3308, where we need to switch to
   Large types if there's more than 2GB?
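
   To make the sizing questions above concrete, here is a minimal
   standalone sketch. It is not the R binding code: strings are modeled as
   pointers to std::string with nullptr standing in for NA, and the helper
   names (TotalDataBytes, Latin1ToUtf8UpperBound, NeedsLargeType) are
   hypothetical. The overcommit bound assumes the input is Latin-1, where
   each byte re-encodes to at most two UTF-8 bytes; other source encodings
   would need a different bound. The 2GB check relies on the non-Large
   binary and string types using 32-bit offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// First pass: sum the byte lengths of the non-missing strings.
int64_t TotalDataBytes(const std::vector<const std::string*>& values) {
  int64_t total = 0;
  for (const std::string* v : values) {
    if (v != nullptr) total += static_cast<int64_t>(v->size());
  }
  return total;
}

// "Overcommit" bound: a Latin-1 byte becomes at most two bytes in UTF-8,
// so doubling the raw size is enough without converting twice.
int64_t Latin1ToUtf8UpperBound(int64_t latin1_bytes) {
  assert(latin1_bytes >= 0);
  return 2 * latin1_bytes;
}

// With 32-bit offsets the data buffer can hold at most INT32_MAX bytes;
// anything larger would need a 64-bit-offset (Large) type.
bool NeedsLargeType(int64_t total_bytes) {
  assert(total_bytes >= 0);
  return total_bytes > std::numeric_limits<int32_t>::max();
}
```

   A converter could run the first pass once, reserve the upper bound, and
   choose the builder type from NeedsLargeType before appending. Whether the
   extra reserved memory is acceptable is the trade-off raised above.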
   
   









[GitHub] [arrow] wesm closed pull request #7561: ARROW-9254: [C++] Split out CastNumberToNumberUnsafe function from scalar_cast_numeric, add data()/mutable_data() functions for accessing primitive sca

2020-06-28 Thread GitBox


wesm closed pull request #7561:
URL: https://github.com/apache/arrow/pull/7561


   







[GitHub] [arrow] wesm commented on pull request #7571: ARROW-8671: [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata

2020-06-28 Thread GitBox


wesm commented on pull request #7571:
URL: https://github.com/apache/arrow/pull/7571#issuecomment-650830163


   Oops, I didn't mean to merge this patch, sorry! Please review it and I will 
address any code reviews as follow up







[GitHub] [arrow] wesm closed pull request #7571: ARROW-8671: [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata

2020-06-28 Thread GitBox


wesm closed pull request #7571:
URL: https://github.com/apache/arrow/pull/7571


   







[GitHub] [arrow] wesm commented on pull request #7571: ARROW-8671: [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata

2020-06-28 Thread GitBox


wesm commented on pull request #7571:
URL: https://github.com/apache/arrow/pull/7571#issuecomment-650829722


   I confirmed that I can read compressed files (including compressed 
dictionaries) generated from master







[GitHub] [arrow] wesm commented on pull request #7561: ARROW-9254: [C++] Split out CastNumberToNumberUnsafe function from scalar_cast_numeric, add data()/mutable_data() functions for accessing primiti

2020-06-28 Thread GitBox


wesm commented on pull request #7561:
URL: https://github.com/apache/arrow/pull/7561#issuecomment-650829682


   +1







[GitHub] [arrow] wesm commented on pull request #7572: ARROW-9260: [CI] Fix non amd64 job failures with Ubuntu 14.04 and 20.04

2020-06-28 Thread GitBox


wesm commented on pull request #7572:
URL: https://github.com/apache/arrow/pull/7572#issuecomment-650828492


   Thanks!







[GitHub] [arrow] kszucs commented on pull request #7439: ARROW-4309: [Documentation] Add a docker-compose entry which builds the documentation with CUDA enabled

2020-06-28 Thread GitBox


kszucs commented on pull request #7439:
URL: https://github.com/apache/arrow/pull/7439#issuecomment-650826896


   Thanks @kou for fixing it!







[GitHub] [arrow] kou closed pull request #7572: ARROW-9260: [CI] Fix non amd64 job failures with Ubuntu 14.04 and 20.04

2020-06-28 Thread GitBox


kou closed pull request #7572:
URL: https://github.com/apache/arrow/pull/7572


   







[GitHub] [arrow] kou commented on pull request #7572: ARROW-9260: [CI] Fix non amd64 job failures with Ubuntu 14.04 and 20.04

2020-06-28 Thread GitBox


kou commented on pull request #7572:
URL: https://github.com/apache/arrow/pull/7572#issuecomment-650825705


   +1







[GitHub] [arrow] github-actions[bot] commented on pull request #7573: ARROW-9262: [Packaging][Linux][CI] Use Ubuntu 18.04 to build ARM64 packages on Travis CI

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7573:
URL: https://github.com/apache/arrow/pull/7573#issuecomment-650823047


   Revision: ce166dbd01320d6f4cf3eba9f7c961e97dbba384
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-364](https://github.com/ursa-labs/crossbow/branches/all?query=actions-364)
   
   |Task|Status|
   |----|------|
   |centos-7-aarch64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-364-travis-centos-7-aarch64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |centos-8-aarch64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-364-travis-centos-8-aarch64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |debian-buster-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-364-travis-debian-buster-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |debian-stretch-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-364-travis-debian-stretch-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |ubuntu-bionic-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-364-travis-ubuntu-bionic-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |ubuntu-eoan-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-364-travis-ubuntu-eoan-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |ubuntu-focal-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-364-travis-ubuntu-focal-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |ubuntu-xenial-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-364-travis-ubuntu-xenial-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|







[GitHub] [arrow] kou commented on pull request #7573: ARROW-9262: [Packaging][Linux][CI] Use Ubuntu 18.04 to build ARM64 packages on Travis CI

2020-06-28 Thread GitBox


kou commented on pull request #7573:
URL: https://github.com/apache/arrow/pull/7573#issuecomment-650822793


   @github-actions crossbow submit -g linux-arm64
   







[GitHub] [arrow] github-actions[bot] commented on pull request #7573: ARROW-9262: [Packaging][Linux][CI] Use Ubuntu 18.04 to build ARM64 packages on Travis CI

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7573:
URL: https://github.com/apache/arrow/pull/7573#issuecomment-650822868


   https://issues.apache.org/jira/browse/ARROW-9262







[GitHub] [arrow] kou opened a new pull request #7573: [Packaging][Linux][CI] Use Ubuntu 18.04 to build ARM64 packages on Travis CI

2020-06-28 Thread GitBox


kou opened a new pull request #7573:
URL: https://github.com/apache/arrow/pull/7573


   We got the following error with Ubuntu 20.04:
   
   gpg: Fatal: can't disable core dumps: Operation not permitted







[GitHub] [arrow] kou commented on pull request #7439: ARROW-4309: [Documentation] Add a docker-compose entry which builds the documentation with CUDA enabled

2020-06-28 Thread GitBox


kou commented on pull request #7439:
URL: https://github.com/apache/arrow/pull/7439#issuecomment-650822114


   The Travis CI failures disabled by https://github.com/apache/arrow/pull/7552
   are caused by this.
   
   The ARM64v8 CI failure disabled by https://github.com/apache/arrow/pull/7570
   is also caused by this.
   
   https://github.com/apache/arrow/pull/7572 will fix the problems.







[GitHub] [arrow] github-actions[bot] commented on pull request #7572: ARROW-9260: [CI] Fix non amd64 job failures with Ubuntu 14.04 and 20.04

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7572:
URL: https://github.com/apache/arrow/pull/7572#issuecomment-650821692


   https://issues.apache.org/jira/browse/ARROW-9260







[GitHub] [arrow] kou opened a new pull request #7572: ARROW-9260: [CI] Fix non amd64 job failures with Ubuntu 14.04 and 20.04

2020-06-28 Thread GitBox


kou opened a new pull request #7572:
URL: https://github.com/apache/arrow/pull/7572


   







[GitHub] [arrow] github-actions[bot] commented on pull request #7571: ARROW-8671: [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7571:
URL: https://github.com/apache/arrow/pull/7571#issuecomment-650809427


   https://issues.apache.org/jira/browse/ARROW-8671







[GitHub] [arrow] wesm merged pull request #7570: ARROW-9260: [CI][TRIAGE] Disable self-hosted builds until ARM64v8 build can be fixed

2020-06-28 Thread GitBox


wesm merged pull request #7570:
URL: https://github.com/apache/arrow/pull/7570


   







[GitHub] [arrow] wesm commented on pull request #7570: ARROW-9260: [CI][TRIAGE] Disable self-hosted builds until ARM64v8 build can be fixed

2020-06-28 Thread GitBox


wesm commented on pull request #7570:
URL: https://github.com/apache/arrow/pull/7570#issuecomment-650804456


   +1







[GitHub] [arrow] github-actions[bot] commented on pull request #7570: ARROW-9260: [CI][TRIAGE] Disable self-hosted builds until ARM64v8 build can be fixed

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7570:
URL: https://github.com/apache/arrow/pull/7570#issuecomment-650804358


   https://issues.apache.org/jira/browse/ARROW-9260







[GitHub] [arrow] github-actions[bot] commented on pull request #7569: ARROW-9152: [C++] Specialized implementation of filtering Binary/LargeBinary-based types

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7569:
URL: https://github.com/apache/arrow/pull/7569#issuecomment-650804359


   https://issues.apache.org/jira/browse/ARROW-9152







[GitHub] [arrow] wesm opened a new pull request #7570: ARROW-9260: [CI][TRIAGE] Disable self-hosted builds until ARM64v8 build can be fixed

2020-06-28 Thread GitBox


wesm opened a new pull request #7570:
URL: https://github.com/apache/arrow/pull/7570


   







[GitHub] [arrow] wesm commented on pull request #7569: ARROW-9152: [C++] Specialized implementation of filtering Binary/LargeBinary-based types

2020-06-28 Thread GitBox


wesm commented on pull request #7569:
URL: https://github.com/apache/arrow/pull/7569#issuecomment-650803803


   Benchmarks on gcc-8
   
   ```
       benchmark                                      baseline        contender  change %  counters
   13  FilterStringFilterNoNulls/1048576/0       1.242 GiB/sec    6.766 GiB/sec   444.926  {'iterations': 890, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 99.9}
   28  FilterStringFilterNoNulls/1048576/3       1.216 GiB/sec    5.055 GiB/sec   315.816  {'iterations': 877, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 99.9}
   8   FilterStringFilterNoNulls/1048576/6       1.087 GiB/sec    2.512 GiB/sec   130.990  {'iterations': 771, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 99.9}
   4   FilterStringFilterNoNulls/1048576/12    227.501 MiB/sec  500.272 MiB/sec   119.899  {'iterations': 160, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 99.9}
   23  FilterStringFilterNoNulls/1048576/9     902.994 MiB/sec    1.362 GiB/sec    54.476  {'iterations': 622, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 99.9}
   24  FilterStringFilterWithNulls/1048576/12  235.786 MiB/sec  356.124 MiB/sec    51.037  {'iterations': 165, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 99.9}
   5   FilterStringFilterWithNulls/1048576/3   996.664 MiB/sec    1.361 GiB/sec    39.858  {'iterations': 687, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 99.9}
   22  FilterStringFilterWithNulls/1048576/6   988.599 MiB/sec    1.331 GiB/sec    37.918  {'iterations': 685, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 99.9}
   15  FilterStringFilterWithNulls/1048576/9   913.433 MiB/sec    1.205 GiB/sec    35.073  {'iterations': 634, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 99.9}
   7   FilterStringFilterNoNulls/1048576/13    219.235 MiB/sec  292.800 MiB/sec    33.555  {'iterations': 155, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 50.0}
   14  FilterStringFilterNoNulls/1048576/10      1.169 GiB/sec    1.559 GiB/sec    33.370  {'iterations': 838, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 50.0}
   25  FilterStringFilterWithNulls/1048576/0     1.062 GiB/sec    1.366 GiB/sec    28.690  {'iterations': 761, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 99.9}
   2   FilterStringFilterNoNulls/1048576/1       1.398 GiB/sec    1.763 GiB/sec    26.088  {'iterations': 1007, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 50.0}
   11  FilterStringFilterWithNulls/1048576/4     1.233 GiB/sec    1.521 GiB/sec    23.336  {'iterations': 894, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 50.0}
   29  FilterStringFilterWithNulls/1048576/7     1.240 GiB/sec    1.519 GiB/sec    22.426  {'iterations': 881, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 50.0}
   9   FilterStringFilterWithNulls/1048576/10    1.177 GiB/sec    1.370 GiB/sec    16.380  {'iterations': 753, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 50.0}
   26  FilterStringFilterWithNulls/1048576/13  215.181 MiB/sec  249.411 MiB/sec    15.907  {'iterations': 151, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 50.0}
   3   FilterStringFilterNoNulls/1048576/7       1.373 GiB/sec    1.580 GiB/sec    15.117  {'iterations': 983, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 50.0}
   10  FilterStringFilterNoNulls/1048576/4       1.427 GiB/sec    1.607 GiB/sec    12.602  {'iterations': 996, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 50.0}
   17  FilterStringFilterWithNulls/1048576/1     1.364 GiB/sec    1.532 GiB/sec    12.343  {'iterations': 950, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 50.0}
   20  FilterStringFilterNoNulls/1048576/8      19.656 GiB/sec   20.549 GiB/sec     4.540  {'iterations': 14209, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 1.0}
   6   FilterStringFilterWithNulls/1048576/11   12.449 GiB/sec   12.500 GiB/sec     0.407  {'iterations': 8994, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 1.0}
   0   FilterStringFilterNoNulls/1048576/5      20.884 GiB/sec   20.824 GiB/sec    -0.287  {'iterations': 14923, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 1.0}
   19  FilterStringFilterNoNulls/1048576/11     18.171 GiB/sec   17.865 GiB/sec    -1.687  {'iterations': 12875, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 1.0}
   21  FilterStringFilterWithNulls/1048576/14    1.551 GiB/sec    1.519 GiB/sec    -2.057  {'iterations': 1090, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 1.0}
   16  FilterStringFilterNoNulls/1048576/2      21.518 GiB/sec   21.028 GiB/sec    -2.281  {'iterations': 15479, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 1.0}
   12  FilterStringFilterNoNulls/1048576/14      2.367 GiB/sec    2.295 GiB/sec    -3.071  {'iterations': 1690, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 1.0}
   27  FilterStringFilterWithNulls/1048576/8    14.006 GiB/sec   12.802 GiB/sec    -8.601
   ```

[GitHub] [arrow] wesm opened a new pull request #7569: ARROW-9152: [C++] Specialized implementation of filtering Binary/LargeBinary-based types

2020-06-28 Thread GitBox


wesm opened a new pull request #7569:
URL: https://github.com/apache/arrow/pull/7569


   Improve performance with a streamlined implementation that uses bulk appends 
for the non-null/all-selected case. Benchmarks to follow.
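
The bulk-append idea can be illustrated with a small sketch. This is not the PR's C++ code: the `BinaryFilterSketch` class, its `Binary` holder, and the `filter` method are hypothetical names, and validity bitmaps are left out, so it only covers the non-null case. It assumes `offsets[0] == 0` and a mask as long as the input. When every value is selected, the output is one bulk copy; otherwise each run of consecutive selected values is copied with a single `System.arraycopy`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; class and method names are illustrative, not Arrow's API.
final class BinaryFilterSketch {
  /** Variable-length values stored as one byte buffer plus n + 1 offsets. */
  static final class Binary {
    final int[] offsets;
    final byte[] data;

    Binary(int[] offsets, byte[] data) {
      this.offsets = offsets;
      this.data = data;
    }

    /** Builds a Binary from strings (UTF-8), for demonstration only. */
    static Binary of(String... values) {
      int[] offsets = new int[values.length + 1];
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      for (int i = 0; i < values.length; i++) {
        byte[] b = values[i].getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
        offsets[i + 1] = out.size();
      }
      return new Binary(offsets, out.toByteArray());
    }

    int length() {
      return offsets.length - 1;
    }

    String get(int i) {
      return new String(data, offsets[i], offsets[i + 1] - offsets[i], StandardCharsets.UTF_8);
    }
  }

  /** Keeps the values whose mask entry is true. Assumes no nulls and offsets[0] == 0. */
  static Binary filter(Binary in, boolean[] mask) {
    int n = in.length();
    // First pass: count selected values and bytes so the output is allocated once.
    int count = 0;
    int bytes = 0;
    for (int i = 0; i < n; i++) {
      if (mask[i]) {
        count++;
        bytes += in.offsets[i + 1] - in.offsets[i];
      }
    }
    if (count == n) {
      // Fast path: everything is selected, so copy offsets and data in bulk.
      return new Binary(in.offsets.clone(), Arrays.copyOf(in.data, in.offsets[n]));
    }
    int[] outOffsets = new int[count + 1];
    byte[] outData = new byte[bytes];
    int outIdx = 0;
    int outPos = 0;
    int i = 0;
    while (i < n) {
      if (!mask[i]) {
        i++;
        continue;
      }
      int runStart = i;
      while (i < n && mask[i]) {
        i++; // extend the run of consecutive selected values
      }
      int from = in.offsets[runStart];
      int to = in.offsets[i];
      System.arraycopy(in.data, from, outData, outPos, to - from); // one copy per run
      for (int j = runStart; j < i; j++) {
        outPos += in.offsets[j + 1] - in.offsets[j];
        outOffsets[++outIdx] = outPos;
      }
    }
    return new Binary(outOffsets, outData);
  }
}
```

Counting first and copying per run keeps the number of copy calls proportional to the number of runs rather than the number of values, which is where a high selection percentage would be expected to benefit most.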







[GitHub] [arrow] wesm commented on pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-28 Thread GitBox


wesm commented on pull request #7560:
URL: https://github.com/apache/arrow/pull/7560#issuecomment-650789278


   Sure, that sounds good







[GitHub] [arrow] emkornfield commented on pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-28 Thread GitBox


emkornfield commented on pull request #7560:
URL: https://github.com/apache/arrow/pull/7560#issuecomment-650789140


   No, IIRC I hacked the integration tests to generate smaller files, then 
looped over them with one of the binaries. I can try to come up with something 
to update the JSON files and submit a PR to the test data repo?







[GitHub] [arrow] github-actions[bot] commented on pull request #7568: ARROW-9241: [C++] Add forward compatibility check for Decimal bit width

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7568:
URL: https://github.com/apache/arrow/pull/7568#issuecomment-650783849


   https://issues.apache.org/jira/browse/ARROW-9241







[GitHub] [arrow] wesm opened a new pull request #7568: ARROW-9241: [C++] Add forward compatibility check for Decimal bit width

2020-06-28 Thread GitBox


wesm opened a new pull request #7568:
URL: https://github.com/apache/arrow/pull/7568


   Also updates the generated Flatbuffers bindings. 







[GitHub] [arrow] github-actions[bot] commented on pull request #7567: ARROW-9259: [Format][DONOTMERGE] Add language indicating that unsigned dictionary indices are supported but that signed integers a

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7567:
URL: https://github.com/apache/arrow/pull/7567#issuecomment-650781105


   https://issues.apache.org/jira/browse/ARROW-9259







[GitHub] [arrow] wesm opened a new pull request #7567: ARROW-9259: [Format][DONOTMERGE] Add language indicating that unsigned dictionary indices are supported but that signed integers are preferred

2020-06-28 Thread GitBox


wesm opened a new pull request #7567:
URL: https://github.com/apache/arrow/pull/7567


   This does not alter the format metadata in any way but has implications for 
the reference implementations (e.g. C++ currently rejects unsigned integer 
indices). 
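
One plausible reason for preferring signed indices (an assumption here; the PR description does not spell it out) is that some implementation languages, Java among them, have no unsigned integer primitives. The sketch below uses hypothetical names and shows the extra widening step a reader needs when a dictionary index is stored as an unsigned 8-bit value.

```java
// Hypothetical sketch; names are illustrative, not Arrow's Java API.
final class DictionaryIndexSketch {
  /** Index stored as a signed int8: the byte can be used directly. */
  static String lookupSigned(String[] dictionary, byte index) {
    return dictionary[index];
  }

  /** Index stored as a uint8: the raw byte must be widened explicitly. */
  static String lookupUnsigned(String[] dictionary, byte rawIndex) {
    return dictionary[Byte.toUnsignedInt(rawIndex)];
  }
}
```

With a 256-entry dictionary, the raw byte for index 200 reads as -56 in Java, so `lookupSigned` would throw an `ArrayIndexOutOfBoundsException` while `lookupUnsigned` returns entry 200.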







[GitHub] [arrow] github-actions[bot] commented on pull request #7566: ARROW-9258: [FORMAT][DONOTMERGE] Add V5 MetadataVersion to Schema.fbs

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7566:
URL: https://github.com/apache/arrow/pull/7566#issuecomment-650778344


   https://issues.apache.org/jira/browse/ARROW-9258







[GitHub] [arrow] wesm opened a new pull request #7566: [FORMAT][DONOTMERGE] Add V5 MetadataVersion to Schema.fbs

2020-06-28 Thread GitBox


wesm opened a new pull request #7566:
URL: https://github.com/apache/arrow/pull/7566


   Per mailing list discussion







[GitHub] [arrow] wesm commented on pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-28 Thread GitBox


wesm commented on pull request #7560:
URL: https://github.com/apache/arrow/pull/7560#issuecomment-650774989


   @emkornfield looks like almost none of the 0.14.1 "gold" files can be read 
anymore because of the change to represent int64 values as strings. We should 
prepare a set of gold files from after the int64 changes but before any patches 
that alter the MetadataVersion etc. Do you have a script that you used to 
create the gold files? 
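
For context, a common motivation for writing int64 values as JSON strings (assumed here; the comment above does not state the reason) is that many JSON parsers read numbers as 64-bit doubles, which cannot represent every 64-bit integer. The class and method names below are hypothetical and only illustrate the round trip.

```java
// Hypothetical sketch; it models two ways a JSON reader might carry an int64.
final class Int64JsonSketch {
  /** Round trip through a double, as a number-typed JSON value often is. */
  static long viaDouble(long value) {
    return (long) (double) value;
  }

  /** Round trip through a decimal string, which preserves all 64 bits. */
  static long viaString(long value) {
    return Long.parseLong(Long.toString(value));
  }
}
```

Any value above 2^53 with low-order bits set, such as `(1L << 53) + 1`, comes back changed from `viaDouble` and unchanged from `viaString`, which is why files written in one representation cannot simply be read by code expecting the other.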







[GitHub] [arrow] wesm closed pull request #7558: ARROW-9250: [C++] Instantiate fewer templates in IsIn, Match kernel implementations

2020-06-28 Thread GitBox


wesm closed pull request #7558:
URL: https://github.com/apache/arrow/pull/7558


   







[GitHub] [arrow] wesm commented on pull request #7558: ARROW-9250: [C++] Instantiate fewer templates in IsIn, Match kernel implementations

2020-06-28 Thread GitBox


wesm commented on pull request #7558:
URL: https://github.com/apache/arrow/pull/7558#issuecomment-650771972


   +1







[GitHub] [arrow] wesm closed pull request #7562: ARROW-7273: [Python][C++][Parquet] Do not permit constructing a non-nullable null field in Python, catch this case in Arrow->Parquet schema conversion

2020-06-28 Thread GitBox


wesm closed pull request #7562:
URL: https://github.com/apache/arrow/pull/7562


   







[GitHub] [arrow] wesm commented on pull request #7562: ARROW-7273: [Python][C++][Parquet] Do not permit constructing a non-nullable null field in Python, catch this case in Arrow->Parquet schema conve

2020-06-28 Thread GitBox


wesm commented on pull request #7562:
URL: https://github.com/apache/arrow/pull/7562#issuecomment-650771512


   +1







[GitHub] [arrow] wesm closed pull request #7563: ARROW-8888: [Python] Do not use thread pool when converting pandas columns that are definitely zero-copyable

2020-06-28 Thread GitBox


wesm closed pull request #7563:
URL: https://github.com/apache/arrow/pull/7563


   







[GitHub] [arrow] wesm commented on pull request #7563: ARROW-8888: [Python] Do not use thread pool when converting pandas columns that are definitely zero-copyable

2020-06-28 Thread GitBox


wesm commented on pull request #7563:
URL: https://github.com/apache/arrow/pull/7563#issuecomment-650771306


   +1







[GitHub] [arrow] liyafan82 commented on pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-28 Thread GitBox


liyafan82 commented on pull request #7347:
URL: https://github.com/apache/arrow/pull/7347#issuecomment-650739545


   Mostly looks good to me. There are a few minor issues. 
   Since it involves some fundamental classes, could you please make sure our 
integration tests pass? @rymurr 







[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-28 Thread GitBox


liyafan82 commented on a change in pull request #7347:
URL: https://github.com/apache/arrow/pull/7347#discussion_r446639368



##
File path: 
java/memory/src/main/java/org/apache/arrow/memory/util/MemoryUtil.java
##
@@ -95,4 +143,26 @@ public static long getByteBufferAddress(ByteBuffer buf) {
 
   private MemoryUtil() {
   }
+
+  /**
+   * Create nio byte buffer.
+   */
+  public static ByteBuffer directBuffer(long address, int capacity) {
+    if (DIRECT_BUFFER_CONSTRUCTOR != null) {
+      if (capacity < 0) {
+        throw new IllegalArgumentException("Capacity is negative, has to be positive or 0");
+      }
+      try {
+        return (ByteBuffer) DIRECT_BUFFER_CONSTRUCTOR.newInstance(address, capacity);
+      } catch (Throwable cause) {
+        // Not expected to ever throw!
+        if (cause instanceof Error) {
+          throw (Error) cause;

Review comment:
   Is there a special reason for the cast here?
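
A possible explanation for the cast, sketched below under assumptions: the quoted diff is cut off after the `throw`, so what follows it in the PR is not shown, and the wrapping in `IllegalStateException` here is this sketch's own choice. The try block can throw checked exceptions that the method does not declare, so a bare `throw cause;` would not compile. Casting to `Error`, an unchecked type, lets the method rethrow without a `throws` clause. `RethrowSketch` and its `Action` interface are hypothetical names.

```java
// Hypothetical sketch; names are illustrative and not part of the PR.
final class RethrowSketch {
  /** A task that may throw checked exceptions, like a reflective call. */
  interface Action {
    Object run() throws Exception;
  }

  /** Runs the action and rethrows failures without declaring checked exceptions. */
  static Object runUnchecked(Action action) {
    try {
      return action.run();
    } catch (Throwable cause) {
      if (cause instanceof Error) {
        // The cast narrows the static type to an unchecked type, so no
        // 'throws' clause is required on this method.
        throw (Error) cause;
      }
      // Assumption of this sketch: wrap everything else in an unchecked exception.
      throw new IllegalStateException(cause);
    }
  }
}
```

Errors such as `LinkageError` pass through unchanged, while a checked `IOException` reaches the caller wrapped and available through `getCause()`.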









[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-28 Thread GitBox


liyafan82 commented on a change in pull request #7347:
URL: https://github.com/apache/arrow/pull/7347#discussion_r446626385



##
File path: 
java/memory/src/main/java/org/apache/arrow/memory/util/MemoryUtil.java
##
@@ -78,6 +77,55 @@ public Object run() {
       Field addressField = java.nio.Buffer.class.getDeclaredField("address");
       addressField.setAccessible(true);
       BYTE_BUFFER_ADDRESS_OFFSET = UNSAFE.objectFieldOffset(addressField);
+
+      Constructor directBufferConstructor;
+      long address = -1;
+      final ByteBuffer direct = ByteBuffer.allocateDirect(1);
+      try {
+
+        final Object maybeDirectBufferConstructor =
+            AccessController.doPrivileged(new PrivilegedAction() {
+              @Override
+              public Object run() {
+                try {
+                  final Constructor constructor =
+                      direct.getClass().getDeclaredConstructor(long.class, int.class);
+                  constructor.setAccessible(true);
+                  logger.debug("Constructor for direct buffer found and made accessible");
+                  return constructor;
+                } catch (NoSuchMethodException e) {
+                  logger.debug("Cannot get constructor for direct buffer allocation", e);
+                  return e;
+                } catch (SecurityException e) {
+                  logger.debug("Cannot get constructor for direct buffer allocation", e);
+                  return e;
+                }
+              }
+            });
+
+        if (maybeDirectBufferConstructor instanceof Constructor) {
+          address = UNSAFE.allocateMemory(1);
+          // try to use the constructor now
+          try {
+            ((Constructor) maybeDirectBufferConstructor).newInstance(address, 1);
+            directBufferConstructor = (Constructor) maybeDirectBufferConstructor;
+            logger.debug("direct buffer constructor: available");
+          } catch (InstantiationException | IllegalAccessException | InvocationTargetException e) {
+            logger.debug("unable to instantiate a direct buffer via constructor", e);

Review comment:
   Should the log level here be warning or error?
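
The probing pattern in the quoted diff can be reduced to a small sketch. It uses a hypothetical `Region` class in place of the JDK's internal direct buffer class, and the names `ConstructorProbeSketch`, `findConstructor` and `tryNew` are invented for illustration. It shows the two stages at which the lookup can fail, which is the context for choosing a log level: a missing constructor may be an expected condition on some runtimes, while a failed instantiation after a successful lookup is more surprising.

```java
import java.lang.reflect.Constructor;
import java.util.Optional;

// Hypothetical sketch; Region stands in for a class with a non-public constructor.
final class ConstructorProbeSketch {
  static final class Region {
    final long address;
    final int capacity;

    private Region(long address, int capacity) {
      this.address = address;
      this.capacity = capacity;
    }
  }

  /** Stage 1: look up a (long, int) constructor, returning empty if unavailable. */
  static Optional<Constructor<?>> findConstructor(Class<?> type) {
    try {
      Constructor<?> ctor = type.getDeclaredConstructor(long.class, int.class);
      ctor.setAccessible(true);
      return Optional.of(ctor);
    } catch (NoSuchMethodException | SecurityException e) {
      // Possibly expected on some runtimes, so a quiet log level may be defensible.
      return Optional.empty();
    }
  }

  /** Stage 2: try the constructor once, returning null if it cannot be used. */
  static Object tryNew(Constructor<?> ctor, long address, int capacity) {
    try {
      return ctor.newInstance(address, capacity);
    } catch (ReflectiveOperationException e) {
      // The constructor exists but failed, which is the case the review asks about.
      return null;
    }
  }
}
```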









[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-28 Thread GitBox


liyafan82 commented on a change in pull request #7347:
URL: https://github.com/apache/arrow/pull/7347#discussion_r446617744



##
File path: 
java/memory/src/main/java/org/apache/arrow/memory/rounding/DefaultRoundingPolicy.java
##
@@ -17,33 +17,107 @@
 
 package org.apache.arrow.memory.rounding;
 
-import java.lang.reflect.Field;
-
-import org.apache.arrow.memory.NettyAllocationManager;
 import org.apache.arrow.memory.util.CommonUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import io.netty.util.internal.SystemPropertyUtil;
 
 /**
  * The default rounding policy. That is, if the requested size is within the chunk size,
  * the rounded size will be the next power of two. Otherwise, the rounded size
  * will be identical to the requested size.
  */
 public class DefaultRoundingPolicy implements RoundingPolicy {
-
+  private static final Logger logger = LoggerFactory.getLogger(DefaultRoundingPolicy.class);
   public final long chunkSize;
 
   /**
-   * The singleton instance.
+   * The variables here and the static block calculates teh DEFAULT_CHUNK_SIZE.
+   *
+   *
+   *   It was copied from {@link io.netty.buffer.PooledByteBufAllocator}.
+   *
    */
-  public static final DefaultRoundingPolicy INSTANCE = new DefaultRoundingPolicy();
+  private static final int MIN_PAGE_SIZE = 4096;
+  private static final int MAX_CHUNK_SIZE = (int) (((long) Integer.MAX_VALUE + 1) / 2);
+  private static final long DEFAULT_CHUNK_SIZE;
+  private static final int DEFAULT_PAGE_SIZE;
+  private static final int DEFAULT_MAX_ORDER;
 
-  private DefaultRoundingPolicy() {
+
+  static {
+    int defaultPageSize = SystemPropertyUtil.getInt("org.apache.memory.allocator.pageSize", 8192);
+    Throwable pageSizeFallbackCause = null;
     try {
-      Field field = NettyAllocationManager.class.getDeclaredField("CHUNK_SIZE");
-      field.setAccessible(true);
-      chunkSize = (Long) field.get(null);
-    } catch (Exception e) {
-      throw new RuntimeException("Failed to get chunk size from allocation manager");
+      validateAndCalculatePageShifts(defaultPageSize);
+    } catch (Throwable t) {
+      pageSizeFallbackCause = t;
+      defaultPageSize = 8192;
+    }
+    DEFAULT_PAGE_SIZE = defaultPageSize;
+
+    int defaultMaxOrder = SystemPropertyUtil.getInt("org.apache.memory.allocator.maxOrder", 11);
+    Throwable maxOrderFallbackCause = null;
+    try {
+      validateAndCalculateChunkSize(DEFAULT_PAGE_SIZE, defaultMaxOrder);
+    } catch (Throwable t) {
+      maxOrderFallbackCause = t;
+      defaultMaxOrder = 11;
+    }
+    DEFAULT_MAX_ORDER = defaultMaxOrder;
+    DEFAULT_CHUNK_SIZE = validateAndCalculateChunkSize(DEFAULT_PAGE_SIZE, DEFAULT_MAX_ORDER);
+    if (logger.isDebugEnabled()) {
+      if (pageSizeFallbackCause == null) {
+        logger.debug("-Dorg.apache.memory.allocator.pageSize: {}", DEFAULT_PAGE_SIZE);
+      } else {
+        logger.debug("-Dorg.apache.memory.allocator.pageSize: {}", DEFAULT_PAGE_SIZE, pageSizeFallbackCause);
+      }
+      if (maxOrderFallbackCause == null) {
+        logger.debug("-Dorg.apache.memory.allocator.maxOrder: {}", DEFAULT_MAX_ORDER);
+      } else {
+        logger.debug("-Dorg.apache.memory.allocator.maxOrder: {}", DEFAULT_MAX_ORDER, maxOrderFallbackCause);
+      }

Review comment:
   Please remove "DEFAULT_MAX_ORDER" here
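
For readers following the static block, here is a sketch of how the page size and max order combine into a chunk size. The bodies of `validateAndCalculatePageShifts` and `validateAndCalculateChunkSize` are not shown in the quoted diff, so the versions below are assumptions modeled on Netty's `PooledByteBufAllocator`, which the Javadoc in the diff names as the origin. The 0 to 14 range for `maxOrder` and the `pageSizeOrFallback` helper are likewise assumptions of this sketch.

```java
// Hypothetical sketch modeled on Netty's PooledByteBufAllocator; not the PR's code.
final class ChunkSizeSketch {
  static final int MIN_PAGE_SIZE = 4096;
  static final int MAX_CHUNK_SIZE = (int) (((long) Integer.MAX_VALUE + 1) / 2); // 2^30
  static final int FALLBACK_PAGE_SIZE = 8192;

  /** Returns log2(pageSize), rejecting sizes that are too small or not a power of two. */
  static int validateAndCalculatePageShifts(int pageSize) {
    if (pageSize < MIN_PAGE_SIZE) {
      throw new IllegalArgumentException("pageSize: " + pageSize + " (expected: " + MIN_PAGE_SIZE + " or more)");
    }
    if ((pageSize & (pageSize - 1)) != 0) {
      throw new IllegalArgumentException("pageSize: " + pageSize + " (expected: power of 2)");
    }
    return Integer.SIZE - 1 - Integer.numberOfLeadingZeros(pageSize);
  }

  /** Returns pageSize shifted left by maxOrder, rejecting results above MAX_CHUNK_SIZE. */
  static int validateAndCalculateChunkSize(int pageSize, int maxOrder) {
    if (maxOrder < 0 || maxOrder > 14) {
      throw new IllegalArgumentException("maxOrder: " + maxOrder + " (expected: 0-14)");
    }
    int chunkSize = pageSize;
    for (int i = maxOrder; i > 0; i--) {
      if (chunkSize > MAX_CHUNK_SIZE / 2) {
        throw new IllegalArgumentException("chunk size would exceed " + MAX_CHUNK_SIZE);
      }
      chunkSize <<= 1;
    }
    return chunkSize;
  }

  /** Mirrors the fallback in the static block: invalid requests revert to 8192. */
  static int pageSizeOrFallback(int requested) {
    try {
      validateAndCalculatePageShifts(requested);
      return requested;
    } catch (IllegalArgumentException e) {
      return FALLBACK_PAGE_SIZE;
    }
  }
}
```

With the defaults from the diff, a page size of 8192 and a max order of 11 give a chunk size of 8192 * 2^11 = 16777216 bytes (16 MiB).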









[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-28 Thread GitBox


liyafan82 commented on a change in pull request #7347:
URL: https://github.com/apache/arrow/pull/7347#discussion_r446617676



##
File path: 
java/memory/src/main/java/org/apache/arrow/memory/rounding/DefaultRoundingPolicy.java
##
@@ -17,33 +17,107 @@
 
 package org.apache.arrow.memory.rounding;
 
-import java.lang.reflect.Field;
-
-import org.apache.arrow.memory.NettyAllocationManager;
 import org.apache.arrow.memory.util.CommonUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import io.netty.util.internal.SystemPropertyUtil;
 
 /**
  * The default rounding policy. That is, if the requested size is within the chunk size,
  * the rounded size will be the next power of two. Otherwise, the rounded size
  * will be identical to the requested size.
  */
 public class DefaultRoundingPolicy implements RoundingPolicy {
-
+  private static final Logger logger = LoggerFactory.getLogger(DefaultRoundingPolicy.class);
   public final long chunkSize;
 
   /**
-   * The singleton instance.
+   * The variables here and the static block calculates teh DEFAULT_CHUNK_SIZE.
+   *
+   *
+   *   It was copied from {@link io.netty.buffer.PooledByteBufAllocator}.
+   *
    */
-  public static final DefaultRoundingPolicy INSTANCE = new DefaultRoundingPolicy();
+  private static final int MIN_PAGE_SIZE = 4096;
+  private static final int MAX_CHUNK_SIZE = (int) (((long) Integer.MAX_VALUE + 1) / 2);
+  private static final long DEFAULT_CHUNK_SIZE;
+  private static final int DEFAULT_PAGE_SIZE;
+  private static final int DEFAULT_MAX_ORDER;
 
-  private DefaultRoundingPolicy() {
+
+  static {
+    int defaultPageSize = SystemPropertyUtil.getInt("org.apache.memory.allocator.pageSize", 8192);
+    Throwable pageSizeFallbackCause = null;
     try {
-      Field field = NettyAllocationManager.class.getDeclaredField("CHUNK_SIZE");
-      field.setAccessible(true);
-      chunkSize = (Long) field.get(null);
-    } catch (Exception e) {
-      throw new RuntimeException("Failed to get chunk size from allocation manager");
+      validateAndCalculatePageShifts(defaultPageSize);
+    } catch (Throwable t) {
+      pageSizeFallbackCause = t;
+      defaultPageSize = 8192;
+    }
+    DEFAULT_PAGE_SIZE = defaultPageSize;
+
+    int defaultMaxOrder = SystemPropertyUtil.getInt("org.apache.memory.allocator.maxOrder", 11);
+    Throwable maxOrderFallbackCause = null;
+    try {
+      validateAndCalculateChunkSize(DEFAULT_PAGE_SIZE, defaultMaxOrder);
+    } catch (Throwable t) {
+      maxOrderFallbackCause = t;
+      defaultMaxOrder = 11;
+    }
+    DEFAULT_MAX_ORDER = defaultMaxOrder;
+    DEFAULT_CHUNK_SIZE = validateAndCalculateChunkSize(DEFAULT_PAGE_SIZE, DEFAULT_MAX_ORDER);
+    if (logger.isDebugEnabled()) {
+      if (pageSizeFallbackCause == null) {
+        logger.debug("-Dorg.apache.memory.allocator.pageSize: {}", DEFAULT_PAGE_SIZE);
+      } else {
+        logger.debug("-Dorg.apache.memory.allocator.pageSize: {}", DEFAULT_PAGE_SIZE, pageSizeFallbackCause);

Review comment:
   please remove "DEFAULT_PAGE_SIZE" here
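
The class Javadoc in the quoted diff describes the rounding rule in words; a minimal sketch of it follows. The names `RoundingSketch`, `nextPowerOfTwo` and `roundedSize` are hypothetical, positive request sizes are assumed, and reading "within the chunk size" as strictly less than the chunk size is an assumption of this sketch.

```java
// Hypothetical sketch of the rounding rule described in the Javadoc above.
final class RoundingSketch {
  /** Smallest power of two that is at least val; assumes val >= 1. */
  static long nextPowerOfTwo(long val) {
    long highest = Long.highestOneBit(val);
    return highest == val ? val : highest << 1;
  }

  /** Rounds small requests up to a power of two and leaves large ones unchanged. */
  static long roundedSize(long requestSize, long chunkSize) {
    return requestSize < chunkSize ? nextPowerOfTwo(requestSize) : requestSize;
  }
}
```

With a 16 MiB chunk size, a request for 1000 bytes becomes 1024, a request for 4097 bytes becomes 8192, and a request for 20,000,000 bytes is returned as is.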









[GitHub] [arrow] kou closed pull request #7565: ARROW-9256: [C++] Incorrect variable name ARROW_CXX_FLAGS

2020-06-28 Thread GitBox


kou closed pull request #7565:
URL: https://github.com/apache/arrow/pull/7565


   







[GitHub] [arrow] liyafan82 commented on pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-06-28 Thread GitBox


liyafan82 commented on pull request #7544:
URL: https://github.com/apache/arrow/pull/7544#issuecomment-650709228


   @pitrou Thanks a lot for your attention. 
   I think it is ready for review now. 







[GitHub] [arrow] Ktakuya332C commented on a change in pull request #7565: ARROW-9256: [C++] Incorrect variable name ARROW_CXX_FLAGS

2020-06-28 Thread GitBox


Ktakuya332C commented on a change in pull request #7565:
URL: https://github.com/apache/arrow/pull/7565#discussion_r446610914



##
File path: cpp/CMakeLists.txt
##
@@ -472,7 +472,7 @@ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ARROW_CXXFLAGS}")
 
 # For any C code, use the same flags. These flags don't contain
 # C++ specific flags.
-set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${ARROW_CXX_FLAGS} ${CXX_COMMON_FLAGS}")
+set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${ARROW_CXXFLAGS} ${CXX_COMMON_FLAGS}")

Review comment:
   Thank you for your comment! Changed the order in 6e7aaa3.









[GitHub] [arrow] kou commented on a change in pull request #7565: ARROW-9256: [C++] Incorrect variable name ARROW_CXX_FLAGS

2020-06-28 Thread GitBox


kou commented on a change in pull request #7565:
URL: https://github.com/apache/arrow/pull/7565#discussion_r446609892



##
File path: cpp/CMakeLists.txt
##
@@ -472,7 +472,7 @@ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ARROW_CXXFLAGS}")
 
 # For any C code, use the same flags. These flags don't contain
 # C++ specific flags.
-set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${ARROW_CXX_FLAGS} ${CXX_COMMON_FLAGS}")
+set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${ARROW_CXXFLAGS} ${CXX_COMMON_FLAGS}")

Review comment:
   Good catch!
   Could you use the order `${CXX_COMMON_FLAGS} ${ARROW_CXXFLAGS}` so that
   `${ARROW_CXXFLAGS}` can override `${CXX_COMMON_FLAGS}`?
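
   A minimal sketch of the ordering suggested above, using only the variable
   names from the diff. It assumes that when two options conflict, the
   compiler honors the later one (GCC documents this for repeated `-O`
   options); this is an illustration, not necessarily the committed change:

   ```cmake
   # Sketch only: common flags first, ARROW_CXXFLAGS last, so that a
   # conflicting option in ARROW_CXXFLAGS takes effect (assumes the
   # compiler applies the last of two conflicting options).
   set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${CXX_COMMON_FLAGS} ${ARROW_CXXFLAGS}")
   ```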









[GitHub] [arrow] github-actions[bot] commented on pull request #7565: ARROW-9256: [C++] Incorrect variable name ARROW_CXX_FLAGS

2020-06-28 Thread GitBox


github-actions[bot] commented on pull request #7565:
URL: https://github.com/apache/arrow/pull/7565#issuecomment-650706362


   https://issues.apache.org/jira/browse/ARROW-9256







[GitHub] [arrow] Ktakuya332C opened a new pull request #7565: ARROW-9256: [C++] Incorrect variable name ARROW_CXX_FLAGS

2020-06-28 Thread GitBox


Ktakuya332C opened a new pull request #7565:
URL: https://github.com/apache/arrow/pull/7565


   


