[GitHub] [arrow] emkornfield commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7290:
URL: https://github.com/apache/arrow/pull/7290#discussion_r453152953



##
File path: dev/archery/archery/integration/runner.py
##
@@ -126,6 +126,8 @@ def _gold_tests(self, gold_dir):
 if f.name == name).skip
 except StopIteration:
 skip = set()
+if name == 'union' and prefix == '0.17.1':

Review comment:
   IMO I think this is OK for now.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion

2020-07-10 Thread GitBox


emkornfield commented on pull request #7604:
URL: https://github.com/apache/arrow/pull/7604#issuecomment-656984423


   > There is no way to deselect specific tests, afaik. You would have select 
all the other tests individually from that module, for example
   
   My preference, if it is ok with everyone reviewing this would be to avoid 
this and try to get something merged into Spark master quickly (I can own 
looking at nightly spark jobs until that is done to verify no more failures).
   
   @BryanCutler if you don't have the bandwidth to make the PR to spark, I can 
devote a little bit of effort to submitting one.
   
   Thoughts?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jglaser commented on pull request #7711: ARROW-9415: [C++] Arrow does not compile on Power9

2020-07-10 Thread GitBox


jglaser commented on pull request #7711:
URL: https://github.com/apache/arrow/pull/7711#issuecomment-656983089


   should we leave this open until the upstream PR is merged, and make this a 
PR for the updated version?
   https://github.com/Cyan4973/xxHash/pull/426
   
   @emkornfield Please advise. Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453150926



##
File path: cpp/src/parquet/arrow/schema.h
##
@@ -163,24 +165,28 @@ struct PARQUET_EXPORT SchemaManifest {
 return it->second;
   }
 
-  bool GetFieldIndices(const std::vector& column_indices, 
std::vector* out) {
+  ::arrow::Result> GetFieldIndices(
+  const std::vector& column_indices) {
 // Coalesce a list of schema field indices which are the roots of the
 // columns referred to by a list of column indices
 const schema::GroupNode* group = descr->group_node();
 std::unordered_set already_added;
-out->clear();
-for (auto& column_idx : column_indices) {
+
+std::vector out;
+for (int column_idx : column_indices) {
+  if (column_idx < 0 || column_idx >= descr->num_columns()) {
+return ::arrow::Status::IndexError("Column index ", column_idx, " is 
not valid");
+  }
   auto field_node = descr->GetColumnRoot(column_idx);
   auto field_idx = group->FieldIndex(*field_node);
   if (field_idx < 0) {

Review comment:
   not your code, but is this really checking for equality to -1?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453150835



##
File path: cpp/src/parquet/arrow/schema.h
##
@@ -163,24 +165,28 @@ struct PARQUET_EXPORT SchemaManifest {
 return it->second;
   }
 
-  bool GetFieldIndices(const std::vector& column_indices, 
std::vector* out) {
+  ::arrow::Result> GetFieldIndices(
+  const std::vector& column_indices) {
 // Coalesce a list of schema field indices which are the roots of the

Review comment:
   could we move the comment to a docstring and give an example?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jglaser commented on pull request #7711: ARROW-9415: [C++] Arrow does not compile on Power9

2020-07-10 Thread GitBox


jglaser commented on pull request #7711:
URL: https://github.com/apache/arrow/pull/7711#issuecomment-656981267


   oh, understood. I will submit to xxhash, then. Will take me some time to 
sort this out, though. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #7711: ARROW-9415: [C++] Arrow does not compile on Power9

2020-07-10 Thread GitBox


emkornfield commented on pull request #7711:
URL: https://github.com/apache/arrow/pull/7711#issuecomment-656980608


   @jglaser thank you for the PR.  Since this is a vendored library it would 
likely be better to make the PR upstream and then upgrade here once that is 
merged.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453150104



##
File path: cpp/src/parquet/arrow/reader.cc
##
@@ -819,10 +812,7 @@ Status FileReaderImpl::ReadRowGroups(const 
std::vector& row_groups,
 
   // We only need to read schema fields which have columns indicated
   // in the indices vector
-  std::vector field_indices;
-  if (!manifest_.GetFieldIndices(indices, _indices)) {
-return Status::Invalid("Invalid column index");
-  }
+  ARROW_ASSIGN_OR_RAISE(auto field_indices, 
manifest_.GetFieldIndices(indices));

Review comment:
   please spell out type.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453150052



##
File path: cpp/src/parquet/arrow/reader.cc
##
@@ -781,6 +770,9 @@ Status FileReaderImpl::GetRecordBatchReader(const 
std::vector& row_group_in
 const std::vector& 
column_indices,
 
std::unique_ptr* out) {
   // column indices check
+  ARROW_ASSIGN_OR_RAISE(std::ignore, 
manifest_.GetFieldIndices(column_indices));

Review comment:
   it would be nice if RETURN_NOT_OK could be used here.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453149894



##
File path: cpp/src/parquet/arrow/reader.cc
##
@@ -301,62 +302,50 @@ class FileReaderImpl : public FileReader {
 
 class RowGroupRecordBatchReader : public ::arrow::RecordBatchReader {
  public:
-  RowGroupRecordBatchReader(std::vector> 
field_readers,
-std::shared_ptr<::arrow::Schema> schema, int64_t 
batch_size)
-  : field_readers_(std::move(field_readers)),
-schema_(std::move(schema)),
-batch_size_(batch_size) {}
-
   ~RowGroupRecordBatchReader() override {}
 
   std::shared_ptr<::arrow::Schema> schema() const override { return schema_; }
 
-  static Status Make(const std::vector& row_groups,
- const std::vector& column_indices, FileReaderImpl* 
reader,
- int64_t batch_size,
- std::unique_ptr<::arrow::RecordBatchReader>* out) {
-std::vector field_indices;
-if (!reader->manifest_.GetFieldIndices(column_indices, _indices)) {
-  return Status::Invalid("Invalid column index");
-}
+  static ::arrow::Result> Make(
+  const std::vector& row_group_indices, const std::vector& 
column_indices,
+  int64_t batch_size, FileReaderImpl* reader) {
+std::unique_ptr out(new 
RowGroupRecordBatchReader);
 
-std::vector> 
field_readers(field_indices.size());
-std::vector> fields;
+RETURN_NOT_OK(FromParquetSchema(
+reader->parquet_reader()->metadata()->schema(), 
default_arrow_reader_properties(),
+/*key_value_metadata=*/nullptr, column_indices, >schema_));
 
-auto included_leaves = VectorToSharedSet(column_indices);
-for (size_t i = 0; i < field_indices.size(); ++i) {
-  RETURN_NOT_OK(reader->GetFieldReader(field_indices[i], included_leaves, 
row_groups,
-   _readers[i]));
-  fields.push_back(field_readers[i]->field());
-}
-out->reset(new RowGroupRecordBatchReader(std::move(field_readers),
- ::arrow::schema(fields), 
batch_size));
-return Status::OK();
+using ::arrow::RecordBatchIterator;
+
+auto row_group_index_to_batch_iterator =
+[=](const int* i) -> ::arrow::Result {

Review comment:
   If it is intentional (it looks like this escapes the scope) I would 
prefer to be explicit about the values being captured.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453149789



##
File path: cpp/src/parquet/arrow/reader.cc
##
@@ -301,62 +302,50 @@ class FileReaderImpl : public FileReader {
 
 class RowGroupRecordBatchReader : public ::arrow::RecordBatchReader {
  public:
-  RowGroupRecordBatchReader(std::vector> 
field_readers,
-std::shared_ptr<::arrow::Schema> schema, int64_t 
batch_size)
-  : field_readers_(std::move(field_readers)),
-schema_(std::move(schema)),
-batch_size_(batch_size) {}
-
   ~RowGroupRecordBatchReader() override {}
 
   std::shared_ptr<::arrow::Schema> schema() const override { return schema_; }
 
-  static Status Make(const std::vector& row_groups,
- const std::vector& column_indices, FileReaderImpl* 
reader,
- int64_t batch_size,
- std::unique_ptr<::arrow::RecordBatchReader>* out) {
-std::vector field_indices;
-if (!reader->manifest_.GetFieldIndices(column_indices, _indices)) {
-  return Status::Invalid("Invalid column index");
-}
+  static ::arrow::Result> Make(
+  const std::vector& row_group_indices, const std::vector& 
column_indices,
+  int64_t batch_size, FileReaderImpl* reader) {
+std::unique_ptr out(new 
RowGroupRecordBatchReader);
 
-std::vector> 
field_readers(field_indices.size());
-std::vector> fields;
+RETURN_NOT_OK(FromParquetSchema(
+reader->parquet_reader()->metadata()->schema(), 
default_arrow_reader_properties(),
+/*key_value_metadata=*/nullptr, column_indices, >schema_));
 
-auto included_leaves = VectorToSharedSet(column_indices);
-for (size_t i = 0; i < field_indices.size(); ++i) {
-  RETURN_NOT_OK(reader->GetFieldReader(field_indices[i], included_leaves, 
row_groups,
-   _readers[i]));
-  fields.push_back(field_readers[i]->field());
-}
-out->reset(new RowGroupRecordBatchReader(std::move(field_readers),
- ::arrow::schema(fields), 
batch_size));
-return Status::OK();
+using ::arrow::RecordBatchIterator;
+
+auto row_group_index_to_batch_iterator =
+[=](const int* i) -> ::arrow::Result {

Review comment:
   this binding everything by value here intentional?  If so could you add 
a comment why?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453149711



##
File path: cpp/src/parquet/arrow/reader.cc
##
@@ -301,62 +302,50 @@ class FileReaderImpl : public FileReader {
 
 class RowGroupRecordBatchReader : public ::arrow::RecordBatchReader {
  public:
-  RowGroupRecordBatchReader(std::vector> 
field_readers,
-std::shared_ptr<::arrow::Schema> schema, int64_t 
batch_size)
-  : field_readers_(std::move(field_readers)),
-schema_(std::move(schema)),
-batch_size_(batch_size) {}
-
   ~RowGroupRecordBatchReader() override {}
 
   std::shared_ptr<::arrow::Schema> schema() const override { return schema_; }
 
-  static Status Make(const std::vector& row_groups,
- const std::vector& column_indices, FileReaderImpl* 
reader,
- int64_t batch_size,
- std::unique_ptr<::arrow::RecordBatchReader>* out) {
-std::vector field_indices;
-if (!reader->manifest_.GetFieldIndices(column_indices, _indices)) {
-  return Status::Invalid("Invalid column index");
-}
+  static ::arrow::Result> Make(
+  const std::vector& row_group_indices, const std::vector& 
column_indices,
+  int64_t batch_size, FileReaderImpl* reader) {
+std::unique_ptr out(new 
RowGroupRecordBatchReader);
 
-std::vector> 
field_readers(field_indices.size());
-std::vector> fields;
+RETURN_NOT_OK(FromParquetSchema(
+reader->parquet_reader()->metadata()->schema(), 
default_arrow_reader_properties(),

Review comment:
   its probably not great that we don't propagate reader properties here?  
I'm not sure I see if this was a problem in the old code as well?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453149597



##
File path: cpp/src/parquet/arrow/reader.cc
##
@@ -301,62 +302,50 @@ class FileReaderImpl : public FileReader {
 
 class RowGroupRecordBatchReader : public ::arrow::RecordBatchReader {
  public:
-  RowGroupRecordBatchReader(std::vector> 
field_readers,
-std::shared_ptr<::arrow::Schema> schema, int64_t 
batch_size)
-  : field_readers_(std::move(field_readers)),
-schema_(std::move(schema)),
-batch_size_(batch_size) {}
-
   ~RowGroupRecordBatchReader() override {}
 
   std::shared_ptr<::arrow::Schema> schema() const override { return schema_; }
 
-  static Status Make(const std::vector& row_groups,
- const std::vector& column_indices, FileReaderImpl* 
reader,
- int64_t batch_size,
- std::unique_ptr<::arrow::RecordBatchReader>* out) {
-std::vector field_indices;
-if (!reader->manifest_.GetFieldIndices(column_indices, _indices)) {
-  return Status::Invalid("Invalid column index");
-}
+  static ::arrow::Result> Make(
+  const std::vector& row_group_indices, const std::vector& 
column_indices,
+  int64_t batch_size, FileReaderImpl* reader) {
+std::unique_ptr out(new 
RowGroupRecordBatchReader);

Review comment:
   *rant* it annoys me that that make_unique can't access private 
constructors.
   nit: please construct using empty parens at the end.
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453149446



##
File path: cpp/src/parquet/arrow/reader.cc
##
@@ -301,62 +302,50 @@ class FileReaderImpl : public FileReader {
 
 class RowGroupRecordBatchReader : public ::arrow::RecordBatchReader {
  public:
-  RowGroupRecordBatchReader(std::vector> 
field_readers,
-std::shared_ptr<::arrow::Schema> schema, int64_t 
batch_size)
-  : field_readers_(std::move(field_readers)),
-schema_(std::move(schema)),
-batch_size_(batch_size) {}
-
   ~RowGroupRecordBatchReader() override {}
 
   std::shared_ptr<::arrow::Schema> schema() const override { return schema_; }
 
-  static Status Make(const std::vector& row_groups,
- const std::vector& column_indices, FileReaderImpl* 
reader,
- int64_t batch_size,
- std::unique_ptr<::arrow::RecordBatchReader>* out) {
-std::vector field_indices;
-if (!reader->manifest_.GetFieldIndices(column_indices, _indices)) {
-  return Status::Invalid("Invalid column index");
-}
+  static ::arrow::Result> Make(
+  const std::vector& row_group_indices, const std::vector& 
column_indices,

Review comment:
   not present before but adding some docs here might be useful.  
especially around lifecyle of reader.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453149320



##
File path: cpp/src/parquet/arrow/arrow_schema_test.cc
##
@@ -644,6 +649,78 @@ TEST_F(TestConvertParquetSchema, 
ParquetRepeatedNestedSchema) {
   ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema));
 }
 
+TEST_F(TestConvertParquetSchema, ColumnSubselection) {
+  std::vector parquet_fields;
+  std::vector> arrow_fields;
+  {
+//   optional int32 leaf1;
+//   repeated group outerGroup {
+// optional int32 leaf2;
+// repeated group innerGroup {
+//   optional int32 leaf3;
+// }
+//   }
+parquet_fields.push_back(
+PrimitiveNode::Make("leaf1", Repetition::OPTIONAL, 
ParquetType::INT32));
+parquet_fields.push_back(GroupNode::Make(
+"outerGroup", Repetition::REPEATED,
+{PrimitiveNode::Make("leaf2", Repetition::OPTIONAL, 
ParquetType::INT32),
+ GroupNode::Make(
+ "innerGroup", Repetition::REPEATED,
+ {PrimitiveNode::Make("leaf3", Repetition::OPTIONAL, 
ParquetType::INT32)})}));
+
+auto inner_group_fields = {::arrow::field("leaf3", INT32, true)};
+auto inner_group_type = ::arrow::struct_(inner_group_fields);
+auto outer_group_fields = {
+::arrow::field("leaf2", INT32, true),
+::arrow::field(
+"innerGroup",
+::arrow::list(::arrow::field("innerGroup", inner_group_type, 
false)), false)};
+auto outer_group_type = ::arrow::struct_(outer_group_fields);
+
+arrow_fields.push_back(::arrow::field("leaf1", INT32, true));
+arrow_fields.push_back(::arrow::field(
+"outerGroup",
+::arrow::list(::arrow::field("outerGroup", outer_group_type, false)), 
false));
+  }
+  std::shared_ptr<::arrow::Schema> arrow_schema;
+  std::vector column_indices;
+
+  column_indices = {};
+  ASSERT_OK(
+  ConvertSchema(parquet_fields, /*key_value_metadata=*/nullptr, 
_indices));
+  arrow_schema = ::arrow::schema({});
+  ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema));
+
+  column_indices = {0, 1, 2};
+  ASSERT_OK(
+  ConvertSchema(parquet_fields, /*key_value_metadata=*/nullptr, 
_indices));
+  arrow_schema = ::arrow::schema(arrow_fields);
+  ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema));
+
+  column_indices = {0};
+  ASSERT_OK(
+  ConvertSchema(parquet_fields, /*key_value_metadata=*/nullptr, 
_indices));
+  arrow_schema = ::arrow::schema({arrow_fields[0]});
+  ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema));
+
+  column_indices = {1};
+  ASSERT_OK(
+  ConvertSchema(parquet_fields, /*key_value_metadata=*/nullptr, 
_indices));
+  arrow_schema = ::arrow::schema({arrow_fields[1]});
+  ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema));
+
+  column_indices = {2};

Review comment:
   does it pay to test at least one pair?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7711: ARROW-9415: [C++] Arrow does not compile on Power9

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7711:
URL: https://github.com/apache/arrow/pull/7711#issuecomment-656978794


   https://issues.apache.org/jira/browse/ARROW-9415



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453149152



##
File path: cpp/src/parquet/arrow/arrow_schema_test.cc
##
@@ -644,6 +649,78 @@ TEST_F(TestConvertParquetSchema, 
ParquetRepeatedNestedSchema) {
   ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema));
 }
 
+TEST_F(TestConvertParquetSchema, ColumnSubselection) {
+  std::vector parquet_fields;
+  std::vector> arrow_fields;
+  {
+//   optional int32 leaf1;
+//   repeated group outerGroup {
+// optional int32 leaf2;
+// repeated group innerGroup {
+//   optional int32 leaf3;
+// }
+//   }
+parquet_fields.push_back(
+PrimitiveNode::Make("leaf1", Repetition::OPTIONAL, 
ParquetType::INT32));
+parquet_fields.push_back(GroupNode::Make(
+"outerGroup", Repetition::REPEATED,
+{PrimitiveNode::Make("leaf2", Repetition::OPTIONAL, 
ParquetType::INT32),
+ GroupNode::Make(
+ "innerGroup", Repetition::REPEATED,
+ {PrimitiveNode::Make("leaf3", Repetition::OPTIONAL, 
ParquetType::INT32)})}));
+
+auto inner_group_fields = {::arrow::field("leaf3", INT32, true)};

Review comment:
   P.S. we really need a JSON utility to construct a schema from json., it 
would make this much easier to read.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453149053



##
File path: cpp/src/parquet/arrow/arrow_schema_test.cc
##
@@ -644,6 +649,78 @@ TEST_F(TestConvertParquetSchema, 
ParquetRepeatedNestedSchema) {
   ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema));
 }
 
+TEST_F(TestConvertParquetSchema, ColumnSubselection) {
+  std::vector parquet_fields;
+  std::vector> arrow_fields;
+  {
+//   optional int32 leaf1;
+//   repeated group outerGroup {
+// optional int32 leaf2;
+// repeated group innerGroup {
+//   optional int32 leaf3;
+// }
+//   }
+parquet_fields.push_back(
+PrimitiveNode::Make("leaf1", Repetition::OPTIONAL, 
ParquetType::INT32));
+parquet_fields.push_back(GroupNode::Make(
+"outerGroup", Repetition::REPEATED,
+{PrimitiveNode::Make("leaf2", Repetition::OPTIONAL, 
ParquetType::INT32),
+ GroupNode::Make(
+ "innerGroup", Repetition::REPEATED,
+ {PrimitiveNode::Make("leaf3", Repetition::OPTIONAL, 
ParquetType::INT32)})}));
+
+auto inner_group_fields = {::arrow::field("leaf3", INT32, true)};

Review comment:
   please spell out types in some form where you are using bracket 
initializations.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453148949



##
File path: cpp/src/parquet/arrow/arrow_schema_test.cc
##
@@ -644,6 +649,78 @@ TEST_F(TestConvertParquetSchema, 
ParquetRepeatedNestedSchema) {
   ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema));
 }
 
+TEST_F(TestConvertParquetSchema, ColumnSubselection) {
+  std::vector parquet_fields;
+  std::vector> arrow_fields;
+  {
+//   optional int32 leaf1;
+//   repeated group outerGroup {
+// optional int32 leaf2;
+// repeated group innerGroup {
+//   optional int32 leaf3;
+// }
+//   }
+parquet_fields.push_back(

Review comment:
   are these intentionally not using logical annotations (i.e. LIST?).  
Maybe ad a comment if so.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453148787



##
File path: cpp/src/parquet/arrow/arrow_reader_writer_test.cc
##
@@ -2877,11 +2877,22 @@ TEST(TestArrowReaderAdHoc, 
LARGE_MEMORY_TEST(LargeStringColumn)) {
   array.reset();
 
   auto reader = 
ParquetFileReader::Open(std::make_shared(tables_buffer));
-
   std::unique_ptr arrow_reader;
   ASSERT_OK(FileReader::Make(default_memory_pool(), std::move(reader), 
_reader));
   ASSERT_OK_NO_THROW(arrow_reader->ReadTable());
   ASSERT_OK(table->ValidateFull());
+
+  // ARROW-9297: ensure RecordBatchReader also works
+  reader = 
ParquetFileReader::Open(std::make_shared(tables_buffer));
+  ASSERT_OK(FileReader::Make(default_memory_pool(), std::move(reader), 
_reader));
+  std::shared_ptr<::arrow::RecordBatchReader> batch_reader;
+  auto all_row_groups = 
::arrow::internal::Iota(reader->metadata()->num_row_groups());
+  ASSERT_OK_NO_THROW(arrow_reader->GetRecordBatchReader(all_row_groups, 
_reader));
+  ASSERT_OK_AND_ASSIGN(auto batched_table,

Review comment:
   nm.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453148755



##
File path: cpp/src/parquet/arrow/arrow_reader_writer_test.cc
##
@@ -2877,11 +2877,22 @@ TEST(TestArrowReaderAdHoc, 
LARGE_MEMORY_TEST(LargeStringColumn)) {
   array.reset();
 
   auto reader = 
ParquetFileReader::Open(std::make_shared(tables_buffer));
-
   std::unique_ptr arrow_reader;
   ASSERT_OK(FileReader::Make(default_memory_pool(), std::move(reader), 
_reader));
   ASSERT_OK_NO_THROW(arrow_reader->ReadTable());
   ASSERT_OK(table->ValidateFull());
+
+  // ARROW-9297: ensure RecordBatchReader also works
+  reader = 
ParquetFileReader::Open(std::make_shared(tables_buffer));
+  ASSERT_OK(FileReader::Make(default_memory_pool(), std::move(reader), 
_reader));
+  std::shared_ptr<::arrow::RecordBatchReader> batch_reader;
+  auto all_row_groups = 
::arrow::internal::Iota(reader->metadata()->num_row_groups());
+  ASSERT_OK_NO_THROW(arrow_reader->GetRecordBatchReader(all_row_groups, 
_reader));
+  ASSERT_OK_AND_ASSIGN(auto batched_table,

Review comment:
   type here I think is still useful (i.e. not auto).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

2020-07-10 Thread GitBox


emkornfield commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453148683



##
File path: cpp/src/parquet/arrow/arrow_reader_writer_test.cc
##
@@ -2877,11 +2877,22 @@ TEST(TestArrowReaderAdHoc, 
LARGE_MEMORY_TEST(LargeStringColumn)) {
   array.reset();
 
   auto reader = 
ParquetFileReader::Open(std::make_shared(tables_buffer));
-
   std::unique_ptr arrow_reader;
   ASSERT_OK(FileReader::Make(default_memory_pool(), std::move(reader), 
_reader));
   ASSERT_OK_NO_THROW(arrow_reader->ReadTable());
   ASSERT_OK(table->ValidateFull());
+
+  // ARROW-9297: ensure RecordBatchReader also works
+  reader = 
ParquetFileReader::Open(std::make_shared(tables_buffer));
+  ASSERT_OK(FileReader::Make(default_memory_pool(), std::move(reader), 
_reader));
+  std::shared_ptr<::arrow::RecordBatchReader> batch_reader;
+  auto all_row_groups = 
::arrow::internal::Iota(reader->metadata()->num_row_groups());

Review comment:
   Could this be simplified to:
   `::arrow::internal::Iota all_row_groups(...);
   
   if so please do it.  If not please spell out the type instead of auto.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7711: fix P9 build

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7711:
URL: https://github.com/apache/arrow/pull/7711#issuecomment-656976907


   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
   ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/arrow/pulls/)
 * [Contribution Guidelines - How to contribute 
patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jglaser opened a new pull request #7711: fix P9 build

2020-07-10 Thread GitBox


jglaser opened a new pull request #7711:
URL: https://github.com/apache/arrow/pull/7711


   Fix build on Power9
   
   This small change addresses the following build error:
   ```[ 20%] Building CXX object 
src/arrow/CMakeFiles/arrow_objlib.dir/array/builder_nested.cc.o
   In file included from 
/ccs/proj/stf006/glaser/rapids/arrow/cpp/src/arrow/array/dict_internal.h:35:0,
from 
/ccs/proj/stf006/glaser/rapids/arrow/cpp/src/arrow/array/builder_dict.cc:22:
   /ccs/proj/stf006/glaser/rapids/arrow/cpp/src/arrow/util/hashing.h: In static 
member function 'static uint32_t arrow::internal::SmallScalarTraits<__vector(4) 
__bool int>::AsIndex(__vector(4) __bool int)':
   /ccs/proj/stf006/glaser/rapids/arrow/cpp/src/arrow/util/hashing.h:495:60: 
error: cannot convert '__vector(4) int' to 'uint32_t {aka unsigned int}' in 
return
  static uint32_t AsIndex(bool value) { return value ? 1 : 0; }
   ^
   ```
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange

2020-07-10 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r453139681



##
File path: 
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -751,55 +757,57 @@ private void splitAndTransferOffsetBuffer(int startIndex, 
int length, BaseVariab
*/
   private void splitAndTransferValidityBuffer(int startIndex, int length,
   BaseVariableWidthVector target) {
-int firstByteSource = BitVectorHelper.byteIndex(startIndex);
-int lastByteSource = BitVectorHelper.byteIndex(valueCount - 1);
-int byteSizeTarget = getValidityBufferSizeFromCount(length);
-int offset = startIndex % 8;
+if (length <= 0) {

Review comment:
   Maybe for clarity, I could avoid doing these changes and simply 
[fix](https://github.com/apache/arrow/pull/6402/files#diff-db6c4f9e4030c5da8ccbcfe93a41b8a0R775)
 the validity buffer transfer. Would that be better ?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] lidavidm commented on pull request #7702: ARROW-9395: [Python] allow configuring MetadataVersion

2020-07-10 Thread GitBox


lidavidm commented on pull request #7702:
URL: https://github.com/apache/arrow/pull/7702#issuecomment-656950069


   If that's 
https://github.com/apache/arrow/blob/master/ci/scripts/integration_spark.sh 
then I'll take a look when I get a chance, thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] lidavidm commented on pull request #7685: ARROW-9362: [Java] Support reading/writing V4 MetadataVersion

2020-07-10 Thread GitBox


lidavidm commented on pull request #7685:
URL: https://github.com/apache/arrow/pull/7685#issuecomment-656949942


   > So is this ok to merge before #7290 ?
   
   This should be OK. It doesn't enable unions in the integration tests and 
prevents reading/writing unions with the old version. (Of course, someone could 
build master and start writing files with V5 metadata and the wrong union 
layout?)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] BryanCutler commented on pull request #7702: ARROW-9395: [Python] allow configuring MetadataVersion

2020-07-10 Thread GitBox


BryanCutler commented on pull request #7702:
URL: https://github.com/apache/arrow/pull/7702#issuecomment-656949654


   Thanks for doing this @lidavidm ! I believe this will be necessary for a 
PySpark v2.4.x or v3.0.0 user that upgrades to PyArrow 1.0.0. It would be good 
to setup the spark integration test to verify this too.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] houqp commented on pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-10 Thread GitBox


houqp commented on pull request #7666:
URL: https://github.com/apache/arrow/pull/7666#issuecomment-656943172


   +1 on unifying to Arc for the initial release given it's not on the hot 
path. It's better to start simple and optimize later. Arc's atomic reference 
counting tend to mess with compiler optimization and CPU cacheline, so it 
definitely has extra runtime overhead. But it shouldn't matter for schema. If 
get schema turned out to be on the hot path, then we have a bigger problem to 
fix :P I can't think of any reason why that should be the case.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] BryanCutler commented on pull request #7685: ARROW-9362: [Java] Support reading/writing V4 MetadataVersion

2020-07-10 Thread GitBox


BryanCutler commented on pull request #7685:
URL: https://github.com/apache/arrow/pull/7685#issuecomment-656940054


   So is this ok to merge before #7290 ?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7660: ARROW-9291 [R]: Support fixed size binary/list types

2020-07-10 Thread GitBox


nealrichardson closed pull request #7660:
URL: https://github.com/apache/arrow/pull/7660


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7710: ARROW-9413: [Rust] Disable cpm_nan clippy error

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7710:
URL: https://github.com/apache/arrow/pull/7710#issuecomment-656926979


   https://issues.apache.org/jira/browse/ARROW-9413



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on pull request #7710: ARROW-9413: [Rust] Disable cpm_nan clippy error

2020-07-10 Thread GitBox


nevi-me commented on pull request #7710:
URL: https://github.com/apache/arrow/pull/7710#issuecomment-656924262


   CC @houqp 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build

2020-07-10 Thread GitBox


nealrichardson closed pull request #7694:
URL: https://github.com/apache/arrow/pull/7694


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me opened a new pull request #7710: ARROW-9413: [Rust] Disable cpm_nan clippy error

2020-07-10 Thread GitBox


nevi-me opened a new pull request #7710:
URL: https://github.com/apache/arrow/pull/7710


   Using the comparison recommended by clippy makes sorts with `NAN` 
undeterministic.
   We currently sort NAN separately to nulls, we couldcan resolve this 
separately



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-10 Thread GitBox


wesm commented on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-656921774


   I googled a little bit about the compilation errors and it seems to be a gcc 
bug. I think that some of these kernels are going to have to be rewritten to 
use inline lambdas instead of CRTP. There should be no performance difference. 
I'll take a closer look at it this weekend



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails

2020-07-10 Thread GitBox


nealrichardson commented on pull request #7706:
URL: https://github.com/apache/arrow/pull/7706#issuecomment-656921476


   Also cc @xhochy. IDK how serious this is; I personally have never used this 
conda setup, and what would make more sense would be to test the actual conda 
recipe



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7707: ARROW-9410: [CI][Crossbow] Fix homebrew-cpp again

2020-07-10 Thread GitBox


nealrichardson closed pull request #7707:
URL: https://github.com/apache/arrow/pull/7707


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on pull request #7693: ARROW-9391: [Rust] Padding added to arrays causes float32's to be incorrectly cast to float64 float64s in the case where a record batch only c

2020-07-10 Thread GitBox


nevi-me commented on pull request #7693:
URL: https://github.com/apache/arrow/pull/7693#issuecomment-656917911


   > @nevi-me When I was testing it we weren't seeing it while reading the 
record batch back directly; we had to complete the serialization to a byte 
stream and then read it back.
   
   Yes, writing to a file stream, and reading that back, works. Try the below:
   
   ```rust
   #[test]
   fn test_arrow_single_float_row() {
   let schema = Schema::new(vec![
   Field::new("a", DataType::Float32, false),
   Field::new("b", DataType::Float32, false),
   Field::new("c", DataType::Int32, false),
   Field::new("d", DataType::Int32, false),
   ]);
   let arrays = vec![
   Arc::new(Float32Array::from(vec![1.23])) as ArrayRef,
   Arc::new(Float32Array::from(vec![-6.50])) as ArrayRef,
   Arc::new(Int32Array::from(vec![2])) as ArrayRef,
   Arc::new(Int32Array::from(vec![1])) as ArrayRef,
   ];
   let batch = RecordBatch::try_new(Arc::new(schema.clone()), 
arrays).unwrap();
   // create stream writer
   let file = 
File::create("target/debug/testdata/float.stream").unwrap();
   let mut stream_writer = StreamWriter::try_new(file, 
).unwrap();
   stream_writer.write().unwrap();
   stream_writer.finish().unwrap();
   
   // read stream back
   let file = File::open("target/debug/testdata/float.stream").unwrap();
   let mut reader = StreamReader::try_new(file).unwrap();
   
   let read_batch = reader.next_batch().unwrap().unwrap();
   // TODO: test the output 
   }
   ```
   
   I was able to reproduce the 0.0 values without your fix, and the correct 
numbers with the fix.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat

2020-07-10 Thread GitBox


wesm commented on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656914568


   Thanks. I just opened https://issues.apache.org/jira/browse/ARROW-9412



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tobim commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that sta

2020-07-10 Thread GitBox


tobim commented on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656912930


   I'm fine with that. I'll try to find some time for it tomorrow.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that static lin

2020-07-10 Thread GitBox


wesm closed pull request #7696:
URL: https://github.com/apache/arrow/pull/7696


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7694:
URL: https://github.com/apache/arrow/pull/7694#issuecomment-656910317


   Revision: a28dcda15e57f95cbad1007adce6b70e5af03eb9
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-406](https://github.com/ursa-labs/crossbow/branches/all?query=actions-406)
   
   |Task|Status|
   ||--|
   |test-r-linux-as-cran|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-406-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-406-github-test-r-linux-as-cran)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build

2020-07-10 Thread GitBox


nealrichardson commented on pull request #7694:
URL: https://github.com/apache/arrow/pull/7694#issuecomment-656909712







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat

2020-07-10 Thread GitBox


wesm commented on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656909818


   +1. Let us address the `ARROW_STATIC_INSTALL_INTERFACE_LIBS` issue and 
associated testing as a follow up PR. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] maxburke commented on pull request #7693: ARROW-9391: [Rust] Padding added to arrays causes float32's to be incorrectly cast to float64 float64s in the case where a record batch only

2020-07-10 Thread GitBox


maxburke commented on pull request #7693:
URL: https://github.com/apache/arrow/pull/7693#issuecomment-656908743


   @nevi-me When I was testing it we weren't seeing it while reading the record 
batch back directly; we had to complete the serialization to a byte stream and 
then read it back.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat

2020-07-10 Thread GitBox


wesm commented on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656907552


   > This is not true.
   
   Sorry, my presumption was that when a system library is installed there is a 
shared library available. If there is both a shared and static library 
available for a dependency, the build system is _supposed_ to select the shared 
one. But you are right if there is only a static library (e.g. if someone 
created an all-static "system" toolchain), then indeed static libraries would 
be selected. But they would not be added to `libarrow_bundled_dependencies.a`, 
and they aren't. 
   
   > That's how it should be then. But by removing 
ARROW_STATIC_INSTALL_INTERFACE_LIBS you are breaking the case of external 
static dependencies. If you would only add those libs that are not BUNDLED to 
the list I'd be happy.
   
   Well, static linking was not possible at all prior to this PR when using the 
recommended memory allocator (jemalloc). So nothing has been "broken" because 
it never worked before. The case of external static dependencies isn't tested 
at all -- in no CI or integration task. If this use case is important to you, 
we could use your help writing Dockerized integration tests that exhibit this 
scenario so that we aren't flying blind like we are now. 
   
   I agree with adding non-bundled static libraries to 
`ARROW_STATIC_INSTALL_INTERFACE_LIBS` but we have to write tests to demonstrate 
that it works and that it does not break the use cases represented in this PR. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tobim commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that sta

2020-07-10 Thread GitBox


tobim commented on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656902745


   > our build system does not select static libraries when using 
-DARROW_DEPENDENCY_SOURCE=SYSTEM
   
   This is not true.
   
   > only add static libraries to `libarrow_bundled_dependencies.a` that the 
Arrow build system built itself.
   
   That's how it should be then. But by removing 
`ARROW_STATIC_INSTALL_INTERFACE_LIBS` you are breaking the case of external 
static dependencies. If you would only add those libs that are not `BUNDLED` to 
the list I'd be happy.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me closed pull request #7193: ARROW-7924: [Rust] Add sort for float types

2020-07-10 Thread GitBox


nevi-me closed pull request #7193:
URL: https://github.com/apache/arrow/pull/7193


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7668: ARROW-6982: [R] Add bindings for compare and boolean kernels

2020-07-10 Thread GitBox


wesm commented on a change in pull request #7668:
URL: https://github.com/apache/arrow/pull/7668#discussion_r453084856



##
File path: r/R/array.R
##
@@ -269,6 +263,8 @@ filter_rows <- function(x, i, keep_na = TRUE, ...) {
   nrows <- x$num_rows %||% x$length() # Depends on whether Array or Table-like
   if (inherits(i, "array_expression")) {
 # Evaluate it
+# Switch this when https://issues.apache.org/jira/browse/ARROW-9380 is 
resolved
+# i <- eval_array_expression(i)

Review comment:
   Oh right. 
   
   I'll add metafunctions for calling "match" and "isin"? That's pretty easy, 
I'm not sure how else to resolve it





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] sunchao commented on pull request #7176: ARROW-8796: [Rust] feat: Allow writers to use Vec

2020-07-10 Thread GitBox


sunchao commented on pull request #7176:
URL: https://github.com/apache/arrow/pull/7176#issuecomment-656898683


   Just took another look. If we are going with this, we can potentially make 
the API simpler. For instance, current it is:
   ```rust
   fn next_row_group( self) -> Result>;
   ```
   As the returned box already is tracked by non-static lifetime, we can 
perhaps change the API to:
   ```rust
  fn next_row_group( self) -> Result< RowGroupWriter>;
   ```
   and track the current group writer within the file writer. With this we can 
also potentially remove `close_row_group`. Similar things can be done for 
column writer. We can do this in a separate PR though. 
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7694:
URL: https://github.com/apache/arrow/pull/7694#issuecomment-656898878


   Revision: b455526af7db3d2849238dbf20fb2643396b6224
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-405](https://github.com/ursa-labs/crossbow/branches/all?query=actions-405)
   
   |Task|Status|
   ||--|
   |test-r-linux-as-cran|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-405-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-405-github-test-r-linux-as-cran)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build

2020-07-10 Thread GitBox


nealrichardson commented on pull request #7694:
URL: https://github.com/apache/arrow/pull/7694#issuecomment-656898194


   @github-actions crossbow submit *as-cran*



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails

2020-07-10 Thread GitBox


nealrichardson commented on pull request #7706:
URL: https://github.com/apache/arrow/pull/7706#issuecomment-656897869


   Ok I've worked around the decor issue but now there's what appears to be a 
legit failure, a segfault on the new dictionary integration test: 
https://github.com/ursa-labs/crossbow/runs/859461851?check_suite_focus=true#step:6:4357
 cc @romainfrancois 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on a change in pull request #7668: ARROW-6982: [R] Add bindings for compare and boolean kernels

2020-07-10 Thread GitBox


nealrichardson commented on a change in pull request #7668:
URL: https://github.com/apache/arrow/pull/7668#discussion_r453082760



##
File path: r/R/array.R
##
@@ -269,6 +263,8 @@ filter_rows <- function(x, i, keep_na = TRUE, ...) {
   nrows <- x$num_rows %||% x$length() # Depends on whether Array or Table-like
   if (inherits(i, "array_expression")) {
 # Evaluate it
+# Switch this when https://issues.apache.org/jira/browse/ARROW-9380 is 
resolved
+# i <- eval_array_expression(i)

Review comment:
   Yes, I know, I started working on this and ran into ARROW-9389. I can 
hack around this here for now, but I was leaning towards holding this until 
after 1.0 anyway. What do you think?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7668: ARROW-6982: [R] Add bindings for compare and boolean kernels

2020-07-10 Thread GitBox


wesm commented on a change in pull request #7668:
URL: https://github.com/apache/arrow/pull/7668#discussion_r453081425



##
File path: r/R/array.R
##
@@ -269,6 +263,8 @@ filter_rows <- function(x, i, keep_na = TRUE, ...) {
   nrows <- x$num_rows %||% x$length() # Depends on whether Array or Table-like
   if (inherits(i, "array_expression")) {
 # Evaluate it
+# Switch this when https://issues.apache.org/jira/browse/ARROW-9380 is 
resolved
+# i <- eval_array_expression(i)

Review comment:
   When uncommenting this I get this failure
   
   ```
   ══ Failed 
══
   ── 1. Error: filter() with %in% (@test-dplyr.R#149)  
───
   'match' requires vector arguments
   Backtrace:
 1. arrow:::expect_dplyr_equal(...) testthat/test-dplyr.R:149:2
 2. dplyr::filter(., dbl > 2, chr %in% c("d", "f"))
10. dplyr::collect(.)
14. arrow:::`[.RecordBatch`(...)
15. arrow:::filter_rows(x, i, ...)
16. arrow:::eval_array_expression(i)
17. base::lapply(...)
18. arrow:::FUN(X[[i]], ...)
19. arrow:::eval_array_expression(a)
21.  %in% c("d", "f")
   
   ══ DONE 

   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7709: ARROW-9411: [Rust] Update dependencies

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7709:
URL: https://github.com/apache/arrow/pull/7709#issuecomment-656894889


   https://issues.apache.org/jira/browse/ARROW-9411



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7668: ARROW-6982: [R] Add bindings for compare and boolean kernels

2020-07-10 Thread GitBox


wesm commented on a change in pull request #7668:
URL: https://github.com/apache/arrow/pull/7668#discussion_r453079515



##
File path: r/R/array.R
##
@@ -269,6 +263,8 @@ filter_rows <- function(x, i, keep_na = TRUE, ...) {
   nrows <- x$num_rows %||% x$length() # Depends on whether Array or Table-like
   if (inherits(i, "array_expression")) {
 # Evaluate it
+# Switch this when https://issues.apache.org/jira/browse/ARROW-9380 is 
resolved
+# i <- eval_array_expression(i)

Review comment:
   I'm doing this now and checking the tests





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static libra

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656892388


   Revision: f43ce4606996ed44980dba514c824786e70729d3
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-404](https://github.com/ursa-labs/crossbow/branches/all?query=actions-404)
   
   |Task|Status|
   ||--|
   |centos-6-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-centos-6-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-centos-6-amd64)|
   
|centos-7-aarch64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-centos-7-aarch64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |centos-7-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-centos-7-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-centos-7-amd64)|
   
|centos-8-aarch64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-centos-8-aarch64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |centos-8-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-centos-8-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-centos-8-amd64)|
   |debian-buster-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-debian-buster-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-debian-buster-amd64)|
   
|debian-buster-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-debian-buster-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |debian-stretch-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-debian-stretch-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-debian-stretch-amd64)|
   
|debian-stretch-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-debian-stretch-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |ubuntu-bionic-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-ubuntu-bionic-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-ubuntu-bionic-amd64)|
   
|ubuntu-bionic-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-ubuntu-bionic-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |ubuntu-eoan-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-ubuntu-eoan-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-ubuntu-eoan-amd64)|
   
|ubuntu-eoan-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-ubuntu-eoan-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |ubuntu-focal-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-ubuntu-focal-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-ubuntu-focal-amd64)|
   
|ubuntu-focal-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-ubuntu-focal-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |ubuntu-xenial-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-ubuntu-xenial-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-ubuntu-xenial-amd64)|
   
|ubuntu-xenial-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-ubuntu-xenial-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me opened a new pull request #7709: ARROW-9411: [Rust] Update dependencies

2020-07-10 Thread GitBox


nevi-me opened a new pull request #7709:
URL: https://github.com/apache/arrow/pull/7709


   This updates some dependencies like prost and rand.
   Helps reduce number of dependencies



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat

2020-07-10 Thread GitBox


wesm commented on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656891774


   Fingers crossed that I've fixed the gRPC bundling problem and the Linux 
packages work this time



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat

2020-07-10 Thread GitBox


wesm commented on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656891639


   @github-actions crossbow submit -g linux



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm edited a comment on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so th

2020-07-10 Thread GitBox


wesm edited a comment on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656890077


   > I still think that "-DARROW_DEPENDENCY_SOURCE=SYSTEM" should work with 
arrow-static.
   > For this case my opinion is that non-vendored dependencies should not be 
bundled. Normally I want use the SYSTEM libraries for dependencies. As a 
developer of a third-party I do not want arrow to add symbols it does not own 
to my static linking process.
   
   @tobim our build system does not select static libraries when using 
`-DARROW_DEPENDENCY_SOURCE=SYSTEM` so they would not be bundled in 
`libarrow_bundled_dependencies.a`. That's kind of the whole point of this PR -- 
only add static libraries to `libarrow_bundled_dependencies.a` that the Arrow 
build system built itself. 
   
   If `arrow_static` has shared library dependencies on system libraries, then 
some other approach will be required to inform CMake automatically when doing 
`find_package(Arrow)`. It's definitely out of the scope for this PR. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat

2020-07-10 Thread GitBox


wesm commented on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656890077


   > I still think that "-DARROW_DEPENDENCY_SOURCE=SYSTEM" should work with 
arrow-static.
   > For this case my opinion is that non-vendored dependencies should not be 
bundled. Normally I want use the SYSTEM libraries for dependencies. As a 
developer of a third-party I do not want arrow to add symbols it does not own 
to my static linking process.
   
   @tobim our build system does not select static libraries when using 
`-DARROW_DEPENDENCY_SOURCE=SYSTEM` so they would not be bundled in 
`libarrow_bundled_dependencies.a`. That's kind of the whole point of this PR -- 
only add static libraries to `libarrow_bundled_dependencies.a` that the Arrow 
build system built itself. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] sunchao commented on pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-10 Thread GitBox


sunchao commented on pull request #7666:
URL: https://github.com/apache/arrow/pull/7666#issuecomment-656889046


   > The Rc version is used in a few places but again I don't think it's that 
important to use Rc over Arc.
   
   Agreed. I don't see any performance implication for this, as `schema()` 
shouldn't appear on hot paths.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jeroen commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build

2020-07-10 Thread GitBox


jeroen commented on pull request #7694:
URL: https://github.com/apache/arrow/pull/7694#issuecomment-656888136


   Not from the configure step I think.
   
   These are all `make` variables so the native way to build embedded libs is 
to pass these down when invoking a sub-make, like so: 
https://github.com/rstudio/httpuv/blob/master/src/Makevars#L65-L70
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tobim commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that sta

2020-07-10 Thread GitBox


tobim commented on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656887788


   > Re: system dependencies and static build, is 
https://issues.apache.org/jira/browse/ARROW-6312 related? I encountered this 
issue in https://issues.apache.org/jira/browse/ARROW-9303, and isn't this is a 
slightly different problem? 
   
   ARROW-6312 is the `pkg-config` variant of the approach I suggested for CMake.
   
   > Because your system dependencies could be shared libraries that you 
couldn't glue together anyway, right?
   
   Luckily I have control over linking flavor of the libraries on my system 
search path.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build

2020-07-10 Thread GitBox


nealrichardson commented on pull request #7694:
URL: https://github.com/apache/arrow/pull/7694#issuecomment-656884857


   Is there a way to get that config without shelling out at all?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jeroen commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build

2020-07-10 Thread GitBox


jeroen commented on pull request #7694:
URL: https://github.com/apache/arrow/pull/7694#issuecomment-656883822


   You have a bug here because your version of `R_CMD_config` takes R from the 
PATH but that may not be the correct R. Some machines have many versions of R.
   
   It may be safer to use e.g. `tools::Rcmd(c('config', 'CXX'), stdout=T)`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so

2020-07-10 Thread GitBox


nealrichardson commented on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656882522


   Re: system dependencies and static build, is 
https://issues.apache.org/jira/browse/ARROW-6312 related? I encountered this 
issue in https://issues.apache.org/jira/browse/ARROW-9303, and isn't this is a 
slightly different problem? Because your system dependencies could be shared 
libraries that you couldn't glue together anyway, right?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7708: ARROW-9292: [Doc] Remove Rust from feature matrix

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7708:
URL: https://github.com/apache/arrow/pull/7708#issuecomment-656882517


   https://issues.apache.org/jira/browse/ARROW-9292



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7707: ARROW-9410: [CI][Crossbow] Fix homebrew-cpp again

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7707:
URL: https://github.com/apache/arrow/pull/7707#issuecomment-656882518


   https://issues.apache.org/jira/browse/ARROW-9410



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me opened a new pull request #7708: ARROW-9292: [Doc] Remove Rust from feature matrix

2020-07-10 Thread GitBox


nevi-me opened a new pull request #7708:
URL: https://github.com/apache/arrow/pull/7708


   We currently have problems with Rust integration testing, so I'm temporarily 
removing them from the documentation



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7706:
URL: https://github.com/apache/arrow/pull/7706#issuecomment-656881900


   Revision: 002a43e4dd7504c9911d458d8f05f6aecf1a564f
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-403](https://github.com/ursa-labs/crossbow/branches/all?query=actions-403)
   
   |Task|Status|
   ||--|
   |test-conda-r-4.0|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-403-github-test-conda-r-4.0)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-403-github-test-conda-r-4.0)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tobim commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that sta

2020-07-10 Thread GitBox


tobim commented on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656880866


   > We can't put the Arrow toolchain libraries in INTERFACE_LINK_LIBRARIES 
because they will be unrecognized by the third party project.
   
   I still think that `"-DARROW_DEPENDENCY_SOURCE=SYSTEM"` should work with 
`arrow-static`.
   
   For this case my opinion is that non-vendored dependencies should not be 
bundled. Normally I want use the SYSTEM libraries for dependencies. As a 
developer of a third-party I do not want arrow to add symbols it does not own 
to my static linking process.
   
   May I suggest a different solution for that aspect of the problem: CMake 
provides a [module for declaring 
dependencies](https://cmake.org/cmake/help/latest/module/CMakeFindDependencyMacro.html)
 in the templated _Project_Config.cmake. That allows you to insert your own 
dependencies to the target graph of the downstream project.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails

2020-07-10 Thread GitBox


nealrichardson commented on pull request #7706:
URL: https://github.com/apache/arrow/pull/7706#issuecomment-656881146


   @github-actions crossbow submit *conda-r*



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7707: ARROW-9410: [CI][Crossbow] Fix homebrew-cpp again

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7707:
URL: https://github.com/apache/arrow/pull/7707#issuecomment-656880263


   Revision: 04f80607d70cd4a7c892512a14a76e145de50486
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-402](https://github.com/ursa-labs/crossbow/branches/all?query=actions-402)
   
   |Task|Status|
   ||--|
   
|homebrew-cpp|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-402-travis-homebrew-cpp.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7707: ARROW-9410: [CI][Crossbow] Fix homebrew-cpp again

2020-07-10 Thread GitBox


nealrichardson commented on pull request #7707:
URL: https://github.com/apache/arrow/pull/7707#issuecomment-656879577


   @github-actions crossbow submit homebrew-cpp



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #7707: ARROW-9410: [CI][Crossbow] Fix homebrew-cpp again

2020-07-10 Thread GitBox


nealrichardson opened a new pull request #7707:
URL: https://github.com/apache/arrow/pull/7707


   Pulls some changes to the formula from upstream



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7706:
URL: https://github.com/apache/arrow/pull/7706#issuecomment-656876659


   https://issues.apache.org/jira/browse/ARROW-9409



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7706:
URL: https://github.com/apache/arrow/pull/7706#issuecomment-656873001


   Revision: 1647387c4eaabe5ab13515be04d88abab7c9e137
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-401](https://github.com/ursa-labs/crossbow/branches/all?query=actions-401)
   
   |Task|Status|
   ||--|
   |test-conda-r-4.0|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-401-github-test-conda-r-4.0)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-401-github-test-conda-r-4.0)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails

2020-07-10 Thread GitBox


nealrichardson commented on pull request #7706:
URL: https://github.com/apache/arrow/pull/7706#issuecomment-656872355


   @github-actions crossbow submit *conda-r*



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails

2020-07-10 Thread GitBox


nealrichardson opened a new pull request #7706:
URL: https://github.com/apache/arrow/pull/7706


   It's annoying that we need to declare these separately



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7705: ARROW-9408: [Integration] Fix Windows numpy datagen issues

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7705:
URL: https://github.com/apache/arrow/pull/7705#issuecomment-656871192


   https://issues.apache.org/jira/browse/ARROW-9408



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] bkietz commented on a change in pull request #7695: ARROW-8989: [C++][Doc] Document available compute functions

2020-07-10 Thread GitBox


bkietz commented on a change in pull request #7695:
URL: https://github.com/apache/arrow/pull/7695#discussion_r453046448



##
File path: docs/source/cpp/compute.rst
##
@@ -0,0 +1,419 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. default-domain:: cpp
+.. highlight:: cpp
+.. cpp:namespace:: arrow::compute
+
+=
+Compute Functions
+=
+
+.. TODO: describe API and how to invoke compute functions
+
+Available functions
+===
+
+Aggregations
+
+
++--+++---++
+| Function name| Arity  | Input types| Output type 
  | Options class  |
++==+++===++
+| count| Unary  | Any| Scalar Int64
  | :struct:`CountOptions` |
++--+++---++
+| mean | Unary  | Numeric| Scalar Float64  
  ||
++--+++---++
+| minmax   | Unary  | Numeric| Scalar Struct  
(1)| :struct:`MinMaxOptions`|
++--+++---++
+| sum  | Unary  | Numeric| Scalar Numeric 
(2)||
++--+++---++
+
+Notes:
+
+* \(1) Output is a ``{"min": input type, "max": input type}`` Struct
+
+* \(2) Output is Int64, UInt64 or Float64, depending on the input type
+
+
+Element-wise ("scalar") functions
+-
+
+Arithmetic functions
+
+
+Those functions expect two inputs of the same type and apply a given binary
+operation to each pair of elements gathered from the inputs.  Each function
+is also available in an overflow-checking variant, suffixed ``_checked``.
+
+If any of the input elements in a pair is null, the corresponding output
+element is null.
+
++--+++-+
+| Function name| Arity  | Input types| Output type 
|
++==+++=+
+| add  | Binary | Numeric| Numeric 
|
++--+++-+
+| add_checked  | Binary | Numeric| Numeric 
|
++--+++-+
+| multiply | Binary | Numeric| Numeric 
|
++--+++-+
+| multiply_checked | Binary | Numeric| Numeric 
|
++--+++-+
+| subtract | Binary | Numeric| Numeric 
|
++--+++-+
+| subtract_checked | Binary | Numeric| Numeric 
|
++--+++-+
+
+Comparisons
+~~~
+
+Those functions expect two inputs of the same type and apply a given
+comparison operator.  If any of the input elements in a pair is null,
+the corresponding output element is null.
+
++--++-+-+
+| Function names   

[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat

2020-07-10 Thread GitBox


wesm commented on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656869168


   The one test failure is spurious
   
   ```
   pyarrow/tests/test_fs.py:1285: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ 
   pyarrow/_fs.pyx:439: in pyarrow._fs.FileSystem.get_file_info
   infos = GetResultValue(self.fs.GetFileInfo(selector))
   pyarrow/error.pxi:122: in pyarrow.lib.pyarrow_internal_check_status
   return check_status(status)
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ 
   >   raise IOError(message)
   E   OSError: When listing objects under key '' in bucket 
'ursa-labs-taxi-data': AWS Error [code 99]: Unable to connect to endpoint
   E   In ../src/arrow/filesystem/s3fs.cc, line 1098, code: 
ListObjectsV2(bucket, key, std::move(handle_results), std::move(handle_error))
   E   In ../src/arrow/filesystem/s3fs.cc, line 1358, code: impl_->Walk(select, 
base_path.bucket, base_path.key, )
   ```
   
   I verified that the minimal build still works on Linux (presumably it's 
therefore fine on macOS) and Windows/MSVC



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me opened a new pull request #7705: ARROW-9408: [Integration] Fix Windows numpy datagen issues

2020-07-10 Thread GitBox


nevi-me opened a new pull request #7705:
URL: https://github.com/apache/arrow/pull/7705


   We found that the integer range check when generating integration data 
doesn't work in Windows because the default C integers that numpy uses are 
32-bit by default in Windows.
   
   This fixes that issue by forcing 64-bit integers.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] eerhardt commented on a change in pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

2020-07-10 Thread GitBox


eerhardt commented on a change in pull request #7671:
URL: https://github.com/apache/arrow/pull/7671#discussion_r453045241



##
File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs
##
@@ -66,87 +66,158 @@ protected BuilderBase(IArrowType dataType)
 ValueOffsets = new ArrowBuffer.Builder();
 ValueBuffer = new ArrowBuffer.Builder();
 ValidityBuffer = new ArrowBuffer.BitmapBuilder();
+
+// From the docs:
+//
+// The offsets buffer contains length + 1 signed integers 
(either 32-bit or 64-bit, depending on the
+// logical type), which encode the start position of each slot 
in the data buffer. The length of the
+// value in each slot is computed using the difference between 
the offset at that slot’s index and the
+// subsequent offset.
+//
+// In this builder, we choose to append the first offset 
(zero) upon construction, and each trailing
+// offset is then added after each individual item has been 
appended.
+ValueOffsets.Append(this.Offset);
 }
 
 protected abstract TArray Build(ArrayData data);
 
-public int Length => ValueOffsets.Length;
+/// 
+/// Gets the length of the array built so far.
+/// 
+public int Length => ValueOffsets.Length - 1;
 
+/// 
+/// Build an Arrow array from the appended contents so far.
+/// 
+/// Optional memory allocator.
+/// Returns an array of type .
 public TArray Build(MemoryAllocator allocator = default)
 {
-ValueOffsets.Append(Offset);
-
-ArrowBuffer validityBuffer = NullCount > 0
-? ValidityBuffer.Build(allocator)
-: ArrowBuffer.Empty;
-
-var data = new ArrayData(DataType, ValueOffsets.Length - 1, 
NullCount, 0,
-new[] { validityBuffer, ValueOffsets.Build(allocator), 
ValueBuffer.Build(allocator) });
+var bufs = new[]
+{
+NullCount > 0 ? ValidityBuffer.Build(allocator) : 
ArrowBuffer.Empty,
+ValueOffsets.Build(allocator),
+ValueBuffer.Build(allocator),
+};
+var data = new ArrayData(
+DataType,
+length: ValueOffsets.Length - 1,
+NullCount,
+offset: 0,
+bufs);
 
 return Build(data);
 }
 
+/// 
+/// Append a single null value to the array.
+/// 
+/// Returns the builder (for fluent-style 
composition).
 public TBuilder AppendNull()
 {
-ValueOffsets.Append(Offset);
+// Do not add to the value buffer in the case of a null.
+// Note that we do not need to increment the offset as a 
result.
 ValidityBuffer.Append(false);
+ValueOffsets.Append(Offset);
 return Instance;
 }
 
+/// 
+/// Appends a value, consisting of a single byte, to the array.
+/// 
+/// Byte value to append.
+/// Returns the builder (for fluent-style 
composition).
 public TBuilder Append(byte value)
 {
-ValueOffsets.Append(Offset);
 ValueBuffer.Append(value);
-Offset++;
 ValidityBuffer.Append(true);
+Offset++;
+ValueOffsets.Append(Offset);
 return Instance;
 }
 
+/// 
+/// Append a value, consisting of a span of bytes, to the array.
+/// 
+/// 
+/// Note that a single value is added, which consists of 
arbitrarily many bytes.  If multiple values are
+/// to be added, use the  method.
+/// 
+/// Span of bytes to add.
+/// Returns the builder (for fluent-style 
composition).
 public TBuilder Append(ReadOnlySpan span)
 {
-ValueOffsets.Append(Offset);
 ValueBuffer.Append(span);
 ValidityBuffer.Append(true);
 Offset += span.Length;
+ValueOffsets.Append(Offset);
 return Instance;
 }
 
-public TBuilder AppendRange(IEnumerable values)
+/// 
+/// Append a value, consisting of an enumerable collection of 
bytes, to the array.
+/// 
+/// 
+/// Note that this method appends a single value, which may 
consist of arbitrarily many bytes.  If multiple
+  

[GitHub] [arrow] nevi-me commented on pull request #7693: ARROW-9391: [Rust] Padding added to arrays causes float32's to be incorrectly cast to float64 float64s in the case where a record batch only c

2020-07-10 Thread GitBox


nevi-me commented on pull request #7693:
URL: https://github.com/apache/arrow/pull/7693#issuecomment-656859641


   Hey @maxburke, I think I understand the gist of the problem, but do you need 
to create the `ParquetBufWriter` struct to test this? Wouldn't writing a single 
record batch with the arrays to a stream, then reading it back, work? Or am I 
oversimplifying the problem?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static libra

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656858894


   Revision: 1f24b9e60587f1b42ee0a8d903e7cc9fc051
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-400](https://github.com/ursa-labs/crossbow/branches/all?query=actions-400)
   
   |Task|Status|
   ||--|
   |centos-6-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-centos-6-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-centos-6-amd64)|
   
|centos-7-aarch64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-centos-7-aarch64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |centos-7-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-centos-7-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-centos-7-amd64)|
   
|centos-8-aarch64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-centos-8-aarch64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |centos-8-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-centos-8-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-centos-8-amd64)|
   |debian-buster-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-debian-buster-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-debian-buster-amd64)|
   
|debian-buster-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-debian-buster-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |debian-stretch-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-debian-stretch-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-debian-stretch-amd64)|
   
|debian-stretch-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-debian-stretch-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |ubuntu-bionic-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-ubuntu-bionic-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-ubuntu-bionic-amd64)|
   
|ubuntu-bionic-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-ubuntu-bionic-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |ubuntu-eoan-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-ubuntu-eoan-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-ubuntu-eoan-amd64)|
   
|ubuntu-eoan-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-ubuntu-eoan-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |ubuntu-focal-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-ubuntu-focal-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-ubuntu-focal-amd64)|
   
|ubuntu-focal-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-ubuntu-focal-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |ubuntu-xenial-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-ubuntu-xenial-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-ubuntu-xenial-amd64)|
   
|ubuntu-xenial-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-ubuntu-xenial-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat

2020-07-10 Thread GitBox


wesm commented on pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#issuecomment-656858004







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] paddyhoran commented on a change in pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-10 Thread GitBox


paddyhoran commented on a change in pull request #7666:
URL: https://github.com/apache/arrow/pull/7666#discussion_r453044078



##
File path: rust/datafusion/src/execution/physical_plan/common.rs
##
@@ -47,12 +47,12 @@ impl RecordBatchIterator {
 }
 }
 
-impl BatchIterator for RecordBatchIterator {
+impl SendableRecordBatchReader for RecordBatchIterator {
 fn schema() -> Arc {

Review comment:
   I'm +0 on this.  I personally like seeing the explicit type signature 
(once it's not very verbose) but I have no problem with this change.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library

2020-07-10 Thread GitBox


wesm commented on a change in pull request #7696:
URL: https://github.com/apache/arrow/pull/7696#discussion_r453043649



##
File path: cpp/examples/minimal_build/CMakeLists.txt
##
@@ -19,10 +19,31 @@ cmake_minimum_required(VERSION 3.0)
 
 project(ArrowMinimalExample)
 
+option(ARROW_LINK_SHARED "Link to the Arrow shared library" ON)
+
 find_package(Arrow REQUIRED)
 
+set(CMAKE_CXX_STANDARD 11)
+set(CMAKE_BUILD_TYPE Release)
+
 message(STATUS "Arrow version: ${ARROW_VERSION}")
 message(STATUS "Arrow SO version: ${ARROW_FULL_SO_VERSION}")
 
 add_executable(arrow_example example.cc)
-target_link_libraries(arrow_example PRIVATE arrow_shared)
+
+if (ARROW_LINK_SHARED)
+  target_link_libraries(arrow_example PRIVATE arrow_shared)
+else()
+  set(THREADS_PREFER_PTHREAD_FLAG ON)
+  find_package(Threads REQUIRED)
+
+  # We must also link to libarrow_bundled_dependencies.a
+  get_property(arrow_static_loc TARGET arrow_static PROPERTY LOCATION)
+  get_filename_component(arrow_lib_dir ${arrow_static_loc} DIRECTORY)
+
+  add_library(arrow_dependencies STATIC IMPORTED)
+  set_target_properties(arrow_dependencies
+PROPERTIES IMPORTED_LOCATION
+
"${arrow_lib_dir}/${CMAKE_STATIC_LIBRARY_PREFIX}arrow_bundled_dependencies${CMAKE_STATIC_LIBRARY_SUFFIX}")

Review comment:
   I got this working, awesome! I'll update the docs





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] paddyhoran commented on pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-10 Thread GitBox


paddyhoran commented on pull request #7666:
URL: https://github.com/apache/arrow/pull/7666#issuecomment-656856701


   > I think the `schema` method is already returning a `Arc`, where is 
the `Rc` being used? IMHO if it is not too difficult to do we should 
consolidate on a single trait now instead of having to change it later again.
   
   It did occur to me that we could just have a single trait that uses `Arc` 
and not bother with the `Rc` version.  I'm not sure what the performance of 
`Arc` is versus `Rc`.  Obviously, there has to be some difference but is this 
important in the context of Arrow?  I'm up for converting to one trait and 
using `Arc`.  If users request it then we can add the `Rc` version later.  What 
do others think?
   
   The `Rc` version is used in a few places but again I don't think it's that 
important to use `Rc` over `Arc`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] eerhardt commented on a change in pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

2020-07-10 Thread GitBox


eerhardt commented on a change in pull request #7671:
URL: https://github.com/apache/arrow/pull/7671#discussion_r453042194



##
File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs
##
@@ -237,7 +329,9 @@ public ReadOnlySpan GetBytes(int index)
 
 if (IsNull(index))
 {
-return null;
+// Note that `return null;` is valid syntax, but would be 
misleading as `null` in the context of a span
+// is actually returned as an empty span.
+return ReadOnlySpan.Empty;

Review comment:
   I've been confused and bit by this a few times in different projects. 
All the following ways produce an "equivalent" span:
   
   * `return null`  (this uses the implicit operator from an array)
   * `return default`
   * `return ReadOnlySpan.Empty`  (this just says `return default` 
underneath the covers)
   
   I agree that `ReadOnlySpan.Empty` is the most clear thing to return here. 
Next would be `return default`. `return null` is the worst of the three options 
IMO.

##
File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs
##
@@ -66,87 +66,158 @@ protected BuilderBase(IArrowType dataType)
 ValueOffsets = new ArrowBuffer.Builder();
 ValueBuffer = new ArrowBuffer.Builder();
 ValidityBuffer = new ArrowBuffer.BitmapBuilder();
+
+// From the docs:
+//
+// The offsets buffer contains length + 1 signed integers 
(either 32-bit or 64-bit, depending on the
+// logical type), which encode the start position of each slot 
in the data buffer. The length of the
+// value in each slot is computed using the difference between 
the offset at that slot’s index and the
+// subsequent offset.
+//
+// In this builder, we choose to append the first offset 
(zero) upon construction, and each trailing
+// offset is then added after each individual item has been 
appended.
+ValueOffsets.Append(this.Offset);
 }
 
 protected abstract TArray Build(ArrayData data);
 
-public int Length => ValueOffsets.Length;
+/// 
+/// Gets the length of the array built so far.
+/// 
+public int Length => ValueOffsets.Length - 1;
 
+/// 
+/// Build an Arrow array from the appended contents so far.
+/// 
+/// Optional memory allocator.
+/// Returns an array of type .
 public TArray Build(MemoryAllocator allocator = default)
 {
-ValueOffsets.Append(Offset);
-
-ArrowBuffer validityBuffer = NullCount > 0
-? ValidityBuffer.Build(allocator)
-: ArrowBuffer.Empty;
-
-var data = new ArrayData(DataType, ValueOffsets.Length - 1, 
NullCount, 0,
-new[] { validityBuffer, ValueOffsets.Build(allocator), 
ValueBuffer.Build(allocator) });
+var bufs = new[]
+{
+NullCount > 0 ? ValidityBuffer.Build(allocator) : 
ArrowBuffer.Empty,
+ValueOffsets.Build(allocator),
+ValueBuffer.Build(allocator),
+};
+var data = new ArrayData(
+DataType,
+length: ValueOffsets.Length - 1,

Review comment:
   Can this line just be `Length` ?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] eerhardt commented on a change in pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

2020-07-10 Thread GitBox


eerhardt commented on a change in pull request #7671:
URL: https://github.com/apache/arrow/pull/7671#discussion_r453038089



##
File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs
##
@@ -66,87 +66,158 @@ protected BuilderBase(IArrowType dataType)
 ValueOffsets = new ArrowBuffer.Builder();
 ValueBuffer = new ArrowBuffer.Builder();
 ValidityBuffer = new ArrowBuffer.BitmapBuilder();
+
+// From the docs:
+//
+// The offsets buffer contains length + 1 signed integers 
(either 32-bit or 64-bit, depending on the
+// logical type), which encode the start position of each slot 
in the data buffer. The length of the
+// value in each slot is computed using the difference between 
the offset at that slot’s index and the
+// subsequent offset.
+//
+// In this builder, we choose to append the first offset 
(zero) upon construction, and each trailing
+// offset is then added after each individual item has been 
appended.
+ValueOffsets.Append(this.Offset);

Review comment:
   Nice fix.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7703: ARROW-9355 [R]: Fix -Wimplicit-int-float-conversion

2020-07-10 Thread GitBox


nealrichardson closed pull request #7703:
URL: https://github.com/apache/arrow/pull/7703


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build

2020-07-10 Thread GitBox


github-actions[bot] commented on pull request #7694:
URL: https://github.com/apache/arrow/pull/7694#issuecomment-656837525


   https://issues.apache.org/jira/browse/ARROW-9397



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7703: ARROW-9355 [R]: Fix -Wimplicit-int-float-conversion

2020-07-10 Thread GitBox


nealrichardson commented on pull request #7703:
URL: https://github.com/apache/arrow/pull/7703#issuecomment-656832748


   @github-actions autotune everything



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] eerhardt commented on pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

2020-07-10 Thread GitBox


eerhardt commented on pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#issuecomment-656832270


   Can we just introduce new `Append(DateTime value)` and 
`GetDateAsDateTime(int index)` APIs instead of making a breaking change?
   
   In general, DateTimeOffset is the preferred type to use when talking about 
dates and times. The reasoning is because exactly what you point out in the 
JIRA issue - DateTime.Kind is very confusing and can cause a lot of problems. 
So we generally recommend users to use DateTimeOffset unless they have a strong 
reason to use DateTime.
   
   So I'd suggest to keep the existing code (the builder is based on 
DateTimeOffset), but to also add new `Append(DateTime value)` overloads to 
solve the issue you logged.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




  1   2   >