[GitHub] [arrow] emkornfield commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working
emkornfield commented on a change in pull request #7290: URL: https://github.com/apache/arrow/pull/7290#discussion_r453152953 ## File path: dev/archery/archery/integration/runner.py ## @@ -126,6 +126,8 @@ def _gold_tests(self, gold_dir): if f.name == name).skip except StopIteration: skip = set() +if name == 'union' and prefix == '0.17.1': Review comment: IMO I think this is OK for now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion
emkornfield commented on pull request #7604: URL: https://github.com/apache/arrow/pull/7604#issuecomment-656984423 > There is no way to deselect specific tests, afaik. You would have select all the other tests individually from that module, for example My preference, if it is ok with everyone reviewing this would be to avoid this and try to get something merged into Spark master quickly (I can own looking at nightly spark jobs until that is done to verify no more failures). @BryanCutler if you don't have the bandwidth to make the PR to spark, I can devote a little bit of effort to submitting one. Thoughts? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jglaser commented on pull request #7711: ARROW-9415: [C++] Arrow does not compile on Power9
jglaser commented on pull request #7711: URL: https://github.com/apache/arrow/pull/7711#issuecomment-656983089 should we leave this open until the upstream PR is merged, and make this a PR for the updated version? https://github.com/Cyan4973/xxHash/pull/426 @emkornfield Please advise. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453150926 ## File path: cpp/src/parquet/arrow/schema.h ## @@ -163,24 +165,28 @@ struct PARQUET_EXPORT SchemaManifest { return it->second; } - bool GetFieldIndices(const std::vector& column_indices, std::vector* out) { + ::arrow::Result> GetFieldIndices( + const std::vector& column_indices) { // Coalesce a list of schema field indices which are the roots of the // columns referred to by a list of column indices const schema::GroupNode* group = descr->group_node(); std::unordered_set already_added; -out->clear(); -for (auto& column_idx : column_indices) { + +std::vector out; +for (int column_idx : column_indices) { + if (column_idx < 0 || column_idx >= descr->num_columns()) { +return ::arrow::Status::IndexError("Column index ", column_idx, " is not valid"); + } auto field_node = descr->GetColumnRoot(column_idx); auto field_idx = group->FieldIndex(*field_node); if (field_idx < 0) { Review comment: not your code, but is this really checking for equality to -1? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453150835 ## File path: cpp/src/parquet/arrow/schema.h ## @@ -163,24 +165,28 @@ struct PARQUET_EXPORT SchemaManifest { return it->second; } - bool GetFieldIndices(const std::vector& column_indices, std::vector* out) { + ::arrow::Result> GetFieldIndices( + const std::vector& column_indices) { // Coalesce a list of schema field indices which are the roots of the Review comment: could we move the comment to a docstring and give an example? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jglaser commented on pull request #7711: ARROW-9415: [C++] Arrow does not compile on Power9
jglaser commented on pull request #7711: URL: https://github.com/apache/arrow/pull/7711#issuecomment-656981267 oh, understood. I will submit to xxhash, then. Will take me some time to sort this out, though. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on pull request #7711: ARROW-9415: [C++] Arrow does not compile on Power9
emkornfield commented on pull request #7711: URL: https://github.com/apache/arrow/pull/7711#issuecomment-656980608 @jglaser thank you for the PR. Since this is a vendored library it would likely be better to make the PR upstream and then upgrade here once that is merged. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453150104 ## File path: cpp/src/parquet/arrow/reader.cc ## @@ -819,10 +812,7 @@ Status FileReaderImpl::ReadRowGroups(const std::vector& row_groups, // We only need to read schema fields which have columns indicated // in the indices vector - std::vector field_indices; - if (!manifest_.GetFieldIndices(indices, _indices)) { -return Status::Invalid("Invalid column index"); - } + ARROW_ASSIGN_OR_RAISE(auto field_indices, manifest_.GetFieldIndices(indices)); Review comment: please spell out type. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453150052 ## File path: cpp/src/parquet/arrow/reader.cc ## @@ -781,6 +770,9 @@ Status FileReaderImpl::GetRecordBatchReader(const std::vector& row_group_in const std::vector& column_indices, std::unique_ptr* out) { // column indices check + ARROW_ASSIGN_OR_RAISE(std::ignore, manifest_.GetFieldIndices(column_indices)); Review comment: it would be nice if RETURN_NOT_OK could be used here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453149894 ## File path: cpp/src/parquet/arrow/reader.cc ## @@ -301,62 +302,50 @@ class FileReaderImpl : public FileReader { class RowGroupRecordBatchReader : public ::arrow::RecordBatchReader { public: - RowGroupRecordBatchReader(std::vector> field_readers, -std::shared_ptr<::arrow::Schema> schema, int64_t batch_size) - : field_readers_(std::move(field_readers)), -schema_(std::move(schema)), -batch_size_(batch_size) {} - ~RowGroupRecordBatchReader() override {} std::shared_ptr<::arrow::Schema> schema() const override { return schema_; } - static Status Make(const std::vector& row_groups, - const std::vector& column_indices, FileReaderImpl* reader, - int64_t batch_size, - std::unique_ptr<::arrow::RecordBatchReader>* out) { -std::vector field_indices; -if (!reader->manifest_.GetFieldIndices(column_indices, _indices)) { - return Status::Invalid("Invalid column index"); -} + static ::arrow::Result> Make( + const std::vector& row_group_indices, const std::vector& column_indices, + int64_t batch_size, FileReaderImpl* reader) { +std::unique_ptr out(new RowGroupRecordBatchReader); -std::vector> field_readers(field_indices.size()); -std::vector> fields; +RETURN_NOT_OK(FromParquetSchema( +reader->parquet_reader()->metadata()->schema(), default_arrow_reader_properties(), +/*key_value_metadata=*/nullptr, column_indices, >schema_)); -auto included_leaves = VectorToSharedSet(column_indices); -for (size_t i = 0; i < field_indices.size(); ++i) { - RETURN_NOT_OK(reader->GetFieldReader(field_indices[i], included_leaves, row_groups, - _readers[i])); - fields.push_back(field_readers[i]->field()); -} -out->reset(new RowGroupRecordBatchReader(std::move(field_readers), - ::arrow::schema(fields), batch_size)); -return Status::OK(); +using ::arrow::RecordBatchIterator; + +auto row_group_index_to_batch_iterator = +[=](const int* i) -> ::arrow::Result { Review comment: If it is intentional (it looks like this escapes the scope) I would prefer to be explicit about the values being captured. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453149789 ## File path: cpp/src/parquet/arrow/reader.cc ## @@ -301,62 +302,50 @@ class FileReaderImpl : public FileReader { class RowGroupRecordBatchReader : public ::arrow::RecordBatchReader { public: - RowGroupRecordBatchReader(std::vector> field_readers, -std::shared_ptr<::arrow::Schema> schema, int64_t batch_size) - : field_readers_(std::move(field_readers)), -schema_(std::move(schema)), -batch_size_(batch_size) {} - ~RowGroupRecordBatchReader() override {} std::shared_ptr<::arrow::Schema> schema() const override { return schema_; } - static Status Make(const std::vector& row_groups, - const std::vector& column_indices, FileReaderImpl* reader, - int64_t batch_size, - std::unique_ptr<::arrow::RecordBatchReader>* out) { -std::vector field_indices; -if (!reader->manifest_.GetFieldIndices(column_indices, _indices)) { - return Status::Invalid("Invalid column index"); -} + static ::arrow::Result> Make( + const std::vector& row_group_indices, const std::vector& column_indices, + int64_t batch_size, FileReaderImpl* reader) { +std::unique_ptr out(new RowGroupRecordBatchReader); -std::vector> field_readers(field_indices.size()); -std::vector> fields; +RETURN_NOT_OK(FromParquetSchema( +reader->parquet_reader()->metadata()->schema(), default_arrow_reader_properties(), +/*key_value_metadata=*/nullptr, column_indices, >schema_)); -auto included_leaves = VectorToSharedSet(column_indices); -for (size_t i = 0; i < field_indices.size(); ++i) { - RETURN_NOT_OK(reader->GetFieldReader(field_indices[i], included_leaves, row_groups, - _readers[i])); - fields.push_back(field_readers[i]->field()); -} -out->reset(new RowGroupRecordBatchReader(std::move(field_readers), - ::arrow::schema(fields), batch_size)); -return Status::OK(); +using ::arrow::RecordBatchIterator; + +auto row_group_index_to_batch_iterator = +[=](const int* i) -> ::arrow::Result { Review comment: this binding everything by value here intentional? If so could you add a comment why? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453149711 ## File path: cpp/src/parquet/arrow/reader.cc ## @@ -301,62 +302,50 @@ class FileReaderImpl : public FileReader { class RowGroupRecordBatchReader : public ::arrow::RecordBatchReader { public: - RowGroupRecordBatchReader(std::vector> field_readers, -std::shared_ptr<::arrow::Schema> schema, int64_t batch_size) - : field_readers_(std::move(field_readers)), -schema_(std::move(schema)), -batch_size_(batch_size) {} - ~RowGroupRecordBatchReader() override {} std::shared_ptr<::arrow::Schema> schema() const override { return schema_; } - static Status Make(const std::vector& row_groups, - const std::vector& column_indices, FileReaderImpl* reader, - int64_t batch_size, - std::unique_ptr<::arrow::RecordBatchReader>* out) { -std::vector field_indices; -if (!reader->manifest_.GetFieldIndices(column_indices, _indices)) { - return Status::Invalid("Invalid column index"); -} + static ::arrow::Result> Make( + const std::vector& row_group_indices, const std::vector& column_indices, + int64_t batch_size, FileReaderImpl* reader) { +std::unique_ptr out(new RowGroupRecordBatchReader); -std::vector> field_readers(field_indices.size()); -std::vector> fields; +RETURN_NOT_OK(FromParquetSchema( +reader->parquet_reader()->metadata()->schema(), default_arrow_reader_properties(), Review comment: its probably not great that we don't propagate reader properties here? I'm not sure I see if this was a problem in the old code as well? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453149597 ## File path: cpp/src/parquet/arrow/reader.cc ## @@ -301,62 +302,50 @@ class FileReaderImpl : public FileReader { class RowGroupRecordBatchReader : public ::arrow::RecordBatchReader { public: - RowGroupRecordBatchReader(std::vector> field_readers, -std::shared_ptr<::arrow::Schema> schema, int64_t batch_size) - : field_readers_(std::move(field_readers)), -schema_(std::move(schema)), -batch_size_(batch_size) {} - ~RowGroupRecordBatchReader() override {} std::shared_ptr<::arrow::Schema> schema() const override { return schema_; } - static Status Make(const std::vector& row_groups, - const std::vector& column_indices, FileReaderImpl* reader, - int64_t batch_size, - std::unique_ptr<::arrow::RecordBatchReader>* out) { -std::vector field_indices; -if (!reader->manifest_.GetFieldIndices(column_indices, _indices)) { - return Status::Invalid("Invalid column index"); -} + static ::arrow::Result> Make( + const std::vector& row_group_indices, const std::vector& column_indices, + int64_t batch_size, FileReaderImpl* reader) { +std::unique_ptr out(new RowGroupRecordBatchReader); Review comment: *rant* it annoys me that that make_unique can't access private constructors. nit: please construct using empty parens at the end. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453149446 ## File path: cpp/src/parquet/arrow/reader.cc ## @@ -301,62 +302,50 @@ class FileReaderImpl : public FileReader { class RowGroupRecordBatchReader : public ::arrow::RecordBatchReader { public: - RowGroupRecordBatchReader(std::vector> field_readers, -std::shared_ptr<::arrow::Schema> schema, int64_t batch_size) - : field_readers_(std::move(field_readers)), -schema_(std::move(schema)), -batch_size_(batch_size) {} - ~RowGroupRecordBatchReader() override {} std::shared_ptr<::arrow::Schema> schema() const override { return schema_; } - static Status Make(const std::vector& row_groups, - const std::vector& column_indices, FileReaderImpl* reader, - int64_t batch_size, - std::unique_ptr<::arrow::RecordBatchReader>* out) { -std::vector field_indices; -if (!reader->manifest_.GetFieldIndices(column_indices, _indices)) { - return Status::Invalid("Invalid column index"); -} + static ::arrow::Result> Make( + const std::vector& row_group_indices, const std::vector& column_indices, Review comment: not present before but adding some docs here might be useful. especially around lifecyle of reader. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453149320 ## File path: cpp/src/parquet/arrow/arrow_schema_test.cc ## @@ -644,6 +649,78 @@ TEST_F(TestConvertParquetSchema, ParquetRepeatedNestedSchema) { ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema)); } +TEST_F(TestConvertParquetSchema, ColumnSubselection) { + std::vector parquet_fields; + std::vector> arrow_fields; + { +// optional int32 leaf1; +// repeated group outerGroup { +// optional int32 leaf2; +// repeated group innerGroup { +// optional int32 leaf3; +// } +// } +parquet_fields.push_back( +PrimitiveNode::Make("leaf1", Repetition::OPTIONAL, ParquetType::INT32)); +parquet_fields.push_back(GroupNode::Make( +"outerGroup", Repetition::REPEATED, +{PrimitiveNode::Make("leaf2", Repetition::OPTIONAL, ParquetType::INT32), + GroupNode::Make( + "innerGroup", Repetition::REPEATED, + {PrimitiveNode::Make("leaf3", Repetition::OPTIONAL, ParquetType::INT32)})})); + +auto inner_group_fields = {::arrow::field("leaf3", INT32, true)}; +auto inner_group_type = ::arrow::struct_(inner_group_fields); +auto outer_group_fields = { +::arrow::field("leaf2", INT32, true), +::arrow::field( +"innerGroup", +::arrow::list(::arrow::field("innerGroup", inner_group_type, false)), false)}; +auto outer_group_type = ::arrow::struct_(outer_group_fields); + +arrow_fields.push_back(::arrow::field("leaf1", INT32, true)); +arrow_fields.push_back(::arrow::field( +"outerGroup", +::arrow::list(::arrow::field("outerGroup", outer_group_type, false)), false)); + } + std::shared_ptr<::arrow::Schema> arrow_schema; + std::vector column_indices; + + column_indices = {}; + ASSERT_OK( + ConvertSchema(parquet_fields, /*key_value_metadata=*/nullptr, _indices)); + arrow_schema = ::arrow::schema({}); + ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema)); + + column_indices = {0, 1, 2}; + ASSERT_OK( + ConvertSchema(parquet_fields, /*key_value_metadata=*/nullptr, _indices)); + arrow_schema = ::arrow::schema(arrow_fields); + ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema)); + + column_indices = {0}; + ASSERT_OK( + ConvertSchema(parquet_fields, /*key_value_metadata=*/nullptr, _indices)); + arrow_schema = ::arrow::schema({arrow_fields[0]}); + ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema)); + + column_indices = {1}; + ASSERT_OK( + ConvertSchema(parquet_fields, /*key_value_metadata=*/nullptr, _indices)); + arrow_schema = ::arrow::schema({arrow_fields[1]}); + ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema)); + + column_indices = {2}; Review comment: does it pay to test at least one pair? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7711: ARROW-9415: [C++] Arrow does not compile on Power9
github-actions[bot] commented on pull request #7711: URL: https://github.com/apache/arrow/pull/7711#issuecomment-656978794 https://issues.apache.org/jira/browse/ARROW-9415 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453149152 ## File path: cpp/src/parquet/arrow/arrow_schema_test.cc ## @@ -644,6 +649,78 @@ TEST_F(TestConvertParquetSchema, ParquetRepeatedNestedSchema) { ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema)); } +TEST_F(TestConvertParquetSchema, ColumnSubselection) { + std::vector parquet_fields; + std::vector> arrow_fields; + { +// optional int32 leaf1; +// repeated group outerGroup { +// optional int32 leaf2; +// repeated group innerGroup { +// optional int32 leaf3; +// } +// } +parquet_fields.push_back( +PrimitiveNode::Make("leaf1", Repetition::OPTIONAL, ParquetType::INT32)); +parquet_fields.push_back(GroupNode::Make( +"outerGroup", Repetition::REPEATED, +{PrimitiveNode::Make("leaf2", Repetition::OPTIONAL, ParquetType::INT32), + GroupNode::Make( + "innerGroup", Repetition::REPEATED, + {PrimitiveNode::Make("leaf3", Repetition::OPTIONAL, ParquetType::INT32)})})); + +auto inner_group_fields = {::arrow::field("leaf3", INT32, true)}; Review comment: P.S. we really need a JSON utility to construct a schema from json., it would make this much easier to read. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453149053 ## File path: cpp/src/parquet/arrow/arrow_schema_test.cc ## @@ -644,6 +649,78 @@ TEST_F(TestConvertParquetSchema, ParquetRepeatedNestedSchema) { ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema)); } +TEST_F(TestConvertParquetSchema, ColumnSubselection) { + std::vector parquet_fields; + std::vector> arrow_fields; + { +// optional int32 leaf1; +// repeated group outerGroup { +// optional int32 leaf2; +// repeated group innerGroup { +// optional int32 leaf3; +// } +// } +parquet_fields.push_back( +PrimitiveNode::Make("leaf1", Repetition::OPTIONAL, ParquetType::INT32)); +parquet_fields.push_back(GroupNode::Make( +"outerGroup", Repetition::REPEATED, +{PrimitiveNode::Make("leaf2", Repetition::OPTIONAL, ParquetType::INT32), + GroupNode::Make( + "innerGroup", Repetition::REPEATED, + {PrimitiveNode::Make("leaf3", Repetition::OPTIONAL, ParquetType::INT32)})})); + +auto inner_group_fields = {::arrow::field("leaf3", INT32, true)}; Review comment: please spell out types in some form where you are using bracket initializations. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453148949 ## File path: cpp/src/parquet/arrow/arrow_schema_test.cc ## @@ -644,6 +649,78 @@ TEST_F(TestConvertParquetSchema, ParquetRepeatedNestedSchema) { ASSERT_NO_FATAL_FAILURE(CheckFlatSchema(arrow_schema)); } +TEST_F(TestConvertParquetSchema, ColumnSubselection) { + std::vector parquet_fields; + std::vector> arrow_fields; + { +// optional int32 leaf1; +// repeated group outerGroup { +// optional int32 leaf2; +// repeated group innerGroup { +// optional int32 leaf3; +// } +// } +parquet_fields.push_back( Review comment: are these intentionally not using logical annotations (i.e. LIST?). Maybe ad a comment if so. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453148787 ## File path: cpp/src/parquet/arrow/arrow_reader_writer_test.cc ## @@ -2877,11 +2877,22 @@ TEST(TestArrowReaderAdHoc, LARGE_MEMORY_TEST(LargeStringColumn)) { array.reset(); auto reader = ParquetFileReader::Open(std::make_shared(tables_buffer)); - std::unique_ptr arrow_reader; ASSERT_OK(FileReader::Make(default_memory_pool(), std::move(reader), _reader)); ASSERT_OK_NO_THROW(arrow_reader->ReadTable()); ASSERT_OK(table->ValidateFull()); + + // ARROW-9297: ensure RecordBatchReader also works + reader = ParquetFileReader::Open(std::make_shared(tables_buffer)); + ASSERT_OK(FileReader::Make(default_memory_pool(), std::move(reader), _reader)); + std::shared_ptr<::arrow::RecordBatchReader> batch_reader; + auto all_row_groups = ::arrow::internal::Iota(reader->metadata()->num_row_groups()); + ASSERT_OK_NO_THROW(arrow_reader->GetRecordBatchReader(all_row_groups, _reader)); + ASSERT_OK_AND_ASSIGN(auto batched_table, Review comment: nm. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453148755 ## File path: cpp/src/parquet/arrow/arrow_reader_writer_test.cc ## @@ -2877,11 +2877,22 @@ TEST(TestArrowReaderAdHoc, LARGE_MEMORY_TEST(LargeStringColumn)) { array.reset(); auto reader = ParquetFileReader::Open(std::make_shared(tables_buffer)); - std::unique_ptr arrow_reader; ASSERT_OK(FileReader::Make(default_memory_pool(), std::move(reader), _reader)); ASSERT_OK_NO_THROW(arrow_reader->ReadTable()); ASSERT_OK(table->ValidateFull()); + + // ARROW-9297: ensure RecordBatchReader also works + reader = ParquetFileReader::Open(std::make_shared(tables_buffer)); + ASSERT_OK(FileReader::Make(default_memory_pool(), std::move(reader), _reader)); + std::shared_ptr<::arrow::RecordBatchReader> batch_reader; + auto all_row_groups = ::arrow::internal::Iota(reader->metadata()->num_row_groups()); + ASSERT_OK_NO_THROW(arrow_reader->GetRecordBatchReader(all_row_groups, _reader)); + ASSERT_OK_AND_ASSIGN(auto batched_table, Review comment: type here I think is still useful (i.e. not auto). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
emkornfield commented on a change in pull request #7704: URL: https://github.com/apache/arrow/pull/7704#discussion_r453148683 ## File path: cpp/src/parquet/arrow/arrow_reader_writer_test.cc ## @@ -2877,11 +2877,22 @@ TEST(TestArrowReaderAdHoc, LARGE_MEMORY_TEST(LargeStringColumn)) { array.reset(); auto reader = ParquetFileReader::Open(std::make_shared(tables_buffer)); - std::unique_ptr arrow_reader; ASSERT_OK(FileReader::Make(default_memory_pool(), std::move(reader), _reader)); ASSERT_OK_NO_THROW(arrow_reader->ReadTable()); ASSERT_OK(table->ValidateFull()); + + // ARROW-9297: ensure RecordBatchReader also works + reader = ParquetFileReader::Open(std::make_shared(tables_buffer)); + ASSERT_OK(FileReader::Make(default_memory_pool(), std::move(reader), _reader)); + std::shared_ptr<::arrow::RecordBatchReader> batch_reader; + auto all_row_groups = ::arrow::internal::Iota(reader->metadata()->num_row_groups()); Review comment: Could this be simplified to: `::arrow::internal::Iota all_row_groups(...); if so please do it. If not please spell out the type instead of auto. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7711: fix P9 build
github-actions[bot] commented on pull request #7711: URL: https://github.com/apache/arrow/pull/7711#issuecomment-656976907 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you also rename pull request title in the following format? ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY} See also: * [Other pull requests](https://github.com/apache/arrow/pulls/) * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jglaser opened a new pull request #7711: fix P9 build
jglaser opened a new pull request #7711: URL: https://github.com/apache/arrow/pull/7711 Fix build on Power9 This small change addresses the following build error: ```[ 20%] Building CXX object src/arrow/CMakeFiles/arrow_objlib.dir/array/builder_nested.cc.o In file included from /ccs/proj/stf006/glaser/rapids/arrow/cpp/src/arrow/array/dict_internal.h:35:0, from /ccs/proj/stf006/glaser/rapids/arrow/cpp/src/arrow/array/builder_dict.cc:22: /ccs/proj/stf006/glaser/rapids/arrow/cpp/src/arrow/util/hashing.h: In static member function 'static uint32_t arrow::internal::SmallScalarTraits<__vector(4) __bool int>::AsIndex(__vector(4) __bool int)': /ccs/proj/stf006/glaser/rapids/arrow/cpp/src/arrow/util/hashing.h:495:60: error: cannot convert '__vector(4) int' to 'uint32_t {aka unsigned int}' in return static uint32_t AsIndex(bool value) { return value ? 1 : 0; } ^ ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange
scampi commented on a change in pull request #6402: URL: https://github.com/apache/arrow/pull/6402#discussion_r453139681 ## File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java ## @@ -751,55 +757,57 @@ private void splitAndTransferOffsetBuffer(int startIndex, int length, BaseVariab */ private void splitAndTransferValidityBuffer(int startIndex, int length, BaseVariableWidthVector target) { -int firstByteSource = BitVectorHelper.byteIndex(startIndex); -int lastByteSource = BitVectorHelper.byteIndex(valueCount - 1); -int byteSizeTarget = getValidityBufferSizeFromCount(length); -int offset = startIndex % 8; +if (length <= 0) { Review comment: Maybe for clarity, I could avoid doing these changes and simply [fix](https://github.com/apache/arrow/pull/6402/files#diff-db6c4f9e4030c5da8ccbcfe93a41b8a0R775) the validity buffer transfer. Would that be better ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] lidavidm commented on pull request #7702: ARROW-9395: [Python] allow configuring MetadataVersion
lidavidm commented on pull request #7702: URL: https://github.com/apache/arrow/pull/7702#issuecomment-656950069 If that's https://github.com/apache/arrow/blob/master/ci/scripts/integration_spark.sh then I'll take a look when I get a chance, thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] lidavidm commented on pull request #7685: ARROW-9362: [Java] Support reading/writing V4 MetadataVersion
lidavidm commented on pull request #7685: URL: https://github.com/apache/arrow/pull/7685#issuecomment-656949942 > So is this ok to merge before #7290 ? This should be OK. It doesn't enable unions in the integration tests and prevents reading/writing unions with the old version. (Of course, someone could build master and start writing files with V5 metadata and the wrong union layout?) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] BryanCutler commented on pull request #7702: ARROW-9395: [Python] allow configuring MetadataVersion
BryanCutler commented on pull request #7702: URL: https://github.com/apache/arrow/pull/7702#issuecomment-656949654 Thanks for doing this @lidavidm ! I believe this will be necessary for a PySpark v2.4.x or v3.0.0 user that upgrades to PyArrow 1.0.0. It would be good to setup the spark integration test to verify this too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] houqp commented on pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate
houqp commented on pull request #7666: URL: https://github.com/apache/arrow/pull/7666#issuecomment-656943172 +1 on unifying to Arc for the initial release given it's not on the hot path. It's better to start simple and optimize later. Arc's atomic reference counting tend to mess with compiler optimization and CPU cacheline, so it definitely has extra runtime overhead. But it shouldn't matter for schema. If get schema turned out to be on the hot path, then we have a bigger problem to fix :P I can't think of any reason why that should be the case. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] BryanCutler commented on pull request #7685: ARROW-9362: [Java] Support reading/writing V4 MetadataVersion
BryanCutler commented on pull request #7685: URL: https://github.com/apache/arrow/pull/7685#issuecomment-656940054 So is this ok to merge before #7290 ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson closed pull request #7660: ARROW-9291 [R]: Support fixed size binary/list types
nealrichardson closed pull request #7660: URL: https://github.com/apache/arrow/pull/7660 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7710: ARROW-9413: [Rust] Disable cpm_nan clippy error
github-actions[bot] commented on pull request #7710: URL: https://github.com/apache/arrow/pull/7710#issuecomment-656926979 https://issues.apache.org/jira/browse/ARROW-9413 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nevi-me commented on pull request #7710: ARROW-9413: [Rust] Disable cpm_nan clippy error
nevi-me commented on pull request #7710: URL: https://github.com/apache/arrow/pull/7710#issuecomment-656924262 CC @houqp This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson closed pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build
nealrichardson closed pull request #7694: URL: https://github.com/apache/arrow/pull/7694 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nevi-me opened a new pull request #7710: ARROW-9413: [Rust] Disable cpm_nan clippy error
nevi-me opened a new pull request #7710: URL: https://github.com/apache/arrow/pull/7710 Using the comparison recommended by clippy makes sorts with `NAN` undeterministic. We currently sort NAN separately to nulls, we couldcan resolve this separately This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels
wesm commented on pull request #7656: URL: https://github.com/apache/arrow/pull/7656#issuecomment-656921774 I googled a little bit about the compilation errors and it seems to be a gcc bug. I think that some of these kernels are going to have to be rewritten to use inline lambdas instead of CRTP. There should be no performance difference. I'll take a closer look at it this weekend This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails
nealrichardson commented on pull request #7706: URL: https://github.com/apache/arrow/pull/7706#issuecomment-656921476 Also cc @xhochy. IDK how serious this is; I personally have never used this conda setup, and what would make more sense would be to test the actual conda recipe This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson closed pull request #7707: ARROW-9410: [CI][Crossbow] Fix homebrew-cpp again
nealrichardson closed pull request #7707: URL: https://github.com/apache/arrow/pull/7707 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nevi-me commented on pull request #7693: ARROW-9391: [Rust] Padding added to arrays causes float32's to be incorrectly cast to float64 float64s in the case where a record batch only c
nevi-me commented on pull request #7693: URL: https://github.com/apache/arrow/pull/7693#issuecomment-656917911 > @nevi-me When I was testing it we weren't seeing it while reading the record batch back directly; we had to complete the serialization to a byte stream and then read it back. Yes, writing to a file stream, and reading that back, works. Try the below: ```rust #[test] fn test_arrow_single_float_row() { let schema = Schema::new(vec![ Field::new("a", DataType::Float32, false), Field::new("b", DataType::Float32, false), Field::new("c", DataType::Int32, false), Field::new("d", DataType::Int32, false), ]); let arrays = vec![ Arc::new(Float32Array::from(vec![1.23])) as ArrayRef, Arc::new(Float32Array::from(vec![-6.50])) as ArrayRef, Arc::new(Int32Array::from(vec![2])) as ArrayRef, Arc::new(Int32Array::from(vec![1])) as ArrayRef, ]; let batch = RecordBatch::try_new(Arc::new(schema.clone()), arrays).unwrap(); // create stream writer let file = File::create("target/debug/testdata/float.stream").unwrap(); let mut stream_writer = StreamWriter::try_new(file, ).unwrap(); stream_writer.write().unwrap(); stream_writer.finish().unwrap(); // read stream back let file = File::open("target/debug/testdata/float.stream").unwrap(); let mut reader = StreamReader::try_new(file).unwrap(); let read_batch = reader.next_batch().unwrap().unwrap(); // TODO: test the output } ``` I was able to reproduce the 0.0 values without your fix, and the correct numbers with the fix. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat
wesm commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656914568 Thanks. I just opened https://issues.apache.org/jira/browse/ARROW-9412 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] tobim commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that sta
tobim commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656912930 I'm fine with that. I'll try to find some time for it tomorrow. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that static lin
wesm closed pull request #7696: URL: https://github.com/apache/arrow/pull/7696 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build
github-actions[bot] commented on pull request #7694: URL: https://github.com/apache/arrow/pull/7694#issuecomment-656910317 Revision: a28dcda15e57f95cbad1007adce6b70e5af03eb9 Submitted crossbow builds: [ursa-labs/crossbow @ actions-406](https://github.com/ursa-labs/crossbow/branches/all?query=actions-406) |Task|Status| ||--| |test-r-linux-as-cran|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-406-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-406-github-test-r-linux-as-cran)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build
nealrichardson commented on pull request #7694: URL: https://github.com/apache/arrow/pull/7694#issuecomment-656909712 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat
wesm commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656909818 +1. Let us address the `ARROW_STATIC_INSTALL_INTERFACE_LIBS` issue and associated testing as a follow up PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] maxburke commented on pull request #7693: ARROW-9391: [Rust] Padding added to arrays causes float32's to be incorrectly cast to float64 float64s in the case where a record batch only
maxburke commented on pull request #7693: URL: https://github.com/apache/arrow/pull/7693#issuecomment-656908743 @nevi-me When I was testing it we weren't seeing it while reading the record batch back directly; we had to complete the serialization to a byte stream and then read it back. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat
wesm commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656907552 > This is not true. Sorry, my presumption was that when a system library is installed there is a shared library available. If there is both a shared and static library available for a dependency, the build system is _supposed_ to select the shared one. But you are right if there is only a static library (e.g. if someone created an all-static "system" toolchain), then indeed static libraries would be selected. But they would not be added to `libarrow_bundled_dependencies.a`, and they aren't. > That's how it should be then. But by removing ARROW_STATIC_INSTALL_INTERFACE_LIBS you are breaking the case of external static dependencies. If you would only add those libs that are not BUNDLED to the list I'd be happy. Well, static linking was not possible at all prior to this PR when using the recommended memory allocator (jemalloc). So nothing has been "broken" because it never worked before. The case of external static dependencies isn't tested at all -- in no CI or integration task. If this use case is important to you, we could use your help writing Dockerized integration tests that exhibit this scenario so that we aren't flying blind like we are now. I agree with adding non-bundled static libraries to `ARROW_STATIC_INSTALL_INTERFACE_LIBS` but we have to write tests to demonstrate that it works and that it does not break the use cases represented in this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] tobim commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that sta
tobim commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656902745 > our build system does not select static libraries when using -DARROW_DEPENDENCY_SOURCE=SYSTEM This is not true. > only add static libraries to `libarrow_bundled_dependencies.a` that the Arrow build system built itself. That's how it should be then. But by removing `ARROW_STATIC_INSTALL_INTERFACE_LIBS` you are breaking the case of external static dependencies. If you would only add those libs that are not `BUNDLED` to the list I'd be happy. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nevi-me closed pull request #7193: ARROW-7924: [Rust] Add sort for float types
nevi-me closed pull request #7193: URL: https://github.com/apache/arrow/pull/7193 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on a change in pull request #7668: ARROW-6982: [R] Add bindings for compare and boolean kernels
wesm commented on a change in pull request #7668: URL: https://github.com/apache/arrow/pull/7668#discussion_r453084856 ## File path: r/R/array.R ## @@ -269,6 +263,8 @@ filter_rows <- function(x, i, keep_na = TRUE, ...) { nrows <- x$num_rows %||% x$length() # Depends on whether Array or Table-like if (inherits(i, "array_expression")) { # Evaluate it +# Switch this when https://issues.apache.org/jira/browse/ARROW-9380 is resolved +# i <- eval_array_expression(i) Review comment: Oh right. I'll add metafunctions for calling "match" and "isin"? That's pretty easy, I'm not sure how else to resolve it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] sunchao commented on pull request #7176: ARROW-8796: [Rust] feat: Allow writers to use Vec
sunchao commented on pull request #7176: URL: https://github.com/apache/arrow/pull/7176#issuecomment-656898683 Just took another look. If we are going with this, we can potentially make the API simpler. For instance, current it is: ```rust fn next_row_group( self) -> Result>; ``` As the returned box already is tracked by non-static lifetime, we can perhaps change the API to: ```rust fn next_row_group( self) -> Result< RowGroupWriter>; ``` and track the current group writer within the file writer. With this we can also potentially remove `close_row_group`. Similar things can be done for column writer. We can do this in a separate PR though. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build
github-actions[bot] commented on pull request #7694: URL: https://github.com/apache/arrow/pull/7694#issuecomment-656898878 Revision: b455526af7db3d2849238dbf20fb2643396b6224 Submitted crossbow builds: [ursa-labs/crossbow @ actions-405](https://github.com/ursa-labs/crossbow/branches/all?query=actions-405) |Task|Status| ||--| |test-r-linux-as-cran|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-405-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-405-github-test-r-linux-as-cran)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build
nealrichardson commented on pull request #7694: URL: https://github.com/apache/arrow/pull/7694#issuecomment-656898194 @github-actions crossbow submit *as-cran* This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails
nealrichardson commented on pull request #7706: URL: https://github.com/apache/arrow/pull/7706#issuecomment-656897869 Ok I've worked around the decor issue but now there's what appears to be a legit failure, a segfault on the new dictionary integration test: https://github.com/ursa-labs/crossbow/runs/859461851?check_suite_focus=true#step:6:4357 cc @romainfrancois This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on a change in pull request #7668: ARROW-6982: [R] Add bindings for compare and boolean kernels
nealrichardson commented on a change in pull request #7668: URL: https://github.com/apache/arrow/pull/7668#discussion_r453082760 ## File path: r/R/array.R ## @@ -269,6 +263,8 @@ filter_rows <- function(x, i, keep_na = TRUE, ...) { nrows <- x$num_rows %||% x$length() # Depends on whether Array or Table-like if (inherits(i, "array_expression")) { # Evaluate it +# Switch this when https://issues.apache.org/jira/browse/ARROW-9380 is resolved +# i <- eval_array_expression(i) Review comment: Yes, I know, I started working on this and ran into ARROW-9389. I can hack around this here for now, but I was leaning towards holding this until after 1.0 anyway. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on a change in pull request #7668: ARROW-6982: [R] Add bindings for compare and boolean kernels
wesm commented on a change in pull request #7668: URL: https://github.com/apache/arrow/pull/7668#discussion_r453081425 ## File path: r/R/array.R ## @@ -269,6 +263,8 @@ filter_rows <- function(x, i, keep_na = TRUE, ...) { nrows <- x$num_rows %||% x$length() # Depends on whether Array or Table-like if (inherits(i, "array_expression")) { # Evaluate it +# Switch this when https://issues.apache.org/jira/browse/ARROW-9380 is resolved +# i <- eval_array_expression(i) Review comment: When uncommenting this I get this failure ``` ══ Failed ══ ── 1. Error: filter() with %in% (@test-dplyr.R#149) ─── 'match' requires vector arguments Backtrace: 1. arrow:::expect_dplyr_equal(...) testthat/test-dplyr.R:149:2 2. dplyr::filter(., dbl > 2, chr %in% c("d", "f")) 10. dplyr::collect(.) 14. arrow:::`[.RecordBatch`(...) 15. arrow:::filter_rows(x, i, ...) 16. arrow:::eval_array_expression(i) 17. base::lapply(...) 18. arrow:::FUN(X[[i]], ...) 19. arrow:::eval_array_expression(a) 21. %in% c("d", "f") ══ DONE ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7709: ARROW-9411: [Rust] Update dependencies
github-actions[bot] commented on pull request #7709: URL: https://github.com/apache/arrow/pull/7709#issuecomment-656894889 https://issues.apache.org/jira/browse/ARROW-9411 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on a change in pull request #7668: ARROW-6982: [R] Add bindings for compare and boolean kernels
wesm commented on a change in pull request #7668: URL: https://github.com/apache/arrow/pull/7668#discussion_r453079515 ## File path: r/R/array.R ## @@ -269,6 +263,8 @@ filter_rows <- function(x, i, keep_na = TRUE, ...) { nrows <- x$num_rows %||% x$length() # Depends on whether Array or Table-like if (inherits(i, "array_expression")) { # Evaluate it +# Switch this when https://issues.apache.org/jira/browse/ARROW-9380 is resolved +# i <- eval_array_expression(i) Review comment: I'm doing this now and checking the tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static libra
github-actions[bot] commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656892388 Revision: f43ce4606996ed44980dba514c824786e70729d3 Submitted crossbow builds: [ursa-labs/crossbow @ actions-404](https://github.com/ursa-labs/crossbow/branches/all?query=actions-404) |Task|Status| ||--| |centos-6-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-centos-6-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-centos-6-amd64)| |centos-7-aarch64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-centos-7-aarch64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |centos-7-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-centos-7-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-centos-7-amd64)| |centos-8-aarch64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-centos-8-aarch64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |centos-8-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-centos-8-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-centos-8-amd64)| |debian-buster-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-debian-buster-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-debian-buster-amd64)| |debian-buster-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-debian-buster-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |debian-stretch-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-debian-stretch-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-debian-stretch-amd64)| |debian-stretch-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-debian-stretch-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-bionic-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-ubuntu-bionic-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-ubuntu-bionic-amd64)| |ubuntu-bionic-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-ubuntu-bionic-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-eoan-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-ubuntu-eoan-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-ubuntu-eoan-amd64)| |ubuntu-eoan-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-ubuntu-eoan-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-focal-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-ubuntu-focal-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-ubuntu-focal-amd64)| |ubuntu-focal-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-ubuntu-focal-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-xenial-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-404-github-ubuntu-xenial-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-404-github-ubuntu-xenial-amd64)| |ubuntu-xenial-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-404-travis-ubuntu-xenial-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nevi-me opened a new pull request #7709: ARROW-9411: [Rust] Update dependencies
nevi-me opened a new pull request #7709: URL: https://github.com/apache/arrow/pull/7709 This updates some dependencies like prost and rand. Helps reduce number of dependencies This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat
wesm commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656891774 Fingers crossed that I've fixed the gRPC bundling problem and the Linux packages work this time This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat
wesm commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656891639 @github-actions crossbow submit -g linux This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm edited a comment on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so th
wesm edited a comment on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656890077 > I still think that "-DARROW_DEPENDENCY_SOURCE=SYSTEM" should work with arrow-static. > For this case my opinion is that non-vendored dependencies should not be bundled. Normally I want use the SYSTEM libraries for dependencies. As a developer of a third-party I do not want arrow to add symbols it does not own to my static linking process. @tobim our build system does not select static libraries when using `-DARROW_DEPENDENCY_SOURCE=SYSTEM` so they would not be bundled in `libarrow_bundled_dependencies.a`. That's kind of the whole point of this PR -- only add static libraries to `libarrow_bundled_dependencies.a` that the Arrow build system built itself. If `arrow_static` has shared library dependencies on system libraries, then some other approach will be required to inform CMake automatically when doing `find_package(Arrow)`. It's definitely out of the scope for this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat
wesm commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656890077 > I still think that "-DARROW_DEPENDENCY_SOURCE=SYSTEM" should work with arrow-static. > For this case my opinion is that non-vendored dependencies should not be bundled. Normally I want use the SYSTEM libraries for dependencies. As a developer of a third-party I do not want arrow to add symbols it does not own to my static linking process. @tobim our build system does not select static libraries when using `-DARROW_DEPENDENCY_SOURCE=SYSTEM` so they would not be bundled in `libarrow_bundled_dependencies.a`. That's kind of the whole point of this PR -- only add static libraries to `libarrow_bundled_dependencies.a` that the Arrow build system built itself. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] sunchao commented on pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate
sunchao commented on pull request #7666: URL: https://github.com/apache/arrow/pull/7666#issuecomment-656889046 > The Rc version is used in a few places but again I don't think it's that important to use Rc over Arc. Agreed. I don't see any performance implication for this, as `schema()` shouldn't appear on hot paths. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jeroen commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build
jeroen commented on pull request #7694: URL: https://github.com/apache/arrow/pull/7694#issuecomment-656888136 Not from the configure step I think. These are all `make` variables so the native way to build embedded libs is to pass these down when invoking a sub-make, like so: https://github.com/rstudio/httpuv/blob/master/src/Makevars#L65-L70 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] tobim commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that sta
tobim commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656887788 > Re: system dependencies and static build, is https://issues.apache.org/jira/browse/ARROW-6312 related? I encountered this issue in https://issues.apache.org/jira/browse/ARROW-9303, and isn't this is a slightly different problem? ARROW-6312 is the `pkg-config` variant of the approach I suggested for CMake. > Because your system dependencies could be shared libraries that you couldn't glue together anyway, right? Luckily I have control over linking flavor of the libraries on my system search path. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build
nealrichardson commented on pull request #7694: URL: https://github.com/apache/arrow/pull/7694#issuecomment-656884857 Is there a way to get that config without shelling out at all? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jeroen commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build
jeroen commented on pull request #7694: URL: https://github.com/apache/arrow/pull/7694#issuecomment-656883822 You have a bug here because your version of `R_CMD_config` takes R from the PATH but that may not be the correct R. Some machines have many versions of R. It may be safer to use e.g. `tools::Rcmd(c('config', 'CXX'), stdout=T)` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so
nealrichardson commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656882522 Re: system dependencies and static build, is https://issues.apache.org/jira/browse/ARROW-6312 related? I encountered this issue in https://issues.apache.org/jira/browse/ARROW-9303, and isn't this is a slightly different problem? Because your system dependencies could be shared libraries that you couldn't glue together anyway, right? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7708: ARROW-9292: [Doc] Remove Rust from feature matrix
github-actions[bot] commented on pull request #7708: URL: https://github.com/apache/arrow/pull/7708#issuecomment-656882517 https://issues.apache.org/jira/browse/ARROW-9292 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7707: ARROW-9410: [CI][Crossbow] Fix homebrew-cpp again
github-actions[bot] commented on pull request #7707: URL: https://github.com/apache/arrow/pull/7707#issuecomment-656882518 https://issues.apache.org/jira/browse/ARROW-9410 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nevi-me opened a new pull request #7708: ARROW-9292: [Doc] Remove Rust from feature matrix
nevi-me opened a new pull request #7708: URL: https://github.com/apache/arrow/pull/7708 We currently have problems with Rust integration testing, so I'm temporarily removing them from the documentation This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails
github-actions[bot] commented on pull request #7706: URL: https://github.com/apache/arrow/pull/7706#issuecomment-656881900 Revision: 002a43e4dd7504c9911d458d8f05f6aecf1a564f Submitted crossbow builds: [ursa-labs/crossbow @ actions-403](https://github.com/ursa-labs/crossbow/branches/all?query=actions-403) |Task|Status| ||--| |test-conda-r-4.0|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-403-github-test-conda-r-4.0)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-403-github-test-conda-r-4.0)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] tobim commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that sta
tobim commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656880866 > We can't put the Arrow toolchain libraries in INTERFACE_LINK_LIBRARIES because they will be unrecognized by the third party project. I still think that `"-DARROW_DEPENDENCY_SOURCE=SYSTEM"` should work with `arrow-static`. For this case my opinion is that non-vendored dependencies should not be bundled. Normally I want use the SYSTEM libraries for dependencies. As a developer of a third-party I do not want arrow to add symbols it does not own to my static linking process. May I suggest a different solution for that aspect of the problem: CMake provides a [module for declaring dependencies](https://cmake.org/cmake/help/latest/module/CMakeFindDependencyMacro.html) in the templated _Project_Config.cmake. That allows you to insert your own dependencies to the target graph of the downstream project. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails
nealrichardson commented on pull request #7706: URL: https://github.com/apache/arrow/pull/7706#issuecomment-656881146 @github-actions crossbow submit *conda-r* This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7707: ARROW-9410: [CI][Crossbow] Fix homebrew-cpp again
github-actions[bot] commented on pull request #7707: URL: https://github.com/apache/arrow/pull/7707#issuecomment-656880263 Revision: 04f80607d70cd4a7c892512a14a76e145de50486 Submitted crossbow builds: [ursa-labs/crossbow @ actions-402](https://github.com/ursa-labs/crossbow/branches/all?query=actions-402) |Task|Status| ||--| |homebrew-cpp|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-402-travis-homebrew-cpp.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7707: ARROW-9410: [CI][Crossbow] Fix homebrew-cpp again
nealrichardson commented on pull request #7707: URL: https://github.com/apache/arrow/pull/7707#issuecomment-656879577 @github-actions crossbow submit homebrew-cpp This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson opened a new pull request #7707: ARROW-9410: [CI][Crossbow] Fix homebrew-cpp again
nealrichardson opened a new pull request #7707: URL: https://github.com/apache/arrow/pull/7707 Pulls some changes to the formula from upstream This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails
github-actions[bot] commented on pull request #7706: URL: https://github.com/apache/arrow/pull/7706#issuecomment-656876659 https://issues.apache.org/jira/browse/ARROW-9409 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails
github-actions[bot] commented on pull request #7706: URL: https://github.com/apache/arrow/pull/7706#issuecomment-656873001 Revision: 1647387c4eaabe5ab13515be04d88abab7c9e137 Submitted crossbow builds: [ursa-labs/crossbow @ actions-401](https://github.com/ursa-labs/crossbow/branches/all?query=actions-401) |Task|Status| ||--| |test-conda-r-4.0|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-401-github-test-conda-r-4.0)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-401-github-test-conda-r-4.0)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails
nealrichardson commented on pull request #7706: URL: https://github.com/apache/arrow/pull/7706#issuecomment-656872355 @github-actions crossbow submit *conda-r* This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson opened a new pull request #7706: ARROW-9409: [CI][Crossbow] Nightly conda-r fails
nealrichardson opened a new pull request #7706: URL: https://github.com/apache/arrow/pull/7706 It's annoying that we need to declare these separately This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7705: ARROW-9408: [Integration] Fix Windows numpy datagen issues
github-actions[bot] commented on pull request #7705: URL: https://github.com/apache/arrow/pull/7705#issuecomment-656871192 https://issues.apache.org/jira/browse/ARROW-9408 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] bkietz commented on a change in pull request #7695: ARROW-8989: [C++][Doc] Document available compute functions
bkietz commented on a change in pull request #7695: URL: https://github.com/apache/arrow/pull/7695#discussion_r453046448 ## File path: docs/source/cpp/compute.rst ## @@ -0,0 +1,419 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. default-domain:: cpp +.. highlight:: cpp +.. cpp:namespace:: arrow::compute + += +Compute Functions += + +.. TODO: describe API and how to invoke compute functions + +Available functions +=== + +Aggregations + + ++--+++---++ +| Function name| Arity | Input types| Output type | Options class | ++==+++===++ +| count| Unary | Any| Scalar Int64 | :struct:`CountOptions` | ++--+++---++ +| mean | Unary | Numeric| Scalar Float64 || ++--+++---++ +| minmax | Unary | Numeric| Scalar Struct (1)| :struct:`MinMaxOptions`| ++--+++---++ +| sum | Unary | Numeric| Scalar Numeric (2)|| ++--+++---++ + +Notes: + +* \(1) Output is a ``{"min": input type, "max": input type}`` Struct + +* \(2) Output is Int64, UInt64 or Float64, depending on the input type + + +Element-wise ("scalar") functions +- + +Arithmetic functions + + +Those functions expect two inputs of the same type and apply a given binary +operation to each pair of elements gathered from the inputs. Each function +is also available in an overflow-checking variant, suffixed ``_checked``. + +If any of the input elements in a pair is null, the corresponding output +element is null. + ++--+++-+ +| Function name| Arity | Input types| Output type | ++==+++=+ +| add | Binary | Numeric| Numeric | ++--+++-+ +| add_checked | Binary | Numeric| Numeric | ++--+++-+ +| multiply | Binary | Numeric| Numeric | ++--+++-+ +| multiply_checked | Binary | Numeric| Numeric | ++--+++-+ +| subtract | Binary | Numeric| Numeric | ++--+++-+ +| subtract_checked | Binary | Numeric| Numeric | ++--+++-+ + +Comparisons +~~~ + +Those functions expect two inputs of the same type and apply a given +comparison operator. If any of the input elements in a pair is null, +the corresponding output element is null. + ++--++-+-+ +| Function names
[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat
wesm commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656869168 The one test failure is spurious ``` pyarrow/tests/test_fs.py:1285: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ pyarrow/_fs.pyx:439: in pyarrow._fs.FileSystem.get_file_info infos = GetResultValue(self.fs.GetFileInfo(selector)) pyarrow/error.pxi:122: in pyarrow.lib.pyarrow_internal_check_status return check_status(status) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > raise IOError(message) E OSError: When listing objects under key '' in bucket 'ursa-labs-taxi-data': AWS Error [code 99]: Unable to connect to endpoint E In ../src/arrow/filesystem/s3fs.cc, line 1098, code: ListObjectsV2(bucket, key, std::move(handle_results), std::move(handle_error)) E In ../src/arrow/filesystem/s3fs.cc, line 1358, code: impl_->Walk(select, base_path.bucket, base_path.key, ) ``` I verified that the minimal build still works on Linux (presumably it's therefore fine on macOS) and Windows/MSVC This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nevi-me opened a new pull request #7705: ARROW-9408: [Integration] Fix Windows numpy datagen issues
nevi-me opened a new pull request #7705: URL: https://github.com/apache/arrow/pull/7705 We found that the integer range check when generating integration data doesn't work in Windows because the default C integers that numpy uses are 32-bit by default in Windows. This fixes that issue by forcing 64-bit integers. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] eerhardt commented on a change in pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements
eerhardt commented on a change in pull request #7671: URL: https://github.com/apache/arrow/pull/7671#discussion_r453045241 ## File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs ## @@ -66,87 +66,158 @@ protected BuilderBase(IArrowType dataType) ValueOffsets = new ArrowBuffer.Builder(); ValueBuffer = new ArrowBuffer.Builder(); ValidityBuffer = new ArrowBuffer.BitmapBuilder(); + +// From the docs: +// +// The offsets buffer contains length + 1 signed integers (either 32-bit or 64-bit, depending on the +// logical type), which encode the start position of each slot in the data buffer. The length of the +// value in each slot is computed using the difference between the offset at that slot’s index and the +// subsequent offset. +// +// In this builder, we choose to append the first offset (zero) upon construction, and each trailing +// offset is then added after each individual item has been appended. +ValueOffsets.Append(this.Offset); } protected abstract TArray Build(ArrayData data); -public int Length => ValueOffsets.Length; +/// +/// Gets the length of the array built so far. +/// +public int Length => ValueOffsets.Length - 1; +/// +/// Build an Arrow array from the appended contents so far. +/// +/// Optional memory allocator. +/// Returns an array of type . public TArray Build(MemoryAllocator allocator = default) { -ValueOffsets.Append(Offset); - -ArrowBuffer validityBuffer = NullCount > 0 -? ValidityBuffer.Build(allocator) -: ArrowBuffer.Empty; - -var data = new ArrayData(DataType, ValueOffsets.Length - 1, NullCount, 0, -new[] { validityBuffer, ValueOffsets.Build(allocator), ValueBuffer.Build(allocator) }); +var bufs = new[] +{ +NullCount > 0 ? ValidityBuffer.Build(allocator) : ArrowBuffer.Empty, +ValueOffsets.Build(allocator), +ValueBuffer.Build(allocator), +}; +var data = new ArrayData( +DataType, +length: ValueOffsets.Length - 1, +NullCount, +offset: 0, +bufs); return Build(data); } +/// +/// Append a single null value to the array. +/// +/// Returns the builder (for fluent-style composition). public TBuilder AppendNull() { -ValueOffsets.Append(Offset); +// Do not add to the value buffer in the case of a null. +// Note that we do not need to increment the offset as a result. ValidityBuffer.Append(false); +ValueOffsets.Append(Offset); return Instance; } +/// +/// Appends a value, consisting of a single byte, to the array. +/// +/// Byte value to append. +/// Returns the builder (for fluent-style composition). public TBuilder Append(byte value) { -ValueOffsets.Append(Offset); ValueBuffer.Append(value); -Offset++; ValidityBuffer.Append(true); +Offset++; +ValueOffsets.Append(Offset); return Instance; } +/// +/// Append a value, consisting of a span of bytes, to the array. +/// +/// +/// Note that a single value is added, which consists of arbitrarily many bytes. If multiple values are +/// to be added, use the method. +/// +/// Span of bytes to add. +/// Returns the builder (for fluent-style composition). public TBuilder Append(ReadOnlySpan span) { -ValueOffsets.Append(Offset); ValueBuffer.Append(span); ValidityBuffer.Append(true); Offset += span.Length; +ValueOffsets.Append(Offset); return Instance; } -public TBuilder AppendRange(IEnumerable values) +/// +/// Append a value, consisting of an enumerable collection of bytes, to the array. +/// +/// +/// Note that this method appends a single value, which may consist of arbitrarily many bytes. If multiple +
[GitHub] [arrow] nevi-me commented on pull request #7693: ARROW-9391: [Rust] Padding added to arrays causes float32's to be incorrectly cast to float64 float64s in the case where a record batch only c
nevi-me commented on pull request #7693: URL: https://github.com/apache/arrow/pull/7693#issuecomment-656859641 Hey @maxburke, I think I understand the gist of the problem, but do you need to create the `ParquetBufWriter` struct to test this? Wouldn't writing a single record batch with the arrays to a stream, then reading it back, work? Or am I oversimplifying the problem? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static libra
github-actions[bot] commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656858894 Revision: 1f24b9e60587f1b42ee0a8d903e7cc9fc051 Submitted crossbow builds: [ursa-labs/crossbow @ actions-400](https://github.com/ursa-labs/crossbow/branches/all?query=actions-400) |Task|Status| ||--| |centos-6-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-centos-6-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-centos-6-amd64)| |centos-7-aarch64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-centos-7-aarch64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |centos-7-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-centos-7-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-centos-7-amd64)| |centos-8-aarch64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-centos-8-aarch64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |centos-8-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-centos-8-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-centos-8-amd64)| |debian-buster-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-debian-buster-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-debian-buster-amd64)| |debian-buster-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-debian-buster-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |debian-stretch-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-debian-stretch-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-debian-stretch-amd64)| |debian-stretch-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-debian-stretch-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-bionic-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-ubuntu-bionic-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-ubuntu-bionic-amd64)| |ubuntu-bionic-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-ubuntu-bionic-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-eoan-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-ubuntu-eoan-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-ubuntu-eoan-amd64)| |ubuntu-eoan-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-ubuntu-eoan-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-focal-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-ubuntu-focal-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-ubuntu-focal-amd64)| |ubuntu-focal-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-ubuntu-focal-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-xenial-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-400-github-ubuntu-xenial-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-400-github-ubuntu-xenial-amd64)| |ubuntu-xenial-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-400-travis-ubuntu-xenial-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat
wesm commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656858004 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] paddyhoran commented on a change in pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate
paddyhoran commented on a change in pull request #7666: URL: https://github.com/apache/arrow/pull/7666#discussion_r453044078 ## File path: rust/datafusion/src/execution/physical_plan/common.rs ## @@ -47,12 +47,12 @@ impl RecordBatchIterator { } } -impl BatchIterator for RecordBatchIterator { +impl SendableRecordBatchReader for RecordBatchIterator { fn schema() -> Arc { Review comment: I'm +0 on this. I personally like seeing the explicit type signature (once it's not very verbose) but I have no problem with this change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on a change in pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library
wesm commented on a change in pull request #7696: URL: https://github.com/apache/arrow/pull/7696#discussion_r453043649 ## File path: cpp/examples/minimal_build/CMakeLists.txt ## @@ -19,10 +19,31 @@ cmake_minimum_required(VERSION 3.0) project(ArrowMinimalExample) +option(ARROW_LINK_SHARED "Link to the Arrow shared library" ON) + find_package(Arrow REQUIRED) +set(CMAKE_CXX_STANDARD 11) +set(CMAKE_BUILD_TYPE Release) + message(STATUS "Arrow version: ${ARROW_VERSION}") message(STATUS "Arrow SO version: ${ARROW_FULL_SO_VERSION}") add_executable(arrow_example example.cc) -target_link_libraries(arrow_example PRIVATE arrow_shared) + +if (ARROW_LINK_SHARED) + target_link_libraries(arrow_example PRIVATE arrow_shared) +else() + set(THREADS_PREFER_PTHREAD_FLAG ON) + find_package(Threads REQUIRED) + + # We must also link to libarrow_bundled_dependencies.a + get_property(arrow_static_loc TARGET arrow_static PROPERTY LOCATION) + get_filename_component(arrow_lib_dir ${arrow_static_loc} DIRECTORY) + + add_library(arrow_dependencies STATIC IMPORTED) + set_target_properties(arrow_dependencies +PROPERTIES IMPORTED_LOCATION + "${arrow_lib_dir}/${CMAKE_STATIC_LIBRARY_PREFIX}arrow_bundled_dependencies${CMAKE_STATIC_LIBRARY_SUFFIX}") Review comment: I got this working, awesome! I'll update the docs This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] paddyhoran commented on pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate
paddyhoran commented on pull request #7666: URL: https://github.com/apache/arrow/pull/7666#issuecomment-656856701 > I think the `schema` method is already returning a `Arc`, where is the `Rc` being used? IMHO if it is not too difficult to do we should consolidate on a single trait now instead of having to change it later again. It did occur to me that we could just have a single trait that uses `Arc` and not bother with the `Rc` version. I'm not sure what the performance of `Arc` is versus `Rc`. Obviously, there has to be some difference but is this important in the context of Arrow? I'm up for converting to one trait and using `Arc`. If users request it then we can add the `Rc` version later. What do others think? The `Rc` version is used in a few places but again I don't think it's that important to use `Rc` over `Arc`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] eerhardt commented on a change in pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements
eerhardt commented on a change in pull request #7671: URL: https://github.com/apache/arrow/pull/7671#discussion_r453042194 ## File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs ## @@ -237,7 +329,9 @@ public ReadOnlySpan GetBytes(int index) if (IsNull(index)) { -return null; +// Note that `return null;` is valid syntax, but would be misleading as `null` in the context of a span +// is actually returned as an empty span. +return ReadOnlySpan.Empty; Review comment: I've been confused and bit by this a few times in different projects. All the following ways produce an "equivalent" span: * `return null` (this uses the implicit operator from an array) * `return default` * `return ReadOnlySpan.Empty` (this just says `return default` underneath the covers) I agree that `ReadOnlySpan.Empty` is the most clear thing to return here. Next would be `return default`. `return null` is the worst of the three options IMO. ## File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs ## @@ -66,87 +66,158 @@ protected BuilderBase(IArrowType dataType) ValueOffsets = new ArrowBuffer.Builder(); ValueBuffer = new ArrowBuffer.Builder(); ValidityBuffer = new ArrowBuffer.BitmapBuilder(); + +// From the docs: +// +// The offsets buffer contains length + 1 signed integers (either 32-bit or 64-bit, depending on the +// logical type), which encode the start position of each slot in the data buffer. The length of the +// value in each slot is computed using the difference between the offset at that slot’s index and the +// subsequent offset. +// +// In this builder, we choose to append the first offset (zero) upon construction, and each trailing +// offset is then added after each individual item has been appended. +ValueOffsets.Append(this.Offset); } protected abstract TArray Build(ArrayData data); -public int Length => ValueOffsets.Length; +/// +/// Gets the length of the array built so far. +/// +public int Length => ValueOffsets.Length - 1; +/// +/// Build an Arrow array from the appended contents so far. +/// +/// Optional memory allocator. +/// Returns an array of type . public TArray Build(MemoryAllocator allocator = default) { -ValueOffsets.Append(Offset); - -ArrowBuffer validityBuffer = NullCount > 0 -? ValidityBuffer.Build(allocator) -: ArrowBuffer.Empty; - -var data = new ArrayData(DataType, ValueOffsets.Length - 1, NullCount, 0, -new[] { validityBuffer, ValueOffsets.Build(allocator), ValueBuffer.Build(allocator) }); +var bufs = new[] +{ +NullCount > 0 ? ValidityBuffer.Build(allocator) : ArrowBuffer.Empty, +ValueOffsets.Build(allocator), +ValueBuffer.Build(allocator), +}; +var data = new ArrayData( +DataType, +length: ValueOffsets.Length - 1, Review comment: Can this line just be `Length` ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] eerhardt commented on a change in pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements
eerhardt commented on a change in pull request #7671: URL: https://github.com/apache/arrow/pull/7671#discussion_r453038089 ## File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs ## @@ -66,87 +66,158 @@ protected BuilderBase(IArrowType dataType) ValueOffsets = new ArrowBuffer.Builder(); ValueBuffer = new ArrowBuffer.Builder(); ValidityBuffer = new ArrowBuffer.BitmapBuilder(); + +// From the docs: +// +// The offsets buffer contains length + 1 signed integers (either 32-bit or 64-bit, depending on the +// logical type), which encode the start position of each slot in the data buffer. The length of the +// value in each slot is computed using the difference between the offset at that slot’s index and the +// subsequent offset. +// +// In this builder, we choose to append the first offset (zero) upon construction, and each trailing +// offset is then added after each individual item has been appended. +ValueOffsets.Append(this.Offset); Review comment: Nice fix. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson closed pull request #7703: ARROW-9355 [R]: Fix -Wimplicit-int-float-conversion
nealrichardson closed pull request #7703: URL: https://github.com/apache/arrow/pull/7703 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7694: ARROW-9397: [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build
github-actions[bot] commented on pull request #7694: URL: https://github.com/apache/arrow/pull/7694#issuecomment-656837525 https://issues.apache.org/jira/browse/ARROW-9397 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7703: ARROW-9355 [R]: Fix -Wimplicit-int-float-conversion
nealrichardson commented on pull request #7703: URL: https://github.com/apache/arrow/pull/7703#issuecomment-656832748 @github-actions autotune everything This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] eerhardt commented on pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray
eerhardt commented on pull request #7654: URL: https://github.com/apache/arrow/pull/7654#issuecomment-656832270 Can we just introduce new `Append(DateTime value)` and `GetDateAsDateTime(int index)` APIs instead of making a breaking change? In general, DateTimeOffset is the preferred type to use when talking about dates and times. The reasoning is because exactly what you point out in the JIRA issue - DateTime.Kind is very confusing and can cause a lot of problems. So we generally recommend users to use DateTimeOffset unless they have a strong reason to use DateTime. So I'd suggest to keep the existing code (the builder is based on DateTimeOffset), but to also add new `Append(DateTime value)` overloads to solve the issue you logged. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org