[GitHub] [arrow] emkornfield commented on a change in pull request #7965: ARROW-9737: [C++][Gandiva] Add bitwise_xor() for integers

2020-08-26 Thread GitBox


emkornfield commented on a change in pull request #7965:
URL: https://github.com/apache/arrow/pull/7965#discussion_r478145554



##
File path: cpp/src/gandiva/precompiled/arithmetic_ops_test.cc
##
@@ -120,6 +120,14 @@ TEST(TestArithmeticOps, TestBitwiseOps) {
 0xFF7E);
   EXPECT_EQ(bitwise_or_int64_int64(0x6A5B1, 0x0), 0x6A5B1);
 
+  // bitwise XOR
+  EXPECT_EQ(bitwise_xor_int32_int32(0x0147D, 0x17159), 0x16524);
+  EXPECT_EQ(bitwise_xor_int32_int32(0xFFCC, 0x0297), 0XFD5B);
+  EXPECT_EQ(bitwise_xor_int32_int32(0x000, 0x285), 0x285);
+  EXPECT_EQ(bitwise_xor_int64_int64(0x563672F83, 0x0D9FCF85B), 0x5BA9BD7D8);
+  EXPECT_EQ(bitwise_xor_int64_int64(0xFFDA8F6A, 0x791C), 0X25F676);
+  EXPECT_EQ(bitwise_xor_int64_int64(0x6A5B1, 0x0), 0x6A5B1);
+

Review comment:
   Could you add a test showing that XOR of a value with itself is zero?
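
   A minimal sketch of such an assertion, in the style of the existing checks (the operand values here are arbitrary):

   ```cpp
   // XOR of any value with itself is zero.
   EXPECT_EQ(bitwise_xor_int32_int32(0x16524, 0x16524), 0x0);
   EXPECT_EQ(bitwise_xor_int64_int64(0x5BA9BD7D8, 0x5BA9BD7D8), 0x0);
   ```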





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn commented on a change in pull request #7898: ARROW-9642: [C++] Let MakeBuilder refer DictionaryType's index_type for deciding the starting bit width of the indices

2020-08-26 Thread GitBox


mrkn commented on a change in pull request #7898:
URL: https://github.com/apache/arrow/pull/7898#discussion_r478143508



##
File path: cpp/src/arrow/array/array_dict_test.cc
##
@@ -904,6 +904,67 @@ TEST(TestDecimalDictionaryBuilder, DoubleTableSize) {
   ASSERT_TRUE(expected.Equals(result));
 }
 
+TEST(TestNullDictionaryBuilder, Basic) {
+  // MakeBuilder
+  auto dict_type = dictionary(int8(), null());
+  std::unique_ptr<ArrayBuilder> boxed_builder;
+  ASSERT_OK(MakeBuilder(default_memory_pool(), dict_type, &boxed_builder));
+  auto& builder = checked_cast<DictionaryBuilder<NullType>&>(*boxed_builder);
+
+  ASSERT_OK(builder.AppendNull());
+  ASSERT_OK(builder.AppendNull());
+  ASSERT_OK(builder.AppendNull());
+  ASSERT_EQ(3, builder.length());
+  ASSERT_EQ(3, builder.null_count());
+
+  ASSERT_OK(builder.AppendNulls(4));
+  ASSERT_EQ(7, builder.length());
+  ASSERT_EQ(7, builder.null_count());
+
+  auto int_array = ArrayFromJSON(int8(), "[0, 1, 0, null]");
+  ASSERT_OK(builder.AppendArray(*int_array));

Review comment:
   @pitrou I tried to insert `DCHECK` directly in `AppendArray`, but found it difficult because `AppendArray` is defined in `builder_dict.h`. That is a public header file, so we cannot use `DCHECK` there.
   
   Instead of inserting `DCHECK` directly, a candidate patch is below.
   
   ```diff
   diff --git a/cpp/src/arrow/array/builder_dict.cc b/cpp/src/arrow/array/builder_dict.cc
   index 54fd94856..dc239f268 100644
   --- a/cpp/src/arrow/array/builder_dict.cc
   +++ b/cpp/src/arrow/array/builder_dict.cc
   @@ -189,5 +189,11 @@ Status DictionaryMemoTable::InsertValues(const Array& array) {
    
    int32_t DictionaryMemoTable::size() const { return impl_->size(); }
    
   +#ifndef NDEBUG
   +void CheckArrayType(const Type::type type_id, const Array& array, const std::string& message) {
   +  DCHECK_EQ(type_id, array.type_id()) << message;
   +}
   +#endif
   +
    }  // namespace internal
    }  // namespace arrow
   diff --git a/cpp/src/arrow/array/builder_dict.h b/cpp/src/arrow/array/builder_dict.h
   index f8c77a5b0..a0d02fd1b 100644
   --- a/cpp/src/arrow/array/builder_dict.h
   +++ b/cpp/src/arrow/array/builder_dict.h
   @@ -93,6 +93,10 @@ class ARROW_EXPORT DictionaryMemoTable {
      std::unique_ptr impl_;
    };
    
   +#ifndef NDEBUG
   +void CheckArrayType(const Type::type type_id, const Array& array, const std::string& message);
   +#endif
   +
    /// \brief Array builder for created encoded DictionaryArray from
    /// dense array
    ///
   @@ -248,6 +252,10 @@ class DictionaryBuilderBase : public ArrayBuilder {
          const Array& array) {
        using ArrayType = typename TypeTraits<T>::ArrayType;
    
   +#ifndef NDEBUG
   +    CheckArrayType(T::type_id, array, "Wrong value type of array to be appended");
   +#endif
   +
        const auto& concrete_array = static_cast<const ArrayType&>(array);
        for (int64_t i = 0; i < array.length(); i++) {
          if (array.IsNull(i)) {
   @@ -406,6 +414,9 @@ class DictionaryBuilderBase : public ArrayBuilder {
    
      /// \brief Append a whole dense array to the builder
      Status AppendArray(const Array& array) {
   +#ifndef NDEBUG
   +    CheckArrayType(NullType::type_id, array, "Wrong value type of array to be appended");
   +#endif
        for (int64_t i = 0; i < array.length(); i++) {
          ARROW_RETURN_NOT_OK(AppendNull());
        }
      };
    
    }  // namespace internal
   ```
   
   I don't think this approach is good.  Should I follow this approach anyway?
   
   By the way, I found that `AppendArray` in `DictionaryBuilderBase` for `FixedSizeBinaryType` checks the array type at the beginning of the function and returns `Status::Invalid` when the types mismatch.  I guess that if this kind of check is acceptable for `FixedSizeBinaryType`, the same kind of check can be put in the other `AppendArray` functions.
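
   For illustration, a rough sketch of that `Status::Invalid`-based alternative for the `NullType` overload (the wording and placement here are illustrative, not a proposed final patch):

   ```cpp
   /// \brief Append a whole dense array to the builder
   Status AppendArray(const Array& array) {
     // Runtime type check instead of DCHECK, mirroring the FixedSizeBinaryType overload.
     if (array.type_id() != Type::NA) {
       return Status::Invalid("Wrong value type of array to be appended: ",
                              array.type()->ToString());
     }
     for (int64_t i = 0; i < array.length(); i++) {
       ARROW_RETURN_NOT_OK(AppendNull());
     }
     return Status::OK();
   }
   ```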





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #7883: ARROW-9620: [C++] Added split function to Gandiva

2020-08-26 Thread GitBox


emkornfield commented on pull request #7883:
URL: https://github.com/apache/arrow/pull/7883#issuecomment-681545786


   +1 merging.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #7973: ARROW-8493: [C++][Parquet] Start populating repeated ancestor definition

2020-08-26 Thread GitBox


emkornfield commented on a change in pull request #7973:
URL: https://github.com/apache/arrow/pull/7973#discussion_r478100513



##
File path: cpp/src/parquet/arrow/arrow_schema_test.cc
##
@@ -1140,5 +1144,244 @@ TEST(TestFromParquetSchema, CorruptMetadata) {
   ASSERT_RAISES(IOError, FromParquetSchema(parquet_schema, props, _schema));
  }
 
+::arrow::Result<std::deque<LevelInfo>> RootToTreeLeafLevels(
+    const SchemaManifest& manifest, int column_number) {
+  std::deque<LevelInfo> out;
+  const SchemaField* field;
+  RETURN_NOT_OK(manifest.GetColumnField(column_number, &field));
+  while (field != nullptr) {
+    out.push_front(field->level_info);
+    field = manifest.GetParent(field);
+  }
+  return out;
+}
+
+class TestLevels : public ::testing::Test {
+ public:
+  virtual void SetUp() {}
+
+  ::arrow::Status MaybeSetParquetSchema(const NodePtr& column) {
+    descriptor_.reset(new SchemaDescriptor());
+    manifest_.reset(new SchemaManifest());
+    descriptor_->Init(GroupNode::Make("root", Repetition::REQUIRED, {column}));
+    return SchemaManifest::Make(descriptor_.get(), std::shared_ptr<const KeyValueMetadata>(),
+                                ArrowReaderProperties(), manifest_.get());
+  }
+  void SetParquetSchema(const NodePtr& column) {
+    ASSERT_OK(MaybeSetParquetSchema(column));
+  }
+
+ protected:
+  std::unique_ptr<SchemaDescriptor> descriptor_;
+  std::unique_ptr<SchemaManifest> manifest_;
+};
+
+TEST_F(TestLevels, TestPrimitive) {
+  SetParquetSchema(
+      PrimitiveNode::Make("node_name", Repetition::REQUIRED, ParquetType::BOOLEAN));
+  ASSERT_OK_AND_ASSIGN(std::deque<LevelInfo> levels,
+                       RootToTreeLeafLevels(*manifest_, /*column_number=*/0));
+  EXPECT_THAT(levels, ElementsAre(LevelInfo{/*null_slot_usage=*/1,
+                                            /*def_level=*/0, /*rep_level=*/0,
+                                            /*ancestor_list_def_level*/ 0}));
+  SetParquetSchema(
+      PrimitiveNode::Make("node_name", Repetition::OPTIONAL, ParquetType::BOOLEAN));
+  ASSERT_OK_AND_ASSIGN(levels, RootToTreeLeafLevels(*manifest_, /*column_number=*/0));
+  EXPECT_THAT(levels, ElementsAre(LevelInfo{/*null_slot_usage=*/1, /*def_level=*/1,
+                                            /*rep_level=*/0,
+                                            /*ancestor_list_def_level*/ 0}));
+
+  // Arrow schema: list(bool not null) not null
+  SetParquetSchema(
+      PrimitiveNode::Make("node_name", Repetition::REPEATED, ParquetType::BOOLEAN));
+  ASSERT_OK_AND_ASSIGN(levels, RootToTreeLeafLevels(*manifest_, /*column_number=*/0));
+  EXPECT_THAT(
+      levels,
+      ElementsAre(LevelInfo{/*null_slot_usage=*/1, /*def_level=*/1, /*rep_level=*/1,
+                            /*ancestor_list_def_level*/ 0},  // List Field
+                  LevelInfo{/*null_slot_usage=*/1, /*def_level=*/1, /*rep_level=*/1,
+                            /*ancestor_list_def_level*/ 1}));  //  primitive field
+}
+
+TEST_F(TestLevels, TestSimpleGroups) {
+  // Arrow schema: struct(child: struct(inner: boolean not null))
+  SetParquetSchema(GroupNode::Make(
+      "parent", Repetition::OPTIONAL,
+      {GroupNode::Make(
+          "child", Repetition::OPTIONAL,
+          {PrimitiveNode::Make("inner", Repetition::REQUIRED, ParquetType::BOOLEAN)})}));
+  ASSERT_OK_AND_ASSIGN(std::deque<LevelInfo> levels,
+                       RootToTreeLeafLevels(*manifest_, /*column_number=*/0));
+  EXPECT_THAT(
+      levels,
+      ElementsAre(LevelInfo{/*null_slot_usage=*/1, /*def_level=*/1, /*rep_level=*/0,
+                            /*ancestor_list_def_level*/ 0},
+                  LevelInfo{/*null_slot_usage=*/1, /*def_level=*/2, /*rep_level=*/0,
+                            /*ancestor_list_def_level*/ 0},
+                  LevelInfo{/*null_slot_usage=*/1, /*def_level=*/2, /*rep_level=*/0,
+                            /*ancestor_list_def_level*/ 0}));
+
+  // Arrow schema: struct(child: struct(inner: boolean ))
+  SetParquetSchema(GroupNode::Make(
+      "parent", Repetition::OPTIONAL,
+      {GroupNode::Make(
+          "child", Repetition::OPTIONAL,
+          {PrimitiveNode::Make("inner", Repetition::OPTIONAL, ParquetType::BOOLEAN)})}));
+  ASSERT_OK_AND_ASSIGN(levels, RootToTreeLeafLevels(*manifest_, /*column_number=*/0));
+  EXPECT_THAT(
+      levels,
+      ElementsAre(LevelInfo{/*null_slot_usage=*/1, /*def_level=*/1, /*rep_level=*/0,
+                            /*ancestor_list_def_level*/ 0},
+                  LevelInfo{/*null_slot_usage=*/1, /*def_level=*/2, /*rep_level=*/0,
+                            /*ancestor_list_def_level*/ 0},
+                  LevelInfo{/*null_slot_usage=*/1, /*def_level=*/3, /*rep_level=*/0,
+                            /*ancestor_list_def_level*/ 0}));
+
+  // Arrow schema: struct(child: struct(inner: boolean)) not null
+  SetParquetSchema(GroupNode::Make(
+      "parent", Repetition::REQUIRED,
+      {GroupNode::Make(
+          "child", Repetition::OPTIONAL,
+          {PrimitiveNode::Make("inner", 

[GitHub] [arrow] emkornfield commented on pull request #7887: ARROW-9304: [C++] Add "AppendEmpty" builder APIs for use inside StructBuilder::AppendNull

2020-08-26 Thread GitBox


emkornfield commented on pull request #7887:
URL: https://github.com/apache/arrow/pull/7887#issuecomment-681356508


   @pitrou does this seem ok now?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support

2020-08-26 Thread GitBox


emkornfield commented on pull request #7767:
URL: https://github.com/apache/arrow/pull/7767#issuecomment-681347717


   @rj-atw were you able to figure out CI?  It looks like this also needs to be rebased.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #7287: ARROW-8771: [C++] Add boost/process library to build support

2020-08-26 Thread GitBox


emkornfield commented on pull request #7287:
URL: https://github.com/apache/arrow/pull/7287#issuecomment-681346310


   Is there a JIRA open to upgrade to thrift 0.13.0?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-08-26 Thread GitBox


emkornfield commented on pull request #6979:
URL: https://github.com/apache/arrow/pull/6979#issuecomment-681343742


   @wjones1 did you have a chance to figure out CI issues?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #6770: ARROW-7842: [Rust] [Parquet] implement array_reader for list type columns

2020-08-26 Thread GitBox


emkornfield commented on pull request #6770:
URL: https://github.com/apache/arrow/pull/6770#issuecomment-681341031


   @nevi-me did your other PRs supersede this?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #6676: ARROW-8175: [Python] Setup type checking with mypy

2020-08-26 Thread GitBox


emkornfield commented on pull request #6676:
URL: https://github.com/apache/arrow/pull/6676#issuecomment-681337073


   @jorisvandenbossche @xhochy next steps here?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #6506: ARROW-7878: [C++][Compute] Draft LogicalPlan classes

2020-08-26 Thread GitBox


emkornfield commented on pull request #6506:
URL: https://github.com/apache/arrow/pull/6506#issuecomment-681335327


   @wesm @fsaintjacques @kszucs do you want to keep this PR open or resurrect 
it at some point in the future?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #7276: ARROW-8947: [Java] MapWithOrdinal javadoc doesn't describe actual behaviour

2020-08-26 Thread GitBox


emkornfield commented on pull request #7276:
URL: https://github.com/apache/arrow/pull/7276#issuecomment-681333219


   @rymurr wonder if you maybe had a chance to discuss with @jacques-n  offline 
to see if there are any sharp edges here?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on issue #8025: ParquetWriter creates bad files when passed pyarrow schema from pyarrow table?

2020-08-26 Thread GitBox


emkornfield commented on issue #8025:
URL: https://github.com/apache/arrow/issues/8025#issuecomment-681330368


   @losze1cj were you able to test out the suggestions?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield closed issue #8054: Arrow Flight from Python to JavaScript

2020-08-26 Thread GitBox


emkornfield closed issue #8054:
URL: https://github.com/apache/arrow/issues/8054


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on issue #8054: Arrow Flight from Python to JavaScript

2020-08-26 Thread GitBox


emkornfield commented on issue #8054:
URL: https://github.com/apache/arrow/issues/8054#issuecomment-681324547


   Hi Alex, The JIRA looks like a reasonable start.  Thank you for filing it.  
I'll close this and tag some people who might be able to give you an answer 
there.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #8046: ARROW-9850:[Go] Defer should not be used inside a loop

2020-08-26 Thread GitBox


emkornfield commented on pull request #8046:
URL: https://github.com/apache/arrow/pull/8046#issuecomment-681324037


   @sbinet do you have time to take a look?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #8059: [MINOR] [java][jdbc] Improve error messages for unsupported types.

2020-08-26 Thread GitBox


emkornfield commented on pull request #8059:
URL: https://github.com/apache/arrow/pull/8059#issuecomment-681323737


   @mariusvniekerk thank you for the PR. Generally it looks good, but could you add unit tests to cover the cases that generated the NPE?
   
   Also, would you mind opening a JIRA to track this issue?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield closed pull request #8040: PARQUET-1904: [C++] Export file_offset in RowGroupMetaData

2020-08-26 Thread GitBox


emkornfield closed pull request #8040:
URL: https://github.com/apache/arrow/pull/8040


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] paddyhoran closed pull request #8051: ARROW-9853: [RUST] Implement take kernel for dictionary arrays

2020-08-26 Thread GitBox


paddyhoran closed pull request #8051:
URL: https://github.com/apache/arrow/pull/8051


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData

2020-08-26 Thread GitBox


emkornfield commented on pull request #8040:
URL: https://github.com/apache/arrow/pull/8040#issuecomment-681317956


   +1 thank you @scober 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #8061: ARROW-9723: [C++][Compute] Count NaN in mode kernel

2020-08-26 Thread GitBox


github-actions[bot] commented on pull request #8061:
URL: https://github.com/apache/arrow/pull/8061#issuecomment-681316938


   https://issues.apache.org/jira/browse/ARROW-9723



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] cyb70289 opened a new pull request #8061: ARROW-9723: [C++][Compute] Count NaN in mode kernel

2020-08-26 Thread GitBox


cyb70289 opened a new pull request #8061:
URL: https://github.com/apache/arrow/pull/8061


   For array [NaN, NaN, 1], output mode = NaN, count = 2
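
   A minimal standalone sketch of these semantics (not the kernel implementation): NaN compares unequal to itself, so it is tallied with its own counter, and here it wins as the mode with a count of 2.

   ```cpp
   #include <cmath>
   #include <cstdint>
   #include <iostream>
   #include <map>
   #include <vector>

   int main() {
     std::vector<double> values = {NAN, NAN, 1.0};
     std::map<double, int64_t> counts;
     int64_t nan_count = 0;  // NaN != NaN, so count it separately
     for (double v : values) {
       if (std::isnan(v)) {
         ++nan_count;
       } else {
         ++counts[v];
       }
     }
     double mode = NAN;
     int64_t mode_count = nan_count;
     for (const auto& kv : counts) {
       if (kv.second > mode_count) {
         mode = kv.first;
         mode_count = kv.second;
       }
     }
     std::cout << "mode=" << mode << " count=" << mode_count << std::endl;  // mode=nan count=2
   }
   ```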



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #8060: ARROW-9871: [C++] Add uppercase to ARROW_USER_SIMD_LEVEL

2020-08-26 Thread GitBox


github-actions[bot] commented on pull request #8060:
URL: https://github.com/apache/arrow/pull/8060#issuecomment-681311560


   https://issues.apache.org/jira/browse/ARROW-9871



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jianxind edited a comment on pull request #8049: ARROW-9851: [C++] Disable AVX512 runtime paths with Valgrind

2020-08-26 Thread GitBox


jianxind edited a comment on pull request #8049:
URL: https://github.com/apache/arrow/pull/8049#issuecomment-681200942


   I will create a JIRA for adding upper case to ARROW_USER_SIMD_LEVEL also. 
PR: https://github.com/apache/arrow/pull/8060



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jianxind opened a new pull request #8060: ARROW-9871: [C++] Add uppercase to ARROW_USER_SIMD_LEVEL

2020-08-26 Thread GitBox


jianxind opened a new pull request #8060:
URL: https://github.com/apache/arrow/pull/8060


   Follow the convention of ARROW_SIMD_LEVEL.
   
   Signed-off-by: Frank Du 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #8047: ARROW-9844: [CI] Add Go build job on s390x

2020-08-26 Thread GitBox


kou commented on pull request #8047:
URL: https://github.com/apache/arrow/pull/8047#issuecomment-681285381


   OK.
   
   The ML thread: 
https://lists.apache.org/thread.html/r3d4ab3ccc8db7a7a68baa727bf9b6911930b61609c0faaed14e2b773%40%3Cdev.arrow.apache.org%3E



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #8047: ARROW-9844: [CI] Add Go build job on s390x

2020-08-26 Thread GitBox


emkornfield commented on pull request #8047:
URL: https://github.com/apache/arrow/pull/8047#issuecomment-681265603


   We should probably hold off merging more PRs for big endian architectures until we come to a consensus on the ML about support in general.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #8059: [MINOR] [java][jdbc] Improve error messages for unsupported types.

2020-08-26 Thread GitBox


github-actions[bot] commented on pull request #8059:
URL: https://github.com/apache/arrow/pull/8059#issuecomment-681202023


   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
   ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/arrow/pulls/)
 * [Contribution Guidelines - How to contribute 
patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jianxind commented on pull request #8049: ARROW-9851: [C++] Disable AVX512 runtime paths with Valgrind

2020-08-26 Thread GitBox


jianxind commented on pull request #8049:
URL: https://github.com/apache/arrow/pull/8049#issuecomment-681200942


   I will create a JIRA for adding upper case to ARROW_USER_SIMD_LEVEL also.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

2020-08-26 Thread GitBox


jianxind commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r477809857



##
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
   AssertDatumsEqual(explicit_defaults, no_options_provided);
 }
 
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::max();
+  T max = std::numeric_limits<T>::min();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::min(min, values[i]);
+        max = std::max(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::min(min, values[i]);
+      max = std::max(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::infinity();
+  T max = -std::numeric_limits<T>::infinity();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::fmin(min, values[i]);
+        max = std::fmax(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::fmin(min, values[i]);
+      max = std::fmax(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+  using Traits = TypeTraits<ArrowType>;
+  using ScalarType = typename Traits::ScalarType;
+
+  ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+  const StructScalar& value = out.scalar_as<StructScalar>();
+
+  auto expected = NaiveMinMax<ArrowType>(array);
+  const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+  ASSERT_EQ(expected.first, out_min.value);
+
+  const auto& out_max = checked_cast<const ScalarType&>(*value.value[1]);
+  ASSERT_EQ(expected.second, out_max.value);
+}
+
+template <typename ArrowType>
+class TestRandomNumericMinMaxKernel : public ::testing::Test {};
+
+TYPED_TEST_SUITE(TestRandomNumericMinMaxKernel, NumericArrowTypes);
+TYPED_TEST(TestRandomNumericMinMaxKernel, RandomArrayMinMax) {
+  auto rand = random::RandomArrayGenerator(0x8afc055);
+  // Test size up to 1<<13 (8192).

Review comment:
   The new implementation is based on BitBlockCounter; I am afraid that a size of 1024 with a null_probability of 0.001 (only one valid value) doesn't provide enough coverage.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

2020-08-26 Thread GitBox


jianxind commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r477794871



##
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
   AssertDatumsEqual(explicit_defaults, no_options_provided);
 }
 
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::max();
+  T max = std::numeric_limits<T>::min();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::min(min, values[i]);
+        max = std::max(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::min(min, values[i]);
+      max = std::max(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::infinity();
+  T max = -std::numeric_limits<T>::infinity();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::fmin(min, values[i]);
+        max = std::fmax(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::fmin(min, values[i]);
+      max = std::fmax(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+  using Traits = TypeTraits<ArrowType>;
+  using ScalarType = typename Traits::ScalarType;
+
+  ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+  const StructScalar& value = out.scalar_as<StructScalar>();
+
+  auto expected = NaiveMinMax<ArrowType>(array);
+  const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+  ASSERT_EQ(expected.first, out_min.value);

Review comment:
   Return null for both the expected test and MinMax.
   
   if (array.length() <= array.null_count()) {  // All null values
     return {static_cast<T>(0), static_cast<T>(0)};
   }





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

2020-08-26 Thread GitBox


jianxind commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r477781881



##
File path: cpp/src/arrow/compute/api_aggregate.h
##
@@ -130,23 +130,6 @@ Result MinMax(const Datum& value,
  const MinMaxOptions& options = MinMaxOptions::Defaults(),
  ExecContext* ctx = NULLPTR);
 
-/// \brief Calculate the min / max of a numeric array.
-///
-/// This function returns both the min and max as a collection. The resulting
-/// datum thus consists of two scalar datums: {Datum(min), Datum(max)}
-///
-/// \param[in] array input array
-/// \param[in] options see MinMaxOptions for more information
-/// \param[in] ctx the function execution context, optional
-/// \return resulting datum containing a {min, max} collection
-///
-/// \since 1.0.0
-/// \note API not yet finalized
-ARROW_EXPORT
-Result<Datum> MinMax(const Array& array,

Review comment:
   Yes. There's no implementation for the Array overload of MinMax; it will hit a link error if MinMax is called with Array data.
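
   As a minimal usage sketch under that assumption (illustrative only, not code from this PR), callers wrap the array in a `Datum` and use the remaining overload:

   ```cpp
   #include <memory>

   #include "arrow/api.h"
   #include "arrow/compute/api_aggregate.h"

   arrow::Status MinMaxOfArray() {
     arrow::Int32Builder builder;
     ARROW_RETURN_NOT_OK(builder.AppendValues({5, 1, 9, 3}));
     std::shared_ptr<arrow::Array> array;
     ARROW_RETURN_NOT_OK(builder.Finish(&array));
     // Wrap the Array in a Datum; the Datum overload is the one that is implemented.
     ARROW_ASSIGN_OR_RAISE(arrow::Datum out, arrow::compute::MinMax(arrow::Datum(array)));
     // `out` holds a struct scalar containing the min and max values.
     return arrow::Status::OK();
   }
   ```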





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jduo commented on a change in pull request #7994: ARROW-9804: [FlightRPC] Flight auth redesign

2020-08-26 Thread GitBox


jduo commented on a change in pull request #7994:
URL: https://github.com/apache/arrow/pull/7994#discussion_r477768470



##
File path: 
java/flight/flight-core/src/main/java/org/apache/arrow/flight/grpc/ServerInterceptorAdapter.java
##
@@ -87,10 +87,12 @@ public ServerInterceptorAdapter(List> 
factories) {
 // Use LinkedHashMap to preserve insertion order
 final Map, FlightServerMiddleware> 
middlewareMap = new LinkedHashMap<>();
 final MetadataAdapter headerAdapter = new MetadataAdapter(headers);
+Context currentContext = Context.current();
 for (final KeyFactory factory : factories) {
   final FlightServerMiddleware m;
   try {
 m = factory.factory.onCallStarted(info, headerAdapter);
+currentContext = m.onAuthenticationSuccess(currentContext);

Review comment:
   I added another input to onCallStarted() that the implementor can use to 
write context-specific variable values.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jduo commented on a change in pull request #7994: ARROW-9804: [FlightRPC] Flight auth redesign

2020-08-26 Thread GitBox


jduo commented on a change in pull request #7994:
URL: https://github.com/apache/arrow/pull/7994#discussion_r477766787



##
File path: 
java/flight/flight-core/src/test/java/org/apache/arrow/flight/auth/TestBasicAuth.java
##
@@ -44,29 +44,38 @@
 import org.junit.Ignore;
 import org.junit.Test;
 
+import com.google.common.base.Strings;
 import com.google.common.collect.ImmutableList;
 
 public class TestBasicAuth {
 
   private static final String USERNAME = "flight";
   private static final String PASSWORD = "woohoo";
-  private static final byte[] VALID_TOKEN = 
"my_token".getBytes(StandardCharsets.UTF_8);
+  private static final String VALID_TOKEN = "my_token";
 
+  private FlightClient.Builder clientBuilder;
   private FlightClient client;
   private FlightServer server;
   private BufferAllocator allocator;
 
   @Test
   public void validAuth() {
-client.authenticateBasic(USERNAME, PASSWORD);
-
Assert.assertTrue(ImmutableList.copyOf(client.listFlights(Criteria.ALL)).size() 
== 0);
+try {
+  client = clientBuilder.callCredentials(new 
BasicAuthCallCredentials(USERNAME, PASSWORD)).build();
+  client.handshake();
+  
Assert.assertTrue(ImmutableList.copyOf(client.listFlights(Criteria.ALL)).size() 
== 0);

Review comment:
   Changed to assertTrue(isEmpty()). But let's focus on the design given 
this is a POC.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #8047: ARROW-9844: [CI] Add Go build job on s390x

2020-08-26 Thread GitBox


kou commented on pull request #8047:
URL: https://github.com/apache/arrow/pull/8047#issuecomment-681174260


   This PR has been merged into master: 
https://github.com/apache/arrow/commit/92e01cc5ebe912f0ad1377eae765aa36478db52f
   
   We use our merge tool 
https://github.com/apache/arrow/blob/master/dev/merge_arrow_pr.py instead of 
using GitHub's merge UI to update corresponding JIRA issue too. We close PR but 
the change is merged.
   
   Now, you can use the CI job in https://github.com/apache/arrow/pull/8011 by 
rebasing on master.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #8058: ARROW-9854: [R] Support reading/writing data to/from S3

2020-08-26 Thread GitBox


github-actions[bot] commented on pull request #8058:
URL: https://github.com/apache/arrow/pull/8058#issuecomment-681140852


   https://issues.apache.org/jira/browse/ARROW-9854



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #8058: ARROW-9854: [R] Support reading/writing data to/from S3

2020-08-26 Thread GitBox


nealrichardson opened a new pull request #8058:
URL: https://github.com/apache/arrow/pull/8058


   - [x] read_parquet/feather/etc. from S3 (use FileSystem->OpenInputFile(path))
   - [x] write_$FORMAT via FileSystem->OpenOutputStream(path)
   - [x] write_dataset (done? at least via URI)
   - [ ] for linux, an argument to install_arrow to help, assuming you've 
installed aws-sdk-cpp already (turn on ARROW_S3, AWSSDK_SOURCE=SYSTEM)
   - [ ] testing with minio on CI
   - [x] set up a real test bucket and user for e2e testing (credentials 
available on request)
   - [ ] update docs, vignettes, news
   
   Out of the current scope:
   
   - [ ] download dataset, i.e. copy files/directory recursively (needs 
ARROW-9867, ARROW-9868)
   - [ ] friendlier methods for interacting with/viewing a filesystem (ls, 
mkdir, etc.)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] lidavidm commented on a change in pull request #7994: ARROW-9804: [FlightRPC] Flight auth redesign

2020-08-26 Thread GitBox


lidavidm commented on a change in pull request #7994:
URL: https://github.com/apache/arrow/pull/7994#discussion_r477591584



##
File path: 
java/flight/flight-core/src/main/java/org/apache/arrow/flight/CallOptions.java
##
@@ -56,7 +56,7 @@ public static CallOption timeout(long duration, TimeUnit 
unit) {
   /**
* CallOptions specific to GRPC stubs.
*/
-  interface GrpcCallOption extends CallOption {
+  public interface GrpcCallOption extends CallOption {

Review comment:
   This can't be public in this module - it should have a public 
subinterface in flight-grpc.

##
File path: 
java/flight/flight-core/src/main/java/org/apache/arrow/flight/CallContext.java
##
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.flight;
+
+/**
+ * Tracks variables about the current request.
+ */
+public interface CallContext {

Review comment:
   Let's try not to rename existing classes unnecessarily and instead come 
up with new names. 

##
File path: 
java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightRuntimeException.java
##
@@ -29,7 +29,7 @@
   /**
* Create a new exception from the given status.
*/
-  FlightRuntimeException(CallStatus status) {
+  public FlightRuntimeException(CallStatus status) {

Review comment:
   This shouldn't need to be public; CallStatus already has a toRuntimeException method.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] vivkong commented on pull request #8047: ARROW-9844: [CI] Add Go build job on s390x

2020-08-26 Thread GitBox


vivkong commented on pull request #8047:
URL: https://github.com/apache/arrow/pull/8047#issuecomment-681105501


   @kou Sorry, I'm not familiar with the process; will this PR be merged?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou closed pull request #8047: ARROW-9844: [CI] Add Go build job on s390x

2020-08-26 Thread GitBox


kou closed pull request #8047:
URL: https://github.com/apache/arrow/pull/8047


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #8049: ARROW-9851: [C++] Disable AVX512 runtime paths with Valgrind

2020-08-26 Thread GitBox


kou commented on pull request #8049:
URL: https://github.com/apache/arrow/pull/8049#issuecomment-681096064


   I agree with you about including upper case.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jduo commented on a change in pull request #7994: ARROW-9804: [FlightRPC] Flight auth redesign

2020-08-26 Thread GitBox


jduo commented on a change in pull request #7994:
URL: https://github.com/apache/arrow/pull/7994#discussion_r477543928



##
File path: 
java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightClient.java
##
@@ -84,18 +81,15 @@
* Create a Flight client from an allocator and a gRPC channel.
*/
   FlightClient(BufferAllocator incomingAllocator, ManagedChannel channel,
-  List middleware) {
+  List middleware, CallCredentials 
callCredentials) {
 this.allocator = incomingAllocator.newChildAllocator("flight-client", 0, 
Long.MAX_VALUE);
 this.channel = channel;
 
-final ClientInterceptor[] interceptors;
-interceptors = new ClientInterceptor[]{authInterceptor, new 
ClientInterceptorAdapter(middleware)};
-
 // Create a channel with interceptors pre-applied for DoGet and DoPut
-this.interceptedChannel = ClientInterceptors.intercept(channel, 
interceptors);
+this.interceptedChannel = ClientInterceptors.intercept(channel, new 
ClientInterceptorAdapter(middleware));
 
-blockingStub = FlightServiceGrpc.newBlockingStub(interceptedChannel);
-asyncStub = FlightServiceGrpc.newStub(interceptedChannel);
+blockingStub = 
FlightServiceGrpc.newBlockingStub(interceptedChannel).withCallCredentials(callCredentials);

Review comment:
   I've added a CredentialCallOptions class which creates and adds 
CallCredentials to the gRPC stub.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jduo commented on a change in pull request #7994: ARROW-9804: [FlightRPC] Flight auth redesign

2020-08-26 Thread GitBox


jduo commented on a change in pull request #7994:
URL: https://github.com/apache/arrow/pull/7994#discussion_r477543461



##
File path: 
java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightClient.java
##
@@ -156,23 +150,12 @@
   }
 
   /**
-   * Authenticates with a username and password.

Review comment:
   I renamed authenticate() to handshake() now, because authentication is 
now supplied as a CallOption. I brought back authenticateBasic() and also now 
have it return a CallOption containing the bearer token credentials if the 
server sent it back.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jduo commented on a change in pull request #7994: ARROW-9804: [FlightRPC] Flight auth redesign

2020-08-26 Thread GitBox


jduo commented on a change in pull request #7994:
URL: https://github.com/apache/arrow/pull/7994#discussion_r477542825



##
File path: 
java/flight/flight-core/src/main/java/org/apache/arrow/flight/grpc/ServerInterceptorAdapter.java
##
@@ -87,10 +87,12 @@ public ServerInterceptorAdapter(List> 
factories) {
 // Use LinkedHashMap to preserve insertion order
 final Map, FlightServerMiddleware> 
middlewareMap = new LinkedHashMap<>();
 final MetadataAdapter headerAdapter = new MetadataAdapter(headers);
+Context currentContext = Context.current();
 for (final KeyFactory factory : factories) {
   final FlightServerMiddleware m;
   try {
 m = factory.factory.onCallStarted(info, headerAdapter);
+currentContext = m.onAuthenticationSuccess(currentContext);

Review comment:
   I created a facade for this and supplied it as an input to onCallStarted.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] scober commented on a change in pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData

2020-08-26 Thread GitBox


scober commented on a change in pull request #8040:
URL: https://github.com/apache/arrow/pull/8040#discussion_r477475598



##
File path: cpp/src/parquet/metadata.h
##
@@ -210,6 +210,12 @@ class PARQUET_EXPORT RowGroupMetaData {
 
   /// \brief Total byte size of all the uncompressed column data in this row 
group.
   int64_t total_byte_size() const;
+
+  // The file_offset field that this method exposes is optional. This method

Review comment:
   Sure





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData

2020-08-26 Thread GitBox


emkornfield commented on a change in pull request #8040:
URL: https://github.com/apache/arrow/pull/8040#discussion_r477472781



##
File path: cpp/src/parquet/metadata.h
##
@@ -210,6 +210,12 @@ class PARQUET_EXPORT RowGroupMetaData {
 
   /// \brief Total byte size of all the uncompressed column data in this row 
group.
   int64_t total_byte_size() const;
+
+  // The file_offset field that this method exposes is optional. This method

Review comment:
   Please move this below \brief with a new line and use triple slash '///'





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] scober commented on pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData

2020-08-26 Thread GitBox


scober commented on pull request #8040:
URL: https://github.com/apache/arrow/pull/8040#issuecomment-681018008


   I updated the comments for the method declaration to address your 
documentation concerns. I did not change the default value of `file_offset` 
because I agree that 0 is wrong (though less obviously wrong than -1) and I did 
not want to chase down the unintended consequences of changing that.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #8052: ARROW-9761: [C/C++] Add experimental C ArrowArrayStream ABI

2020-08-26 Thread GitBox


pitrou commented on a change in pull request #8052:
URL: https://github.com/apache/arrow/pull/8052#discussion_r477293832



##
File path: cpp/src/arrow/c/abi.h
##
@@ -60,6 +60,31 @@ struct ArrowArray {
   void* private_data;
 };
 
+// EXPERIMENTAL
+struct ArrowArrayStream {
+  // Callback to get the stream type
+  // (will be the same for all arrays in the stream).
+  // Return value: 0 if successful, an `errno`-compatible error code otherwise.
+  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
+  // Callback to get the next array
+  // (if no error and the array is released, the stream has ended)
+  // Return value: 0 if successful, an `errno`-compatible error code otherwise.

Review comment:
   I don't know how to phrase it: it returns values that are `errno` error codes (in case of error). A number of these values are standard in C++: https://en.cppreference.com/w/cpp/error/errno_macros
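
   For illustration, a minimal producer-side sketch (hypothetical names, not part of this patch): the callback returns 0 on success and a standard `errno` value such as `EINVAL` or `EIO` on failure, and signals end-of-stream by returning an already-released array.

   ```cpp
   #include <cerrno>

   #include "arrow/c/abi.h"

   // Hypothetical producer state; purely illustrative.
   struct MyStreamState {
     int batches_left = 0;
     bool ExportNextBatch(struct ArrowArray* out);  // fills `out`, returns false on failure
   };

   // Sketch of a get_next callback for ArrowArrayStream.
   int MyGetNext(struct ArrowArrayStream* stream, struct ArrowArray* out) {
     auto* state = static_cast<MyStreamState*>(stream->private_data);
     if (state == nullptr) {
       return EINVAL;  // stream already released or never initialized
     }
     if (state->batches_left == 0) {
       out->release = nullptr;  // released array == end of stream
       return 0;
     }
     if (!state->ExportNextBatch(out)) {
       return EIO;  // report an I/O or decoding failure as an errno value
     }
     --state->batches_left;
     return 0;
   }
   ```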
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData

2020-08-26 Thread GitBox


emkornfield commented on pull request #8040:
URL: https://github.com/apache/arrow/pull/8040#issuecomment-681004709


   > When you say documentation I assume you mean a comment near the method 
definition?
   
   Yes the comment for the method definition
   
   > And it looks the default value of this field is 0?
   
   Yes, I'm not sure if it breaks anything to make this -1 or something obviously wrong (0 should be wrong because I think all Parquet files need to start with 'PAR1').
   
   > And when you say the wording from the parquet specification do you mean 
the comment in the thrift definition?
   
   Yes:
   ```
   Byte offset from beginning of file to first page (data or dictionary) in this row group
   ```
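
   For reference, a small usage sketch of the accessor being exported here (assuming the usual `parquet::ParquetFileReader` / `FileMetaData` APIs; illustrative only):

   ```cpp
   #include <iostream>
   #include <memory>
   #include <string>

   #include "parquet/file_reader.h"
   #include "parquet/metadata.h"

   // Print each row group's file_offset, i.e. the byte offset from the beginning
   // of the file to the row group's first page per the Thrift definition above.
   void PrintRowGroupOffsets(const std::string& path) {
     std::unique_ptr<parquet::ParquetFileReader> reader =
         parquet::ParquetFileReader::OpenFile(path);
     std::shared_ptr<parquet::FileMetaData> metadata = reader->metadata();
     for (int i = 0; i < metadata->num_row_groups(); ++i) {
       std::cout << "row group " << i
                 << ": file_offset=" << metadata->RowGroup(i)->file_offset() << std::endl;
     }
   }
   ```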
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

2020-08-26 Thread GitBox


pitrou commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r477425988



##
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
   AssertDatumsEqual(explicit_defaults, no_options_provided);
 }
 
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::max();
+  T max = std::numeric_limits<T>::min();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::min(min, values[i]);
+        max = std::max(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::min(min, values[i]);
+      max = std::max(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::infinity();
+  T max = -std::numeric_limits<T>::infinity();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::fmin(min, values[i]);
+        max = std::fmax(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::fmin(min, values[i]);
+      max = std::fmax(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+  using Traits = TypeTraits<ArrowType>;
+  using ScalarType = typename Traits::ScalarType;
+
+  ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+  const StructScalar& value = out.scalar_as<StructScalar>();
+
+  auto expected = NaiveMinMax<ArrowType>(array);
+  const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+  ASSERT_EQ(expected.first, out_min.value);
+
+  const auto& out_max = checked_cast<const ScalarType&>(*value.value[1]);
+  ASSERT_EQ(expected.second, out_max.value);
+}
+
+template <typename ArrowType>
+class TestRandomNumericMinMaxKernel : public ::testing::Test {};
+
+TYPED_TEST_SUITE(TestRandomNumericMinMaxKernel, NumericArrowTypes);
+TYPED_TEST(TestRandomNumericMinMaxKernel, RandomArrayMinMax) {
+  auto rand = random::RandomArrayGenerator(0x8afc055);
+  // Test size up to 1<<13 (8192).

Review comment:
   Sounds a bit large. Why not stop at e.g. 1024?

##
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
   AssertDatumsEqual(explicit_defaults, no_options_provided);
 }
 
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::max();
+  T max = std::numeric_limits<T>::min();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::min(min, values[i]);
+        max = std::max(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::min(min, values[i]);
+      max = std::max(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = 

[GitHub] [arrow] pitrou commented on a change in pull request #7973: ARROW-8493: [C++][Parquet] Start populating repeated ancestor defintion

2020-08-26 Thread GitBox


pitrou commented on a change in pull request #7973:
URL: https://github.com/apache/arrow/pull/7973#discussion_r477352114



##
File path: cpp/src/parquet/level_conversion.h
##
@@ -20,10 +20,117 @@
 #include 
 
 #include "parquet/platform.h"
+#include "parquet/schema.h"
 
 namespace parquet {
 namespace internal {
 
+struct PARQUET_EXPORT LevelInfo {
+  LevelInfo()
+      : null_slot_usage(1), def_level(0), rep_level(0), repeated_ancestor_def_level(0) {}
+  LevelInfo(int32_t null_slots, int32_t definition_level, int32_t repitition_level,
+            int32_t repeated_ancestor_definition_level)
+      : null_slot_usage(null_slots),
+        def_level(definition_level),
+        rep_level(repitition_level),
+        repeated_ancestor_def_level(repeated_ancestor_definition_level) {}
+
+  bool operator==(const LevelInfo& b) const {
+    return null_slot_usage == b.null_slot_usage && def_level == b.def_level &&
+           rep_level == b.rep_level &&
+           repeated_ancestor_def_level == b.repeated_ancestor_def_level;
+  }
+
+  // How many slots a null element consumes.
+  // This is only ever >1 for descendents of
+  // FixedSizeList.
+  int32_t null_slot_usage = 1;
+
+  // The definition level at which the value for the field
+  // is considered not null (definition levels greater than
+  // or equal to indicate this value indicate a not-null
+  // value for the field). For list fields definition levels
+  // greater then or equal to this field indicate a present
+  // , possibly null, element.
+  int16_t def_level = 0;
+
+  // The repetition level corresponding to this element
+  // or the closest repeated ancestor.  Any repetition
+  // level less than this indicates either a new list OR
+  // an empty list (which is determined in conjunction
+  // definition_level).
+  int16_t rep_level = 0;
+
+  // The definition level indicating the level at which the closest
+  // repeated ancestor was not empty.  This is used to discriminate
+  // between a value less than |definition_level|
+  // being null or excluded entirely.
+  // For instance if we have an arrow schema like:
+  // list(struct(f0: int)).  Then then there are the following
+  // definition levels:
+  // 0 = null list
+  // 1 = present but empty list.
+  // 2 = a null value in the list
+  // 3 = a non null struct but null integer.
+  // 4 = a present integer.

Review comment:
   +1, thanks for this example!
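   As a concrete illustration of the levels listed in that comment: for a column of type list(struct(f0: int)), the five values null, [], [null], [{f0: null}] and [{f0: 7}] are encoded with definition levels 0, 1, 2, 3 and 4 respectively (each list here holds at most one element, so all repetition levels are 0).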

##
File path: cpp/src/parquet/level_conversion.h
##
@@ -20,10 +20,117 @@
 #include 
 
 #include "parquet/platform.h"
+#include "parquet/schema.h"
 
 namespace parquet {
 namespace internal {
 
+struct PARQUET_EXPORT LevelInfo {
+  LevelInfo()
+      : null_slot_usage(1), def_level(0), rep_level(0), repeated_ancestor_def_level(0) {}
+  LevelInfo(int32_t null_slots, int32_t definition_level, int32_t repitition_level,
+            int32_t repeated_ancestor_definition_level)
+      : null_slot_usage(null_slots),
+        def_level(definition_level),
+        rep_level(repitition_level),
+        repeated_ancestor_def_level(repeated_ancestor_definition_level) {}
+
+  bool operator==(const LevelInfo& b) const {
+    return null_slot_usage == b.null_slot_usage && def_level == b.def_level &&
+           rep_level == b.rep_level &&
+           repeated_ancestor_def_level == b.repeated_ancestor_def_level;
+  }
+
+  // How many slots a null element consumes.
+  // This is only ever >1 for descendents of
+  // FixedSizeList.
+  int32_t null_slot_usage = 1;
+
+  // The definition level at which the value for the field
+  // is considered not null (definition levels greater than
+  // or equal to indicate this value indicate a not-null
+  // value for the field). For list fields definition levels
+  // greater then or equal to this field indicate a present
+  // , possibly null, element.
+  int16_t def_level = 0;
+
+  // The repetition level corresponding to this element
+  // or the closest repeated ancestor.  Any repetition
+  // level less than this indicates either a new list OR
+  // an empty list (which is determined in conjunction
+  // definition_level).
+  int16_t rep_level = 0;
+
+  // The definition level indicating the level at which the closest
+  // repeated ancestor was not empty.  This is used to discriminate

Review comment:
   Do you mean logical ancestor (in Arrow terms)? Or physical ancestor (in 
Parquet nesting)?


[GitHub] [arrow] jorisvandenbossche commented on pull request #8055: ARROW-7226: [Python][Doc] Add note re: JSON format support

2020-08-26 Thread GitBox


jorisvandenbossche commented on pull request #8055:
URL: https://github.com/apache/arrow/pull/8055#issuecomment-680919060


   > Why would you expect something else?
   
   Because this is not a default "JSON" file, and many people will expect a 
"json reader" to support that. 
   I know it is already mentioned in the text, but given this expectation, I 
think it is fine to have an additional explicit note about it.
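   For what it's worth, a minimal sketch of the layout the reader does accept (one JSON object per line) and how it might be consumed with the C++ `arrow::json` reader; this assumes the current API names from memory, and exact signatures may differ between versions:

    #include <arrow/api.h>
    #include <arrow/io/api.h>
    #include <arrow/json/api.h>
    #include <iostream>

    int main() {
      // One JSON object per line ("newline-delimited JSON"), not a single
      // top-level JSON array or object.
      auto buffer = arrow::Buffer::FromString(
          "{\"a\": 1, \"b\": \"x\"}\n"
          "{\"a\": 2, \"b\": \"y\"}\n");
      auto input = std::make_shared<arrow::io::BufferReader>(buffer);

      auto maybe_reader = arrow::json::TableReader::Make(
          arrow::default_memory_pool(), input,
          arrow::json::ReadOptions::Defaults(),
          arrow::json::ParseOptions::Defaults());
      if (!maybe_reader.ok()) return 1;
      auto maybe_table = (*maybe_reader)->Read();
      if (!maybe_table.ok()) return 1;
      std::cout << (*maybe_table)->ToString() << std::endl;
      return 0;
    }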
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wqc200 closed pull request #8033: ARROW-9837: [Rust][DataFusion] Add provider for variable

2020-08-26 Thread GitBox


wqc200 closed pull request #8033:
URL: https://github.com/apache/arrow/pull/8033


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7992: ARROW-9660: [C++] Revamp dictionary association in IPC

2020-08-26 Thread GitBox


pitrou commented on a change in pull request #7992:
URL: https://github.com/apache/arrow/pull/7992#discussion_r477341587



##
File path: cpp/src/arrow/ipc/metadata_internal.h
##
@@ -48,6 +48,7 @@ namespace flatbuf = org::apache::arrow::flatbuf;
 
 namespace ipc {
 
+class DictionaryFieldMapper;
 class DictionaryMemo;

Review comment:
   `DictionaryMemo` is still passed in some IPC reading APIs, I think. I 
don't know if those are meant to be public.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7992: ARROW-9660: [C++] Revamp dictionary association in IPC

2020-08-26 Thread GitBox


pitrou commented on a change in pull request #7992:
URL: https://github.com/apache/arrow/pull/7992#discussion_r477340732



##
File path: cpp/src/arrow/ipc/dictionary.h
##
@@ -21,96 +21,145 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 
-#include "arrow/memory_pool.h"
+#include "arrow/result.h"
 #include "arrow/status.h"
+#include "arrow/type_fwd.h"
 #include "arrow/util/macros.h"
 #include "arrow/util/visibility.h"
 
 namespace arrow {
+namespace ipc {
+
+class FieldPosition {
+ public:
+  FieldPosition() : parent_(NULLPTR), index_(-1), depth_(0) {}
+
+  FieldPosition child(int index) const { return {this, index}; }
+
+  std::vector<int> path() const {
+    std::vector<int> path(depth_);
+const FieldPosition* cur = this;
+for (int i = depth_ - 1; i >= 0; --i) {
+  path[i] = cur->index_;
+  cur = cur->parent_;
+}
+return path;
+  }
+
+ protected:
+  FieldPosition(const FieldPosition* parent, int index)
+  : parent_(parent), index_(index), depth_(parent->depth_ + 1) {}
+
+  const FieldPosition* parent_;
+  int index_;
+  int depth_;
+};
 
-class Array;
-class DataType;
-class Field;
-class RecordBatch;
+/// \brief Map fields in a schema to dictionary ids
+///
+/// The mapping is structural, i.e. the field path (as a vector of indices)
+/// is associated to the dictionary id.

Review comment:
   Will add.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7992: ARROW-9660: [C++] Revamp dictionary association in IPC

2020-08-26 Thread GitBox


pitrou commented on a change in pull request #7992:
URL: https://github.com/apache/arrow/pull/7992#discussion_r477341153



##
File path: cpp/src/arrow/ipc/dictionary.h
##
@@ -21,96 +21,145 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 
-#include "arrow/memory_pool.h"
+#include "arrow/result.h"
 #include "arrow/status.h"
+#include "arrow/type_fwd.h"
 #include "arrow/util/macros.h"
 #include "arrow/util/visibility.h"
 
 namespace arrow {
+namespace ipc {
+
+class FieldPosition {
+ public:
+  FieldPosition() : parent_(NULLPTR), index_(-1), depth_(0) {}
+
+  FieldPosition child(int index) const { return {this, index}; }
+
+  std::vector<int> path() const {
+    std::vector<int> path(depth_);
+const FieldPosition* cur = this;
+for (int i = depth_ - 1; i >= 0; --i) {
+  path[i] = cur->index_;
+  cur = cur->parent_;
+}
+return path;
+  }
+
+ protected:
+  FieldPosition(const FieldPosition* parent, int index)
+  : parent_(parent), index_(index), depth_(parent->depth_ + 1) {}
+
+  const FieldPosition* parent_;
+  int index_;
+  int depth_;
+};
 
-class Array;
-class DataType;
-class Field;
-class RecordBatch;
+/// \brief Map fields in a schema to dictionary ids
+///
+/// The mapping is structural, i.e. the field path (as a vector of indices)
+/// is associated to the dictionary id.
+class ARROW_EXPORT DictionaryFieldMapper {
+ public:
+  DictionaryFieldMapper();
+  explicit DictionaryFieldMapper(const Schema& schema);
+  ~DictionaryFieldMapper();
 
-namespace ipc {
+  Status AddSchemaFields(const Schema& schema);
+  Status AddField(int64_t id, std::vector<int> field_path);
 
-/// \brief Memoization data structure for assigning id numbers to
-/// dictionaries and tracking their current state through possible
-/// deltas in an IPC stream
+  Result<int64_t> GetFieldId(std::vector<int> field_path) const;
+
+  int num_fields() const;
+
+ private:
+  struct Impl;
+  std::unique_ptr<Impl> impl_;
+};
+
+using DictionaryVector = std::vector<std::pair<int64_t, std::shared_ptr<Array>>>;
+
+/// \brief Memoization data structure for reading dictionaries from IPC streams
+///
+/// This structure tracks the following associations:
+/// - field position (structural) -> dictionary id
+/// - dictionary id -> value type
+/// - dictionary id -> dictionary (value) data
+///
+/// Together, they allow resolving dictionary data when reading an IPC stream,
+/// using metadata recorded in the schema message and data recorded in the
+/// dictionary batch messages (see ResolveDictionaries).
+///
+/// This structure isn't useful for writing an IPC stream, where only
+/// DictionaryFieldMapper is necessary.
 class ARROW_EXPORT DictionaryMemo {
  public:
-  using DictionaryVector = std::vector<std::pair<int64_t, std::shared_ptr<Array>>>;
-
   DictionaryMemo();
-  DictionaryMemo(DictionaryMemo&&) = default;
-  DictionaryMemo& operator=(DictionaryMemo&&) = default;
+  ~DictionaryMemo();
+
+  DictionaryFieldMapper& fields();

Review comment:
   Well, either this, or we duplicate the `DictionaryFieldMapper` APIs 
here. I have no particular preference.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7992: ARROW-9660: [C++] Revamp dictionary association in IPC

2020-08-26 Thread GitBox


pitrou commented on a change in pull request #7992:
URL: https://github.com/apache/arrow/pull/7992#discussion_r477340451



##
File path: cpp/src/arrow/flight/internal.cc
##
@@ -465,11 +465,9 @@ Status FromProto(const pb::SchemaResult& pb_result, 
std::string* result) {
 }
 
 Status SchemaToString(const Schema& schema, std::string* out) {
-  // TODO(wesm): Do we care about better memory efficiency here?
   ipc::DictionaryMemo unused_dict_memo;

Review comment:
   Hmm... since there seemed to be nothing concerning about memory 
efficiency, I thought this TODO was obsolete. What did you have in mind when 
writing it?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7992: ARROW-9660: [C++] Revamp dictionary association in IPC

2020-08-26 Thread GitBox


wesm commented on a change in pull request #7992:
URL: https://github.com/apache/arrow/pull/7992#discussion_r474285549



##
File path: cpp/src/arrow/flight/internal.cc
##
@@ -465,11 +465,9 @@ Status FromProto(const pb::SchemaResult& pb_result, 
std::string* result) {
 }
 
 Status SchemaToString(const Schema& schema, std::string* out) {
-  // TODO(wesm): Do we care about better memory efficiency here?
   ipc::DictionaryMemo unused_dict_memo;

Review comment:
   This can be removed now?

##
File path: cpp/src/arrow/ipc/dictionary.h
##
@@ -21,96 +21,145 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 
-#include "arrow/memory_pool.h"
+#include "arrow/result.h"
 #include "arrow/status.h"
+#include "arrow/type_fwd.h"
 #include "arrow/util/macros.h"
 #include "arrow/util/visibility.h"
 
 namespace arrow {
+namespace ipc {
+
+class FieldPosition {
+ public:
+  FieldPosition() : parent_(NULLPTR), index_(-1), depth_(0) {}
+
+  FieldPosition child(int index) const { return {this, index}; }
+
+  std::vector<int> path() const {
+    std::vector<int> path(depth_);
+const FieldPosition* cur = this;
+for (int i = depth_ - 1; i >= 0; --i) {
+  path[i] = cur->index_;
+  cur = cur->parent_;
+}
+return path;
+  }
+
+ protected:
+  FieldPosition(const FieldPosition* parent, int index)
+  : parent_(parent), index_(index), depth_(parent->depth_ + 1) {}
+
+  const FieldPosition* parent_;
+  int index_;
+  int depth_;
+};
 
-class Array;
-class DataType;
-class Field;
-class RecordBatch;
+/// \brief Map fields in a schema to dictionary ids
+///
+/// The mapping is structural, i.e. the field path (as a vector of indices)
+/// is associated to the dictionary id.

Review comment:
   It would be helpful to state that a dictionary id can be associated with 
multiple field paths
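   For instance, a minimal sketch (assumed usage, not text from this patch) of one dictionary id registered under two field paths, as when two columns share the same dictionary:

    #include <arrow/ipc/dictionary.h>
    #include <arrow/status.h>

    // Hypothetical illustration: dictionary id 0 reachable through two
    // structural paths, e.g. column 0 and the first child of column 1,
    // when both fields share one dictionary.
    arrow::Status RegisterSharedDictionary(arrow::ipc::DictionaryFieldMapper* mapper) {
      ARROW_RETURN_NOT_OK(mapper->AddField(/*id=*/0, /*field_path=*/{0}));
      ARROW_RETURN_NOT_OK(mapper->AddField(/*id=*/0, /*field_path=*/{1, 0}));
      return arrow::Status::OK();
    }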

##
File path: cpp/src/arrow/ipc/metadata_internal.h
##
@@ -48,6 +48,7 @@ namespace flatbuf = org::apache::arrow::flatbuf;
 
 namespace ipc {
 
+class DictionaryFieldMapper;
 class DictionaryMemo;

Review comment:
   Perhaps we should move these both to `ipc::internal`?

##
File path: cpp/src/arrow/ipc/dictionary.h
##
@@ -21,96 +21,145 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 
-#include "arrow/memory_pool.h"
+#include "arrow/result.h"
 #include "arrow/status.h"
+#include "arrow/type_fwd.h"
 #include "arrow/util/macros.h"
 #include "arrow/util/visibility.h"
 
 namespace arrow {
+namespace ipc {
+
+class FieldPosition {
+ public:
+  FieldPosition() : parent_(NULLPTR), index_(-1), depth_(0) {}
+
+  FieldPosition child(int index) const { return {this, index}; }
+
+  std::vector<int> path() const {
+    std::vector<int> path(depth_);
+const FieldPosition* cur = this;
+for (int i = depth_ - 1; i >= 0; --i) {
+  path[i] = cur->index_;
+  cur = cur->parent_;
+}
+return path;
+  }
+
+ protected:
+  FieldPosition(const FieldPosition* parent, int index)
+  : parent_(parent), index_(index), depth_(parent->depth_ + 1) {}
+
+  const FieldPosition* parent_;
+  int index_;
+  int depth_;
+};
 
-class Array;
-class DataType;
-class Field;
-class RecordBatch;
+/// \brief Map fields in a schema to dictionary ids
+///
+/// The mapping is structural, i.e. the field path (as a vector of indices)
+/// is associated to the dictionary id.
+class ARROW_EXPORT DictionaryFieldMapper {
+ public:
+  DictionaryFieldMapper();
+  explicit DictionaryFieldMapper(const Schema& schema);
+  ~DictionaryFieldMapper();
 
-namespace ipc {
+  Status AddSchemaFields(const Schema& schema);
+  Status AddField(int64_t id, std::vector<int> field_path);
 
-/// \brief Memoization data structure for assigning id numbers to
-/// dictionaries and tracking their current state through possible
-/// deltas in an IPC stream
+  Result<int64_t> GetFieldId(std::vector<int> field_path) const;
+
+  int num_fields() const;
+
+ private:
+  struct Impl;
+  std::unique_ptr<Impl> impl_;
+};
+
+using DictionaryVector = std::vector<std::pair<int64_t, std::shared_ptr<Array>>>;
+
+/// \brief Memoization data structure for reading dictionaries from IPC streams
+///
+/// This structure tracks the following associations:
+/// - field position (structural) -> dictionary id
+/// - dictionary id -> value type
+/// - dictionary id -> dictionary (value) data
+///
+/// Together, they allow resolving dictionary data when reading an IPC stream,
+/// using metadata recorded in the schema message and data recorded in the
+/// dictionary batch messages (see ResolveDictionaries).
+///
+/// This structure isn't useful for writing an IPC stream, where only
+/// DictionaryFieldMapper is necessary.
 class ARROW_EXPORT DictionaryMemo {
  public:
-  using DictionaryVector = std::vector<std::pair<int64_t, std::shared_ptr<Array>>>;
-
   DictionaryMemo();
-  DictionaryMemo(DictionaryMemo&&) = default;
-  DictionaryMemo& operator=(DictionaryMemo&&) = default;
+  ~DictionaryMemo();
+
+  DictionaryFieldMapper& fields();

Review comment:
   Is this necessary? It might be clearer to have `DictionaryFieldMapper* 

[GitHub] [arrow] pitrou commented on pull request #7992: ARROW-9660: [C++] Revamp dictionary association in IPC

2020-08-26 Thread GitBox


pitrou commented on pull request #7992:
URL: https://github.com/apache/arrow/pull/7992#issuecomment-680893599


   Rebased, fixed conflict.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove closed pull request #8043: IGNORE: [Rust] Add Cargo.lock

2020-08-26 Thread GitBox


andygrove closed pull request #8043:
URL: https://github.com/apache/arrow/pull/8043


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #8055: ARROW-7226: [Python][Doc] Add note re: JSON format support

2020-08-26 Thread GitBox


pitrou commented on pull request #8055:
URL: https://github.com/apache/arrow/pull/8055#issuecomment-680891968


   Is this useful? The paragraph above contains this sentence:
   > a JSON file consists of multiple JSON objects, one per line, representing 
individual data rows
   
   Why would you expect something else?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #8050: ARROW-9852: [C++] Validate dictionaries fully on IPC read

2020-08-26 Thread GitBox


pitrou commented on pull request #8050:
URL: https://github.com/apache/arrow/pull/8050#issuecomment-680887980


   I think I'll revisit this after #7992.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #8049: ARROW-9851: [C++] Disable AVX512 runtime paths with Valgrind

2020-08-26 Thread GitBox


pitrou commented on a change in pull request #8049:
URL: https://github.com/apache/arrow/pull/8049#discussion_r477302093



##
File path: cpp/src/arrow/util/decimal_test.cc
##
@@ -384,10 +389,10 @@ static const ToStringTestData kToStringTestData[] = {
 {-1234567890123456789LL, 25, "-1.234567890123456789E-7"},
 };
 
-class Decimal128ToStringTest : public ::testing::TestWithParam<ToStringTestData> {};
+class Decimal128ToStringTest : public ::testing::TestWithParam<ToStringTestParam> {};
 
 TEST_P(Decimal128ToStringTest, ToString) {
-  const ToStringTestData& data = GetParam();
+  const ToStringTestParam& data = GetParam();

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #8049: ARROW-9851: [C++] Disable AVX512 runtime paths with Valgrind

2020-08-26 Thread GitBox


pitrou commented on pull request #8049:
URL: https://github.com/apache/arrow/pull/8049#issuecomment-680880304


   > We need to use lower case for ARROW_USER_SIMD_LEVEL
   
   Hmm, that's a pity. Can't we use the same convention everywhere?
   (I would probably favour uppercase, since those names are generally 
abbreviations)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #8052: ARROW-9761: [C/C++] Add experimental C ArrowArrayStream ABI

2020-08-26 Thread GitBox


pitrou commented on pull request #8052:
URL: https://github.com/apache/arrow/pull/8052#issuecomment-680876865


   However, as said above, perhaps returning `-1` (rather than `0` for success 
or a positive `errno`-like value for errors) could mean an end of stream. 
Thoughts?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #8052: ARROW-9761: [C/C++] Add experimental C ArrowArrayStream ABI

2020-08-26 Thread GitBox


pitrou commented on a change in pull request #8052:
URL: https://github.com/apache/arrow/pull/8052#discussion_r477295669



##
File path: cpp/src/arrow/c/abi.h
##
@@ -60,6 +60,31 @@ struct ArrowArray {
   void* private_data;
 };
 
+// EXPERIMENTAL
+struct ArrowArrayStream {
+  // Callback to get the stream type
+  // (will be the same for all arrays in the stream).
+  // Return value: 0 if successful, an `errno`-compatible error code otherwise.
+  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
+  // Callback to get the next array
+  // (if no error and the array is released, the stream has ended)

Review comment:
   A nullptr cannot be returned. The callback returns an `int`. However, we 
could say that returning `-1` means end of stream.
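   For what it's worth, a rough consumer-side sketch under the convention quoted above (a successful get_next() that hands back an already-released array marks the end of the stream); this only illustrates the proposal and assumes the struct definitions from "arrow/c/abi.h":

    #include <cstdio>

    // Sketch of a consumer loop for the proposed ArrowArrayStream ABI.
    // Assumes: get_next() returning 0 with a released output array
    // (release == NULL) signals end of stream.
    int ConsumeStream(struct ArrowArrayStream* stream) {
      struct ArrowSchema schema;
      int ret = stream->get_schema(stream, &schema);
      if (ret != 0) return ret;
      schema.release(&schema);  // release the schema once done with it

      while (true) {
        struct ArrowArray array;
        ret = stream->get_next(stream, &array);
        if (ret != 0) {
          const char* msg = stream->get_last_error(stream);
          std::fprintf(stderr, "stream error: %s\n", msg ? msg : "(no details)");
          break;
        }
        if (array.release == NULL) break;  // end of stream
        // ... consume the batch here ...
        array.release(&array);  // arrays must be released individually
      }
      stream->release(stream);  // release the stream's own resources
      return ret;
    }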





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #8052: ARROW-9761: [C/C++] Add experimental C ArrowArrayStream ABI

2020-08-26 Thread GitBox


pitrou commented on a change in pull request #8052:
URL: https://github.com/apache/arrow/pull/8052#discussion_r477293832



##
File path: cpp/src/arrow/c/abi.h
##
@@ -60,6 +60,31 @@ struct ArrowArray {
   void* private_data;
 };
 
+// EXPERIMENTAL
+struct ArrowArrayStream {
+  // Callback to get the stream type
+  // (will be the same for all arrays in the stream).
+  // Return value: 0 if successful, an `errno`-compatible error code otherwise.
+  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
+  // Callback to get the next array
+  // (if no error and the array is released, the stream has ended)
+  // Return value: 0 if successful, an `errno`-compatible error code otherwise.

Review comment:
   I don't know how to phrase it: it returns values that are `errno` error codes 
(in case of error). A number of values are standard in C++: 
https://en.cppreference.com/w/cpp/error/errno_macros
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #8052: ARROW-9761: [C/C++] Add experimental C ArrowArrayStream ABI

2020-08-26 Thread GitBox


pitrou commented on a change in pull request #8052:
URL: https://github.com/apache/arrow/pull/8052#discussion_r477294326



##
File path: cpp/src/arrow/c/abi.h
##
@@ -60,6 +60,31 @@ struct ArrowArray {
   void* private_data;
 };
 
+// EXPERIMENTAL
+struct ArrowArrayStream {
+  // Callback to get the stream type
+  // (will be the same for all arrays in the stream).
+  // Return value: 0 if successful, an `errno`-compatible error code otherwise.
+  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
+  // Callback to get the next array
+  // (if no error and the array is released, the stream has ended)
+  // Return value: 0 if successful, an `errno`-compatible error code otherwise.
+  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
+
+  // Callback to get optional detailed error information.
+  // This must only be called if the last stream operation failed
+  // with a non-0 return code.  The returned pointer is only valid until
+  // the next operation on this stream (including release).
+  // If unavailable, NULL is returned.
+  const char* (*get_last_error)(struct ArrowArrayStream*);
+
+  // Release callback: release the stream's own resources.
+  // Note that arrays returned by `get_next` must be individually released.

Review comment:
   Hmm... I'd say no.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] vivkong commented on a change in pull request #8047: ARROW-9844: [CI] Add Go build job on s390x

2020-08-26 Thread GitBox


vivkong commented on a change in pull request #8047:
URL: https://github.com/apache/arrow/pull/8047#discussion_r477266332



##
File path: .travis.yml
##
@@ -56,6 +56,14 @@ jobs:
 UBUNTU: "20.04"
 cares_SOURCE: "BUNDLED"
 gRPC_SOURCE: "BUNDLED"
+- name: "Go on s390x"
+  os: linux
+  arch: s390x
+  env:
+ARCH: s390x
+ARROW_CI_MODULES: "GO"
+DOCKER_IMAGE_ID: debian-go
+GO: "1.12"

Review comment:
   Thanks for your feedback. I've removed the line.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #8052: ARROW-9761: [C/C++] Add experimental C ArrowArrayStream ABI

2020-08-26 Thread GitBox


pitrou commented on pull request #8052:
URL: https://github.com/apache/arrow/pull/8052#issuecomment-680762232


   It's possible to have some chunks with 0 length in a stream, yes. I don't 
see any reason to forbid it in this API (and such corner cases are often 
annoying to deal with).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #8036: ARROW-9811: [C++] Unchecked floating point division by 0 should succeed

2020-08-26 Thread GitBox


pitrou commented on a change in pull request #8036:
URL: https://github.com/apache/arrow/pull/8036#discussion_r477141994



##
File path: cpp/src/arrow/compare.cc
##
@@ -875,9 +879,26 @@ class ScalarEqualsVisitor {
 return Status::OK();
   }
 
+  template <typename T>
+  typename std::enable_if<std::is_base_of<FloatScalar, T>::value ||
+                              std::is_base_of<DoubleScalar, T>::value,
+                          Status>::type
+  Visit(const T& left_) {
+    const auto& right = checked_cast<const T&>(right_);
+    if (options_.nans_equal()) {
+      result_ = right.value == left_.value ||
+                (std::isnan(right.value) && std::isnan(left_.value));

Review comment:
   `-NAN` isn't really a thing. As per IEEE, every NaN is different and 
unequal (even to itself). Here we treat them as all equal, because that's 
vastly more convenient for testing.
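   A self-contained illustration of the semantics being relied on here:

    #include <cassert>
    #include <cmath>

    int main() {
      double a = std::nan("");
      double b = std::nan("");
      assert(a != b);  // IEEE 754: a NaN compares unequal to everything, itself included
      // The comparison used in the patch: treat any two NaNs as equal when
      // nans_equal() is set, which is convenient for testing.
      bool equal_with_nans = (a == b) || (std::isnan(a) && std::isnan(b));
      assert(equal_with_nans);
      return 0;
    }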





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] thamht4190 commented on a change in pull request #8023: ARROW-9318: [C++] Parquet encryption key management

2020-08-26 Thread GitBox


thamht4190 commented on a change in pull request #8023:
URL: https://github.com/apache/arrow/pull/8023#discussion_r477067068



##
File path: cpp/src/parquet/encryption_internal.h
##
@@ -45,6 +45,9 @@ constexpr int8_t kOffsetIndex = 7;
 /// Performs AES encryption operations with GCM or CTR ciphers.
 class AesEncryptor {
  public:
+  /// Can serve one key length only. Possible values: 16, 24, 32 bytes.
+  explicit AesEncryptor(ParquetCipher::type alg_id, int key_len, bool 
metadata);

Review comment:
   @emkornfield I want to create a local variable of AesEncryptor, without 
using smart pointers, so it is stored on the stack rather than on the heap.
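   In other words, something like the following sketch (the namespace and enum spellings here are assumptions, not copied from the patch):

    // Hypothetical usage: the encryptor as an automatic (stack) variable, so
    // no smart pointer or heap allocation is involved and it is destroyed
    // when the enclosing scope ends.
    void EncryptChunk(/* ... */) {
      parquet::encryption::AesEncryptor encryptor(parquet::ParquetCipher::AES_GCM_V1,
                                                  /*key_len=*/16, /*metadata=*/false);
      // ... use `encryptor` for this chunk ...
    }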





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #8057: ARROW-9862: [Java] Enable UnsafeDirectLittleEndian on a big-endian platform

2020-08-26 Thread GitBox


github-actions[bot] commented on pull request #8057:
URL: https://github.com/apache/arrow/pull/8057#issuecomment-680687276


   https://issues.apache.org/jira/browse/ARROW-9862



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk opened a new pull request #8057: ARROW-9862: [Java] Enable UnsafeDirectLittleEndian on a big-endian platform

2020-08-26 Thread GitBox


kiszk opened a new pull request #8057:
URL: https://github.com/apache/arrow/pull/8057


   This PR enables the `UnsafeDirectLittleEndian` class on big-endian platforms by 
removing the exception that was previously thrown there. This is because the class 
originally supports primitive data types (up to 64-bit) in native endianness.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #8056: ARROW-9861: [Java] Support big-endian in DecimalVector

2020-08-26 Thread GitBox


github-actions[bot] commented on pull request #8056:
URL: https://github.com/apache/arrow/pull/8056#issuecomment-680684161


   https://issues.apache.org/jira/browse/ARROW-9861



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk opened a new pull request #8056: ARROW-9861: [java] Support big-endian in DecimalVector

2020-08-26 Thread GitBox


kiszk opened a new pull request #8056:
URL: https://github.com/apache/arrow/pull/8056


   This PR fixes failures in DecimalVectorTest on a big-endian platform.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org