[GitHub] [arrow] kiszk commented on a change in pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight

2020-07-06 Thread GitBox


kiszk commented on a change in pull request #7555:
URL: https://github.com/apache/arrow/pull/7555#discussion_r450040328



##
File path: cpp/src/arrow/ipc/test_common.cc
##
@@ -79,6 +79,39 @@ Status MakeRandomInt32Array(int64_t length, bool 
include_nulls, MemoryPool* pool
   return Status::OK();
 }
 
+template 
+static Status MakeRandomArray(int64_t length, bool include_nulls, MemoryPool* 
pool,
+  std::shared_ptr* out, uint32_t seed) {
+  random::RandomArrayGenerator rand(seed);
+  const double null_probability = include_nulls ? 0.5 : 0.0;
+
+  *out = rand.Numeric(length, 0, 1000, null_probability);
+
+  return Status::OK();
+}
+
+template <>

Review comment:
   specialize for 8-bit data. This is because range(0, 1000) is too big for 
8-bit data. Win64 causes hang-up without this specialization.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-06 Thread GitBox


liyafan82 commented on a change in pull request #7619:
URL: https://github.com/apache/arrow/pull/7619#discussion_r450041982



##
File path: 
java/memory/memory-core/src/test/java/org/apache/arrow/memory/DefaultAllocationManagerFactory.java
##
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.memory;
+
+import org.apache.arrow.memory.util.MemoryUtil;
+
+/**
+ * The default Allocation Manager Factory for a module.
+ *
+ * This will be split out to the arrow-memory-netty/arrow-memory-unsafe modules
+ * as part of ARROW-8230. This is currently a placeholder which defaults to 
Netty.
+ *
+ */
+public class DefaultAllocationManagerFactory implements 
AllocationManager.Factory {

Review comment:
   We have 3 classes in the code base named 
"DefaultAllocationManagerFactory". Do we need all of them?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-06 Thread GitBox


rymurr commented on a change in pull request #7619:
URL: https://github.com/apache/arrow/pull/7619#discussion_r450071734



##
File path: 
java/memory/memory-core/src/test/java/org/apache/arrow/memory/DefaultAllocationManagerFactory.java
##
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.memory;
+
+import org.apache.arrow.memory.util.MemoryUtil;
+
+/**
+ * The default Allocation Manager Factory for a module.
+ *
+ * This will be split out to the arrow-memory-netty/arrow-memory-unsafe modules
+ * as part of ARROW-8230. This is currently a placeholder which defaults to 
Netty.
+ *
+ */
+public class DefaultAllocationManagerFactory implements 
AllocationManager.Factory {

Review comment:
   Hey @liyafan82, yes i think we do:
   
   1) lives in the netty module.
   2) lives in the unsafe module
   3) lives in the tests portion of the core memory module.
   
   On startup we first check if a specific memory allocator has been requested. 
If yes we instantiate that. If no specific impl has been requested then we 
search the classpath for a `DefaultAllocationManagerFactory`. If we find one 
then we instantiate it, else we throw a Runtime exception (can't continue 
without an allocation manager).
   
   So each concrete impl of `AllocationManager` requires a factory to create it 
and we have a trivial allocation manager in `arrow-memory-core` tests to help 
the core tests finish.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7653: ARROW-9343: [C++][Gandiva] CastInt/Float from string functions should handle leading/trailing white spaces

2020-07-06 Thread GitBox


github-actions[bot] commented on pull request #7653:
URL: https://github.com/apache/arrow/pull/7653#issuecomment-654606536


   https://issues.apache.org/jira/browse/ARROW-9343



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] projjal opened a new pull request #7653: ARROW-9343: [C++][Gandiva] CastInt/Float from string functions should handle leading/trailing white spaces

2020-07-06 Thread GitBox


projjal opened a new pull request #7653:
URL: https://github.com/apache/arrow/pull/7653


   Also refactored the code to remove the parse_float helper function from 
context. This was done earlier to save the constructor cost when the 
stringconverter object from arrow/util/parsing was needed to be constructed 
first. Now that it has changed to parametric functions, its not needed to keep 
them in the executioncontext.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


kszucs commented on pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#issuecomment-654234467


   @pitrou added the C++ tests.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove commented on pull request #7176: ARROW-8796: [Rust] feat: Allow writers to use Vec

2020-07-06 Thread GitBox


andygrove commented on pull request #7176:
URL: https://github.com/apache/arrow/pull/7176#issuecomment-654252387


   @sunchao Did you have a chance to look at the code sample provided?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


pitrou commented on pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#issuecomment-654237098


   Looks like the PR needs rebasing and fixing for the latest union changes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove closed pull request #7591: ARROW-8535: [Rust] Specify arrow-flight version

2020-07-06 Thread GitBox


andygrove closed pull request #7591:
URL: https://github.com/apache/arrow/pull/7591


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove commented on pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype

2020-07-06 Thread GitBox


andygrove commented on pull request #7253:
URL: https://github.com/apache/arrow/pull/7253#issuecomment-654248872


   I'm going to take another run at this, with smaller changes in new PRs. 
Thanks for the reviews so far.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove closed pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype

2020-07-06 Thread GitBox


andygrove closed pull request #7253:
URL: https://github.com/apache/arrow/pull/7253


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn opened a new pull request #7643: ARROW-9331: [C++] Improve the conversion performance from Tensor to SparseCOOTensor

2020-07-06 Thread GitBox


mrkn opened a new pull request #7643:
URL: https://github.com/apache/arrow/pull/7643


   In this pull-request, the slowing down of the conversion introduced in #7539 
is canceled, and the conversion speed is improved than before #7539 in some 
cases.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7620: ARROW-9013: [C++] Validate CMake options

2020-07-06 Thread GitBox


pitrou commented on pull request #7620:
URL: https://github.com/apache/arrow/pull/7620#issuecomment-654273536


   Thanks for the review @kou . I believe I've addressed all of your comments.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] projjal opened a new pull request #7642: ARROW-9329: [C++][Gandiva] Implement castTimestampToDate function in gandiva

2020-07-06 Thread GitBox


projjal opened a new pull request #7642:
URL: https://github.com/apache/arrow/pull/7642


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion

2020-07-06 Thread GitBox


jorisvandenbossche commented on a change in pull request #7604:
URL: https://github.com/apache/arrow/pull/7604#discussion_r450201426



##
File path: cpp/src/arrow/python/datetime.cc
##
@@ -262,6 +302,42 @@ int64_t PyDate_to_days(PyDateTime_Date* pydate) {
 PyDateTime_GET_DAY(pydate));
 }
 
+// GIL must be held when calling this function.
+Status StringToTzinfo(const std::string& tz, PyObject** tzinfo) {

Review comment:
   Looks fine!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7644: ARROW-9330: [C++] Fix crash and undefined behaviour on corrupt IPC input

2020-07-06 Thread GitBox


github-actions[bot] commented on pull request #7644:
URL: https://github.com/apache/arrow/pull/7644#issuecomment-654284383


   https://issues.apache.org/jira/browse/ARROW-9330



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7646: ARROW-7237: [C++] Use Result in arrow/json APIs

2020-07-06 Thread GitBox


github-actions[bot] commented on pull request #7646:
URL: https://github.com/apache/arrow/pull/7646#issuecomment-654370419


   https://issues.apache.org/jira/browse/ARROW-7237



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


kszucs commented on a change in pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#discussion_r450383737



##
File path: cpp/src/arrow/scalar_test.cc
##
@@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) {
 
   ASSERT_RAISES(Invalid, abc.field(5).status());
   ASSERT_RAISES(Invalid, abc.field("c").status());
+
+  ASSERT_OK_AND_ASSIGN(auto d, abc.field("d"));
+  ASSERT_TRUE(d->Equals(MakeNullScalar(int64(;
+}
+
+TEST(TestDictionaryScalar, Basics) {
+  auto ty = dictionary(int8(), utf8());
+  auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]");
+
+  DictionaryScalar::ValueType alpha;
+  ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0));
+  alpha.dictionary = dict;
+
+  DictionaryScalar::ValueType gamma;
+  ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2));
+  gamma.dictionary = dict;
+
+  auto scalar_null = MakeNullScalar(ty);
+  auto scalar_alpha = DictionaryScalar(alpha, ty);
+  auto scalar_gamma = DictionaryScalar(gamma, ty);
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_null,
+  checked_cast(*scalar_null).GetEncodedValue());
+  ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(;
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_alpha,
+  checked_cast(scalar_alpha).GetEncodedValue());
+  ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha")));
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_gamma,
+  checked_cast(scalar_gamma).GetEncodedValue());
+  ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma")));
+
+  // test Array.GetScalar
+  DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict);
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3));
+
+  ASSERT_TRUE(first->Equals(scalar_gamma));
+  ASSERT_TRUE(second->Equals(scalar_alpha));
+  ASSERT_TRUE(last->Equals(scalar_null));
+}
+
+TEST(TestSparseUnionScalar, Basics) {
+  auto ty = sparse_union({field("string", utf8()), field("number", uint64())});
+
+  auto alpha = MakeScalar("alpha");
+  auto beta = MakeScalar("beta");
+  ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2));
+
+  auto scalar_alpha = SparseUnionScalar(alpha, ty);
+  auto scalar_beta = SparseUnionScalar(beta, ty);
+  auto scalar_two = SparseUnionScalar(two, ty);
+
+  // test Array.GetScalar
+  std::vector> children{
+  ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"),
+  ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")};
+
+  auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]");
+  SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]);
+  ASSERT_OK(arr.ValidateFull());
+
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_TRUE(first->Equals(scalar_alpha));
+
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_TRUE(second->Equals(scalar_two));
+
+  ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2));
+  ASSERT_TRUE(third->Equals(scalar_beta));
+
+  ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3));
+  ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty)));

Review comment:
   We can ensure to have an explicit type to the value scalar from 
`array.GetScalar()` but not when a null union scalar is directly constructed, 
so `MakeNullScalar(union_type)` won't know which type to choose for the 
underlying value scalar.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #7647: ARROW-9337: [R] On C++ library build failure, give an unambiguous message

2020-07-06 Thread GitBox


nealrichardson opened a new pull request #7647:
URL: https://github.com/apache/arrow/pull/7647


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7647: ARROW-9337: [R] On C++ library build failure, give an unambiguous message

2020-07-06 Thread GitBox


github-actions[bot] commented on pull request #7647:
URL: https://github.com/apache/arrow/pull/7647#issuecomment-654385353


   https://issues.apache.org/jira/browse/ARROW-9337



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects

2020-07-06 Thread GitBox


kszucs commented on a change in pull request #7631:
URL: https://github.com/apache/arrow/pull/7631#discussion_r450396810



##
File path: python/pyarrow/tests/test_dataset.py
##
@@ -612,6 +613,83 @@ def test_make_fragment(multisourcefs):
 assert row_group_fragment.row_groups == [ds.RowGroupInfo(0)]
 
 
+def test_make_csv_fragment_from_buffer():
+content = textwrap.dedent("""
+alpha,num,animal
+a,12,dog
+b,11,cat
+c,10,rabbit
+""")
+buffer = pa.py_buffer(content.encode('utf-8'))
+
+csv_format = ds.CsvFileFormat()
+fragment = csv_format.make_fragment(buffer)
+
+expected = pa.table([['a', 'b', 'c'],
+ [12, 11, 10],
+ ['dog', 'cat', 'rabbit']],
+names=['alpha', 'num', 'animal'])
+assert fragment.to_table().equals(expected)
+
+pickled = pickle.loads(pickle.dumps(fragment))
+assert pickled.to_table().equals(fragment.to_table())
+
+
+@pytest.mark.parquet
+def test_make_parquet_fragment_from_buffer():
+import pyarrow.parquet as pq
+
+cases = [
+(
+pa.table(
+[
+['a', 'b', 'c'],
+[12, 11, 10],
+['dog', 'cat', 'rabbit']

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] BryanCutler commented on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion

2020-07-06 Thread GitBox


BryanCutler commented on pull request #7604:
URL: https://github.com/apache/arrow/pull/7604#issuecomment-654366148


   Looks like the Spark test fails with
   ```
  File "/spark/python/pyspark/sql/tests/test_pandas_grouped_map.py", line 
593, in f
   "{} != {}".format(expected_key[i][1], window_range)
   AssertionError: {'start': datetime.datetime(2018, 3, 10, 0, 0), 'end': 
datetime.datetime(2018, 3, 15, 0, 0)} != {'start': datetime.datetime(2018, 3, 
10, 0, 0, tzinfo=), 'end': datetime.datetime(2018, 3, 
15, 0, 0, tzinfo=)}
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] BryanCutler edited a comment on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion

2020-07-06 Thread GitBox


BryanCutler edited a comment on pull request #7604:
URL: https://github.com/apache/arrow/pull/7604#issuecomment-654366148


   Looks like the Spark test fails with
   ```
  File "/spark/python/pyspark/sql/tests/test_pandas_grouped_map.py", line 
593, in f
   "{} != {}".format(expected_key[i][1], window_range)
   AssertionError: 
   {'start': datetime.datetime(2018, 3, 10, 0, 0), 'end': 
datetime.datetime(2018, 3, 15, 0, 0)} != 
   {'start': datetime.datetime(2018, 3, 10, 0, 0, tzinfo=), 'end': datetime.datetime(2018, 3, 15, 0, 0, tzinfo=)}
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects

2020-07-06 Thread GitBox


jorisvandenbossche commented on a change in pull request #7631:
URL: https://github.com/apache/arrow/pull/7631#discussion_r450390113



##
File path: python/pyarrow/tests/test_dataset.py
##
@@ -612,6 +613,83 @@ def test_make_fragment(multisourcefs):
 assert row_group_fragment.row_groups == [ds.RowGroupInfo(0)]
 
 
+def test_make_csv_fragment_from_buffer():
+content = textwrap.dedent("""
+alpha,num,animal
+a,12,dog
+b,11,cat
+c,10,rabbit
+""")
+buffer = pa.py_buffer(content.encode('utf-8'))
+
+csv_format = ds.CsvFileFormat()
+fragment = csv_format.make_fragment(buffer)
+
+expected = pa.table([['a', 'b', 'c'],
+ [12, 11, 10],
+ ['dog', 'cat', 'rabbit']],
+names=['alpha', 'num', 'animal'])
+assert fragment.to_table().equals(expected)
+
+pickled = pickle.loads(pickle.dumps(fragment))
+assert pickled.to_table().equals(fragment.to_table())
+
+
+@pytest.mark.parquet
+def test_make_parquet_fragment_from_buffer():
+import pyarrow.parquet as pq
+
+cases = [
+(
+pa.table(
+[
+['a', 'b', 'c'],
+[12, 11, 10],
+['dog', 'cat', 'rabbit']

Review comment:
   you could reduce the used vertical space here a bit by only defining the 
list of arrays here, and do `table = pa.table(arrays, names=['alpha', 'num', 
'animal'])` in the loop





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou opened a new pull request #7646: ARROW-7237: [C++] Use Result in arrow/json APIs

2020-07-06 Thread GitBox


pitrou opened a new pull request #7646:
URL: https://github.com/apache/arrow/pull/7646


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


pitrou commented on a change in pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#discussion_r450371161



##
File path: cpp/src/arrow/scalar_test.cc
##
@@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) {
 
   ASSERT_RAISES(Invalid, abc.field(5).status());
   ASSERT_RAISES(Invalid, abc.field("c").status());
+
+  ASSERT_OK_AND_ASSIGN(auto d, abc.field("d"));
+  ASSERT_TRUE(d->Equals(MakeNullScalar(int64(;
+}
+
+TEST(TestDictionaryScalar, Basics) {
+  auto ty = dictionary(int8(), utf8());
+  auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]");
+
+  DictionaryScalar::ValueType alpha;
+  ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0));
+  alpha.dictionary = dict;
+
+  DictionaryScalar::ValueType gamma;
+  ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2));
+  gamma.dictionary = dict;
+
+  auto scalar_null = MakeNullScalar(ty);
+  auto scalar_alpha = DictionaryScalar(alpha, ty);
+  auto scalar_gamma = DictionaryScalar(gamma, ty);
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_null,
+  checked_cast(*scalar_null).GetEncodedValue());
+  ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(;
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_alpha,
+  checked_cast(scalar_alpha).GetEncodedValue());
+  ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha")));
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_gamma,
+  checked_cast(scalar_gamma).GetEncodedValue());
+  ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma")));
+
+  // test Array.GetScalar
+  DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict);
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3));
+
+  ASSERT_TRUE(first->Equals(scalar_gamma));
+  ASSERT_TRUE(second->Equals(scalar_alpha));
+  ASSERT_TRUE(last->Equals(scalar_null));
+}
+
+TEST(TestSparseUnionScalar, Basics) {
+  auto ty = sparse_union({field("string", utf8()), field("number", uint64())});
+
+  auto alpha = MakeScalar("alpha");
+  auto beta = MakeScalar("beta");
+  ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2));
+
+  auto scalar_alpha = SparseUnionScalar(alpha, ty);
+  auto scalar_beta = SparseUnionScalar(beta, ty);
+  auto scalar_two = SparseUnionScalar(two, ty);
+
+  // test Array.GetScalar
+  std::vector> children{
+  ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"),
+  ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")};
+
+  auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]");
+  SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]);
+  ASSERT_OK(arr.ValidateFull());
+
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_TRUE(first->Equals(scalar_alpha));
+
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_TRUE(second->Equals(scalar_two));
+
+  ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2));
+  ASSERT_TRUE(third->Equals(scalar_beta));
+
+  ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3));
+  ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty)));

Review comment:
   It the value scalar is not a nullptr, then it would seem better, unless 
that's complicated.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion

2020-07-06 Thread GitBox


jorisvandenbossche commented on pull request #7604:
URL: https://github.com/apache/arrow/pull/7604#issuecomment-654377721


   Which I think is as expected? (at least if the data in the test are 
tz-aware) And thus the test can be updated on the spark side? 
   Since we actually change to now return datetimes with tzinfo set, instead of 
losing the timezone.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on a change in pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight

2020-07-06 Thread GitBox


kiszk commented on a change in pull request #7555:
URL: https://github.com/apache/arrow/pull/7555#discussion_r450287754



##
File path: cpp/src/arrow/ipc/test_common.cc
##
@@ -79,6 +79,39 @@ Status MakeRandomInt32Array(int64_t length, bool 
include_nulls, MemoryPool* pool
   return Status::OK();
 }
 
+template 
+static Status MakeRandomArray(int64_t length, bool include_nulls, MemoryPool* 
pool,
+  std::shared_ptr* out, uint32_t seed) {
+  random::RandomArrayGenerator rand(seed);
+  const double null_probability = include_nulls ? 0.5 : 0.0;
+
+  *out = rand.Numeric(length, 0, 1000, null_probability);
+
+  return Status::OK();
+}
+
+template <>
+Status MakeRandomArray(int64_t length, bool include_nulls, 
MemoryPool* pool,

Review comment:
   To make this static cause the compilation error. I will put them into 
the anonymous namespace.
   
   ```
   /home/ishizaki/Arrow/arrow/cpp/src/arrow/ipc/test_common.cc:94:1: error: 
explicit template specialization cannot have a storage class
static Status MakeRandomArray(int64_t length, bool include_nulls, 
MemoryPool* pool,
^~
   
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


kszucs commented on a change in pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#discussion_r450317608



##
File path: cpp/src/arrow/scalar_test.cc
##
@@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) {
 
   ASSERT_RAISES(Invalid, abc.field(5).status());
   ASSERT_RAISES(Invalid, abc.field("c").status());
+
+  ASSERT_OK_AND_ASSIGN(auto d, abc.field("d"));
+  ASSERT_TRUE(d->Equals(MakeNullScalar(int64(;
+}
+
+TEST(TestDictionaryScalar, Basics) {
+  auto ty = dictionary(int8(), utf8());
+  auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]");
+
+  DictionaryScalar::ValueType alpha;
+  ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0));
+  alpha.dictionary = dict;
+
+  DictionaryScalar::ValueType gamma;
+  ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2));
+  gamma.dictionary = dict;
+
+  auto scalar_null = MakeNullScalar(ty);
+  auto scalar_alpha = DictionaryScalar(alpha, ty);
+  auto scalar_gamma = DictionaryScalar(gamma, ty);
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_null,
+  checked_cast(*scalar_null).GetEncodedValue());
+  ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(;
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_alpha,
+  checked_cast(scalar_alpha).GetEncodedValue());
+  ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha")));
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_gamma,
+  checked_cast(scalar_gamma).GetEncodedValue());
+  ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma")));
+
+  // test Array.GetScalar
+  DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict);
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3));
+
+  ASSERT_TRUE(first->Equals(scalar_gamma));
+  ASSERT_TRUE(second->Equals(scalar_alpha));
+  ASSERT_TRUE(last->Equals(scalar_null));
+}
+
+TEST(TestSparseUnionScalar, Basics) {
+  auto ty = sparse_union({field("string", utf8()), field("number", uint64())});
+
+  auto alpha = MakeScalar("alpha");
+  auto beta = MakeScalar("beta");
+  ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2));
+
+  auto scalar_alpha = SparseUnionScalar(alpha, ty);
+  auto scalar_beta = SparseUnionScalar(beta, ty);
+  auto scalar_two = SparseUnionScalar(two, ty);
+
+  // test Array.GetScalar
+  std::vector> children{
+  ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"),
+  ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")};
+
+  auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]");
+  SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]);
+  ASSERT_OK(arr.ValidateFull());
+
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_TRUE(first->Equals(scalar_alpha));
+
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_TRUE(second->Equals(scalar_two));
+
+  ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2));
+  ASSERT_TRUE(third->Equals(scalar_beta));
+
+  ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3));
+  ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty)));

Review comment:
   I think `GetScalar()` should return a Scalar with the same type as the 
Array. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


kszucs commented on pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#issuecomment-654308059


   @pitrou updated.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on a change in pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight

2020-07-06 Thread GitBox


kiszk commented on a change in pull request #7555:
URL: https://github.com/apache/arrow/pull/7555#discussion_r450302190



##
File path: cpp/src/arrow/ipc/test_common.cc
##
@@ -79,6 +79,39 @@ Status MakeRandomInt32Array(int64_t length, bool 
include_nulls, MemoryPool* pool
   return Status::OK();
 }
 
+template 
+static Status MakeRandomArray(int64_t length, bool include_nulls, MemoryPool* 
pool,
+  std::shared_ptr* out, uint32_t seed) {
+  random::RandomArrayGenerator rand(seed);
+  const double null_probability = include_nulls ? 0.5 : 0.0;
+
+  *out = rand.Numeric(length, 0, 1000, null_probability);
+
+  return Status::OK();
+}
+
+template <>
+Status MakeRandomArray(int64_t length, bool include_nulls, 
MemoryPool* pool,

Review comment:
   I will just remove `static` like other MakeRandom...Array.  





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7645: ARROW-8374 [R]: Table to vector of DictonaryType will error when Arrays don't have the same Dictionary per array

2020-07-06 Thread GitBox


github-actions[bot] commented on pull request #7645:
URL: https://github.com/apache/arrow/pull/7645#issuecomment-654314571


   https://issues.apache.org/jira/browse/ARROW-8374



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


pitrou commented on a change in pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#discussion_r450314430



##
File path: cpp/src/arrow/scalar_test.cc
##
@@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) {
 
   ASSERT_RAISES(Invalid, abc.field(5).status());
   ASSERT_RAISES(Invalid, abc.field("c").status());
+
+  ASSERT_OK_AND_ASSIGN(auto d, abc.field("d"));
+  ASSERT_TRUE(d->Equals(MakeNullScalar(int64(;
+}
+
+TEST(TestDictionaryScalar, Basics) {
+  auto ty = dictionary(int8(), utf8());
+  auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]");
+
+  DictionaryScalar::ValueType alpha;
+  ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0));
+  alpha.dictionary = dict;
+
+  DictionaryScalar::ValueType gamma;
+  ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2));
+  gamma.dictionary = dict;
+
+  auto scalar_null = MakeNullScalar(ty);
+  auto scalar_alpha = DictionaryScalar(alpha, ty);
+  auto scalar_gamma = DictionaryScalar(gamma, ty);
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_null,
+  checked_cast(*scalar_null).GetEncodedValue());
+  ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(;
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_alpha,
+  checked_cast(scalar_alpha).GetEncodedValue());
+  ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha")));
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_gamma,
+  checked_cast(scalar_gamma).GetEncodedValue());
+  ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma")));
+
+  // test Array.GetScalar
+  DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict);
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3));
+
+  ASSERT_TRUE(first->Equals(scalar_gamma));
+  ASSERT_TRUE(second->Equals(scalar_alpha));
+  ASSERT_TRUE(last->Equals(scalar_null));
+}
+
+TEST(TestSparseUnionScalar, Basics) {
+  auto ty = sparse_union({field("string", utf8()), field("number", uint64())});
+
+  auto alpha = MakeScalar("alpha");
+  auto beta = MakeScalar("beta");
+  ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2));
+
+  auto scalar_alpha = SparseUnionScalar(alpha, ty);
+  auto scalar_beta = SparseUnionScalar(beta, ty);
+  auto scalar_two = SparseUnionScalar(two, ty);
+
+  // test Array.GetScalar
+  std::vector> children{
+  ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"),
+  ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")};
+
+  auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]");
+  SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]);
+  ASSERT_OK(arr.ValidateFull());
+
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_TRUE(first->Equals(scalar_alpha));
+
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_TRUE(second->Equals(scalar_two));
+
+  ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2));
+  ASSERT_TRUE(third->Equals(scalar_beta));
+
+  ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3));
+  ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty)));

Review comment:
   Hmm... interesting. Shouldn't it be `MakeNullScalar(utf8())`?

##
File path: cpp/src/arrow/scalar_test.cc
##
@@ -80,6 +84,15 @@ TYPED_TEST(TestNumericScalar, Basics) {
 
   auto dyn_null_value = MakeNullScalar(expected_type);
   ASSERT_EQ(*null_value, *dyn_null_value);
+
+  // test Array.GetScalar
+  auto arr = ArrayFromJSON(expected_type, "[null, 1, 2]");
+  ASSERT_OK_AND_ASSIGN(auto null, arr->GetScalar(0));
+  ASSERT_OK_AND_ASSIGN(auto one, arr->GetScalar(1));
+  ASSERT_OK_AND_ASSIGN(auto two, arr->GetScalar(2));
+  ASSERT_TRUE(null->Equals(*null_value));
+  ASSERT_TRUE(one->Equals(ScalarType(1)));
+  ASSERT_TRUE(two->Equals(ScalarType(2)));

Review comment:
   You never check that `Equals` returns false?

##
File path: cpp/src/arrow/scalar.cc
##
@@ -185,6 +192,35 @@ 
DictionaryScalar::DictionaryScalar(std::shared_ptr type)
 0)
 .ValueOrDie()} {}
 
+Result> DictionaryScalar::GetEncodedValue() const {
+  const auto& dict_type = checked_cast(*type);
+
+  if (!is_valid) {
+return MakeNullScalar(dict_type.value_type());
+  }
+
+  int64_t index_value = 0;
+  switch (dict_type.index_type()->id()) {
+case Type::INT8:
+  index_value = checked_cast(*value.index).value;
+  break;
+case Type::INT16:
+  index_value = checked_cast(*value.index).value;
+  break;
+case Type::INT32:
+  index_value = checked_cast(*value.index).value;
+  break;
+case Type::INT64:
+  index_value = checked_cast(*value.index).value;
+  break;
+default:
+  return Status::Invalid("Not implemented dictionary index type");

Review comment:
   `Status::TypeError`?

##
File path: cpp/src/arrow/scalar_test.cc
##
@@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) {
 
   ASSERT_RAISES(Invalid, 

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


kszucs commented on a change in pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#discussion_r450327095



##
File path: cpp/src/arrow/scalar_test.cc
##
@@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) {
 
   ASSERT_RAISES(Invalid, abc.field(5).status());
   ASSERT_RAISES(Invalid, abc.field("c").status());
+
+  ASSERT_OK_AND_ASSIGN(auto d, abc.field("d"));
+  ASSERT_TRUE(d->Equals(MakeNullScalar(int64(;
+}
+
+TEST(TestDictionaryScalar, Basics) {
+  auto ty = dictionary(int8(), utf8());
+  auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]");
+
+  DictionaryScalar::ValueType alpha;
+  ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0));
+  alpha.dictionary = dict;
+
+  DictionaryScalar::ValueType gamma;
+  ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2));
+  gamma.dictionary = dict;
+
+  auto scalar_null = MakeNullScalar(ty);
+  auto scalar_alpha = DictionaryScalar(alpha, ty);
+  auto scalar_gamma = DictionaryScalar(gamma, ty);
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_null,
+  checked_cast(*scalar_null).GetEncodedValue());
+  ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(;
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_alpha,
+  checked_cast(scalar_alpha).GetEncodedValue());
+  ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha")));
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_gamma,
+  checked_cast(scalar_gamma).GetEncodedValue());
+  ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma")));
+
+  // test Array.GetScalar
+  DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict);
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3));
+
+  ASSERT_TRUE(first->Equals(scalar_gamma));
+  ASSERT_TRUE(second->Equals(scalar_alpha));
+  ASSERT_TRUE(last->Equals(scalar_null));
+}
+
+TEST(TestSparseUnionScalar, Basics) {
+  auto ty = sparse_union({field("string", utf8()), field("number", uint64())});
+
+  auto alpha = MakeScalar("alpha");
+  auto beta = MakeScalar("beta");
+  ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2));
+
+  auto scalar_alpha = SparseUnionScalar(alpha, ty);
+  auto scalar_beta = SparseUnionScalar(beta, ty);
+  auto scalar_two = SparseUnionScalar(two, ty);
+
+  // test Array.GetScalar
+  std::vector> children{
+  ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"),
+  ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")};
+
+  auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]");
+  SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]);
+  ASSERT_OK(arr.ValidateFull());
+
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_TRUE(first->Equals(scalar_alpha));
+
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_TRUE(second->Equals(scalar_two));
+
+  ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2));
+  ASSERT_TRUE(third->Equals(scalar_beta));
+
+  ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3));
+  ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty)));

Review comment:
   We have the value type in `UnionScalar.value.type`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7620: ARROW-9013: [C++] Validate CMake options

2020-07-06 Thread GitBox


pitrou commented on pull request #7620:
URL: https://github.com/apache/arrow/pull/7620#issuecomment-654331202


   The "AMD64 MacOS 10.15 Python 3.7" CI failure looks unrelated.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight

2020-07-06 Thread GitBox


pitrou commented on a change in pull request #7555:
URL: https://github.com/apache/arrow/pull/7555#discussion_r450267748



##
File path: cpp/src/arrow/testing/random.h
##
@@ -140,6 +140,17 @@ class ARROW_TESTING_EXPORT RandomArrayGenerator {
   std::shared_ptr Int64(int64_t size, int64_t min, int64_t max,
double null_probability = 0);
 
+  /// \brief Generates a random HalfFloatArray
+  ///
+  /// \param[in] size the size of the array to generate
+  /// \param[in] min the lower bound of the uniform distribution
+  /// \param[in] max the upper bound of the uniform distribution

Review comment:
   Hmm... this will not really be a *uniform* float16 distribution if 
you're selecting random values for the underlying int16 representation, right?

##
File path: cpp/src/arrow/ipc/test_common.cc
##
@@ -79,6 +79,39 @@ Status MakeRandomInt32Array(int64_t length, bool 
include_nulls, MemoryPool* pool
   return Status::OK();
 }
 
+template 
+static Status MakeRandomArray(int64_t length, bool include_nulls, MemoryPool* 
pool,
+  std::shared_ptr* out, uint32_t seed) {
+  random::RandomArrayGenerator rand(seed);
+  const double null_probability = include_nulls ? 0.5 : 0.0;
+
+  *out = rand.Numeric(length, 0, 1000, null_probability);
+
+  return Status::OK();
+}
+
+template <>
+Status MakeRandomArray(int64_t length, bool include_nulls, 
MemoryPool* pool,

Review comment:
   Make those static as well? (or put them in the anonymous namespace)

##
File path: cpp/src/arrow/ipc/feather_test.cc
##
@@ -294,6 +325,26 @@ INSTANTIATE_TEST_SUITE_P(
   TestParam(kFeatherV2Version, Compression::LZ4_FRAME),
   TestParam(kFeatherV2Version, Compression::ZSTD)));
 
+#define BATCH_CASES()  
  \

Review comment:
   `using kBatchCases = ::testing::Values(...)` perhaps?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on a change in pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight

2020-07-06 Thread GitBox


kiszk commented on a change in pull request #7555:
URL: https://github.com/apache/arrow/pull/7555#discussion_r450283575



##
File path: cpp/src/arrow/testing/random.h
##
@@ -140,6 +140,17 @@ class ARROW_TESTING_EXPORT RandomArrayGenerator {
   std::shared_ptr Int64(int64_t size, int64_t min, int64_t max,
double null_probability = 0);
 
+  /// \brief Generates a random HalfFloatArray
+  ///
+  /// \param[in] size the size of the array to generate
+  /// \param[in] min the lower bound of the uniform distribution
+  /// \param[in] max the upper bound of the uniform distribution

Review comment:
   Good point. It will not be uniform distribution for float16
   Should I only update the comment? Or, should I create uniform distribution 
float16 random while the current std does not support float16?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] romainfrancois opened a new pull request #7645: ARROW-8374 [R]: Table to vector of DictonaryType will error when Arrays don't have the same Dictionary per array

2020-07-06 Thread GitBox


romainfrancois opened a new pull request #7645:
URL: https://github.com/apache/arrow/pull/7645


   This needs some testing: 
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   
   f1 <- factor(c("a"), levels = c("a", "b"))
   f2 <- factor(c("c"), levels = c("c", "d"))
   
   ca <- ChunkedArray$create(f1, f2)
   ca$as_vector()
   #> [1] a c
   #> Levels: a b c d
   ca$type
   #> DictionaryType
   #> dictionary
   
   tab <- Table$create(
 record_batch(f = f1), 
 record_batch(f = f2)
   )
   tab
   #> Table
   #> 2 rows x 1 columns
   #> $f >
   df <- as.data.frame(tab)
   df
   #> # A tibble: 2 x 1
   #>   f
   #>   
   #> 1 a
   #> 2 c
   df$f
   #> [1] a c
   #> Levels: a b c d
   ```
   
   Created on 2020-07-06 by the [reprex 
package](https://reprex.tidyverse.org) (v0.3.0.9001)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


pitrou commented on a change in pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#discussion_r450319790



##
File path: cpp/src/arrow/scalar_test.cc
##
@@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) {
 
   ASSERT_RAISES(Invalid, abc.field(5).status());
   ASSERT_RAISES(Invalid, abc.field("c").status());
+
+  ASSERT_OK_AND_ASSIGN(auto d, abc.field("d"));
+  ASSERT_TRUE(d->Equals(MakeNullScalar(int64(;
+}
+
+TEST(TestDictionaryScalar, Basics) {
+  auto ty = dictionary(int8(), utf8());
+  auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]");
+
+  DictionaryScalar::ValueType alpha;
+  ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0));
+  alpha.dictionary = dict;
+
+  DictionaryScalar::ValueType gamma;
+  ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2));
+  gamma.dictionary = dict;
+
+  auto scalar_null = MakeNullScalar(ty);
+  auto scalar_alpha = DictionaryScalar(alpha, ty);
+  auto scalar_gamma = DictionaryScalar(gamma, ty);
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_null,
+  checked_cast(*scalar_null).GetEncodedValue());
+  ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(;
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_alpha,
+  checked_cast(scalar_alpha).GetEncodedValue());
+  ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha")));
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_gamma,
+  checked_cast(scalar_gamma).GetEncodedValue());
+  ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma")));
+
+  // test Array.GetScalar
+  DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict);
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3));
+
+  ASSERT_TRUE(first->Equals(scalar_gamma));
+  ASSERT_TRUE(second->Equals(scalar_alpha));
+  ASSERT_TRUE(last->Equals(scalar_null));
+}
+
+TEST(TestSparseUnionScalar, Basics) {
+  auto ty = sparse_union({field("string", utf8()), field("number", uint64())});
+
+  auto alpha = MakeScalar("alpha");
+  auto beta = MakeScalar("beta");
+  ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2));
+
+  auto scalar_alpha = SparseUnionScalar(alpha, ty);
+  auto scalar_beta = SparseUnionScalar(beta, ty);
+  auto scalar_two = SparseUnionScalar(two, ty);
+
+  // test Array.GetScalar
+  std::vector> children{
+  ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"),
+  ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")};
+
+  auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]");
+  SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]);
+  ASSERT_OK(arr.ValidateFull());
+
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_TRUE(first->Equals(scalar_alpha));
+
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_TRUE(second->Equals(scalar_two));
+
+  ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2));
+  ASSERT_TRUE(third->Equals(scalar_beta));
+
+  ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3));
+  ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty)));

Review comment:
   Ah, I see. Yes, you're probably right. A pity we lose information about 
the value type, then.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


pitrou commented on a change in pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#discussion_r450328412



##
File path: cpp/src/arrow/scalar_test.cc
##
@@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) {
 
   ASSERT_RAISES(Invalid, abc.field(5).status());
   ASSERT_RAISES(Invalid, abc.field("c").status());
+
+  ASSERT_OK_AND_ASSIGN(auto d, abc.field("d"));
+  ASSERT_TRUE(d->Equals(MakeNullScalar(int64(;
+}
+
+TEST(TestDictionaryScalar, Basics) {
+  auto ty = dictionary(int8(), utf8());
+  auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]");
+
+  DictionaryScalar::ValueType alpha;
+  ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0));
+  alpha.dictionary = dict;
+
+  DictionaryScalar::ValueType gamma;
+  ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2));
+  gamma.dictionary = dict;
+
+  auto scalar_null = MakeNullScalar(ty);
+  auto scalar_alpha = DictionaryScalar(alpha, ty);
+  auto scalar_gamma = DictionaryScalar(gamma, ty);
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_null,
+  checked_cast(*scalar_null).GetEncodedValue());
+  ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(;
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_alpha,
+  checked_cast(scalar_alpha).GetEncodedValue());
+  ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha")));
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_gamma,
+  checked_cast(scalar_gamma).GetEncodedValue());
+  ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma")));
+
+  // test Array.GetScalar
+  DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict);
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3));
+
+  ASSERT_TRUE(first->Equals(scalar_gamma));
+  ASSERT_TRUE(second->Equals(scalar_alpha));
+  ASSERT_TRUE(last->Equals(scalar_null));
+}
+
+TEST(TestSparseUnionScalar, Basics) {
+  auto ty = sparse_union({field("string", utf8()), field("number", uint64())});
+
+  auto alpha = MakeScalar("alpha");
+  auto beta = MakeScalar("beta");
+  ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2));
+
+  auto scalar_alpha = SparseUnionScalar(alpha, ty);
+  auto scalar_beta = SparseUnionScalar(beta, ty);
+  auto scalar_two = SparseUnionScalar(two, ty);
+
+  // test Array.GetScalar
+  std::vector> children{
+  ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"),
+  ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")};
+
+  auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]");
+  SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]);
+  ASSERT_OK(arr.ValidateFull());
+
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_TRUE(first->Equals(scalar_alpha));
+
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_TRUE(second->Equals(scalar_two));
+
+  ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2));
+  ASSERT_TRUE(third->Equals(scalar_beta));
+
+  ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3));
+  ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty)));

Review comment:
   Ah, perhaps check that in the test?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects

2020-07-06 Thread GitBox


jorisvandenbossche commented on pull request #7631:
URL: https://github.com/apache/arrow/pull/7631#issuecomment-654334653


   I opened https://issues.apache.org/jira/browse/ARROW-9332 for the parquet 
statistics



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rok commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-07-06 Thread GitBox


rok commented on a change in pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#discussion_r450340186



##
File path: python/pyarrow/tensor.pxi
##
@@ -199,7 +202,13 @@ shape: {0.shape}""".format(self)
 for x in dim_names:
 c_dim_names.push_back(tobytes(x))
 
-coords = np.vstack([obj.row, obj.col]).T
+row = obj.row
+col = obj.col
+if obj.has_canonical_format:
+order = np.lexsort((col, row))  # row-major order

Review comment:
   So column-major is scipy-canonical and row-major is tensorflow canonical?
   Would we then not rather just introduce a new property e.g.: 
`has_duplicates`? I would find that less complex.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rok commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-07-06 Thread GitBox


rok commented on a change in pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#discussion_r450325016



##
File path: python/pyarrow/tensor.pxi
##
@@ -270,8 +279,10 @@ shape: {0.shape}""".format(self)
   _data, _coords))
 data = PyObject_to_object(out_data)
 coords = PyObject_to_object(out_coords)
-result = coo_matrix((data[:, 0], (coords[:, 0], coords[:, 1])),
-shape=self.shape)
+row, col = coords[:, 0], coords[:, 1]
+result = coo_matrix((data[:, 0], (row, col)), shape=self.shape)
+if self.has_canonical_format:
+result.sum_duplicates()

Review comment:
   @mrkn - wouldn't is_canonical guarantee no duplicates if indices were 
row- or column-ordered?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


kszucs commented on a change in pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#discussion_r450332885



##
File path: cpp/src/arrow/scalar_test.cc
##
@@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) {
 
   ASSERT_RAISES(Invalid, abc.field(5).status());
   ASSERT_RAISES(Invalid, abc.field("c").status());
+
+  ASSERT_OK_AND_ASSIGN(auto d, abc.field("d"));
+  ASSERT_TRUE(d->Equals(MakeNullScalar(int64(;
+}
+
+TEST(TestDictionaryScalar, Basics) {
+  auto ty = dictionary(int8(), utf8());
+  auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]");
+
+  DictionaryScalar::ValueType alpha;
+  ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0));
+  alpha.dictionary = dict;
+
+  DictionaryScalar::ValueType gamma;
+  ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2));
+  gamma.dictionary = dict;
+
+  auto scalar_null = MakeNullScalar(ty);
+  auto scalar_alpha = DictionaryScalar(alpha, ty);
+  auto scalar_gamma = DictionaryScalar(gamma, ty);
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_null,
+  checked_cast(*scalar_null).GetEncodedValue());
+  ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(;
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_alpha,
+  checked_cast(scalar_alpha).GetEncodedValue());
+  ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha")));
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_gamma,
+  checked_cast(scalar_gamma).GetEncodedValue());
+  ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma")));
+
+  // test Array.GetScalar
+  DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict);
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3));
+
+  ASSERT_TRUE(first->Equals(scalar_gamma));
+  ASSERT_TRUE(second->Equals(scalar_alpha));
+  ASSERT_TRUE(last->Equals(scalar_null));
+}
+
+TEST(TestSparseUnionScalar, Basics) {
+  auto ty = sparse_union({field("string", utf8()), field("number", uint64())});
+
+  auto alpha = MakeScalar("alpha");
+  auto beta = MakeScalar("beta");
+  ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2));
+
+  auto scalar_alpha = SparseUnionScalar(alpha, ty);
+  auto scalar_beta = SparseUnionScalar(beta, ty);
+  auto scalar_two = SparseUnionScalar(two, ty);
+
+  // test Array.GetScalar
+  std::vector> children{
+  ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"),
+  ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")};
+
+  auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]");
+  SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]);
+  ASSERT_OK(arr.ValidateFull());
+
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_TRUE(first->Equals(scalar_alpha));
+
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_TRUE(second->Equals(scalar_two));
+
+  ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2));
+  ASSERT_TRUE(third->Equals(scalar_beta));
+
+  ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3));
+  ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty)));

Review comment:
   Adding.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7643: ARROW-9331: [C++] Improve the performance of Tensor-to-SparseTensor conversion

2020-07-06 Thread GitBox


github-actions[bot] commented on pull request #7643:
URL: https://github.com/apache/arrow/pull/7643#issuecomment-654284382


   https://issues.apache.org/jira/browse/ARROW-9331



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight

2020-07-06 Thread GitBox


pitrou commented on a change in pull request #7555:
URL: https://github.com/apache/arrow/pull/7555#discussion_r450285869



##
File path: cpp/src/arrow/testing/random.h
##
@@ -140,6 +140,17 @@ class ARROW_TESTING_EXPORT RandomArrayGenerator {
   std::shared_ptr Int64(int64_t size, int64_t min, int64_t max,
double null_probability = 0);
 
+  /// \brief Generates a random HalfFloatArray
+  ///
+  /// \param[in] size the size of the array to generate
+  /// \param[in] min the lower bound of the uniform distribution
+  /// \param[in] max the upper bound of the uniform distribution

Review comment:
   Updating the comment is enough. As long as the actual values are not 
processed, it doesn't really matter.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion

2020-07-06 Thread GitBox


github-actions[bot] commented on pull request #7604:
URL: https://github.com/apache/arrow/pull/7604#issuecomment-654294358


   Revision: a550a761781e8a7e1ebe163642a1c6fae98dad52
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-380](https://github.com/ursa-labs/crossbow/branches/all?query=actions-380)
   
   |Task|Status|
   ||--|
   |test-conda-python-3.7-spark-master|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-380-github-test-conda-python-3.7-spark-master)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-380-github-test-conda-python-3.7-spark-master)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs edited a comment on pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


kszucs edited a comment on pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#issuecomment-654308059


   @pitrou updated. I assume now we propagate the null values from the selected 
child array, please double check the union tests.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


kszucs commented on a change in pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#discussion_r450317608



##
File path: cpp/src/arrow/scalar_test.cc
##
@@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) {
 
   ASSERT_RAISES(Invalid, abc.field(5).status());
   ASSERT_RAISES(Invalid, abc.field("c").status());
+
+  ASSERT_OK_AND_ASSIGN(auto d, abc.field("d"));
+  ASSERT_TRUE(d->Equals(MakeNullScalar(int64(;
+}
+
+TEST(TestDictionaryScalar, Basics) {
+  auto ty = dictionary(int8(), utf8());
+  auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]");
+
+  DictionaryScalar::ValueType alpha;
+  ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0));
+  alpha.dictionary = dict;
+
+  DictionaryScalar::ValueType gamma;
+  ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2));
+  gamma.dictionary = dict;
+
+  auto scalar_null = MakeNullScalar(ty);
+  auto scalar_alpha = DictionaryScalar(alpha, ty);
+  auto scalar_gamma = DictionaryScalar(gamma, ty);
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_null,
+  checked_cast(*scalar_null).GetEncodedValue());
+  ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(;
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_alpha,
+  checked_cast(scalar_alpha).GetEncodedValue());
+  ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha")));
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_gamma,
+  checked_cast(scalar_gamma).GetEncodedValue());
+  ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma")));
+
+  // test Array.GetScalar
+  DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict);
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3));
+
+  ASSERT_TRUE(first->Equals(scalar_gamma));
+  ASSERT_TRUE(second->Equals(scalar_alpha));
+  ASSERT_TRUE(last->Equals(scalar_null));
+}
+
+TEST(TestSparseUnionScalar, Basics) {
+  auto ty = sparse_union({field("string", utf8()), field("number", uint64())});
+
+  auto alpha = MakeScalar("alpha");
+  auto beta = MakeScalar("beta");
+  ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2));
+
+  auto scalar_alpha = SparseUnionScalar(alpha, ty);
+  auto scalar_beta = SparseUnionScalar(beta, ty);
+  auto scalar_two = SparseUnionScalar(two, ty);
+
+  // test Array.GetScalar
+  std::vector> children{
+  ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"),
+  ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")};
+
+  auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]");
+  SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]);
+  ASSERT_OK(arr.ValidateFull());
+
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_TRUE(first->Equals(scalar_alpha));
+
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_TRUE(second->Equals(scalar_two));
+
+  ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2));
+  ASSERT_TRUE(third->Equals(scalar_beta));
+
+  ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3));
+  ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty)));

Review comment:
   I think `GetScalar()` should return a Scalar with the same type as the 
Array, so in case of unions with the union type rather than the type selected 
by the union type id (the union scalar's value is another scalar with the type 
id).
   
   I don't have a strong opinion on this though.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


kszucs commented on a change in pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#discussion_r450318972



##
File path: cpp/src/arrow/scalar.cc
##
@@ -185,6 +192,35 @@ 
DictionaryScalar::DictionaryScalar(std::shared_ptr type)
 0)
 .ValueOrDie()} {}
 
+Result> DictionaryScalar::GetEncodedValue() const {
+  const auto& dict_type = checked_cast(*type);
+
+  if (!is_valid) {
+return MakeNullScalar(dict_type.value_type());
+  }
+
+  int64_t index_value = 0;
+  switch (dict_type.index_type()->id()) {
+case Type::INT8:
+  index_value = checked_cast(*value.index).value;
+  break;
+case Type::INT16:
+  index_value = checked_cast(*value.index).value;
+  break;
+case Type::INT32:
+  index_value = checked_cast(*value.index).value;
+  break;
+case Type::INT64:
+  index_value = checked_cast(*value.index).value;
+  break;
+default:
+  return Status::Invalid("Not implemented dictionary index type");

Review comment:
   Updating.

##
File path: cpp/src/arrow/scalar_test.cc
##
@@ -80,6 +84,15 @@ TYPED_TEST(TestNumericScalar, Basics) {
 
   auto dyn_null_value = MakeNullScalar(expected_type);
   ASSERT_EQ(*null_value, *dyn_null_value);
+
+  // test Array.GetScalar
+  auto arr = ArrayFromJSON(expected_type, "[null, 1, 2]");
+  ASSERT_OK_AND_ASSIGN(auto null, arr->GetScalar(0));
+  ASSERT_OK_AND_ASSIGN(auto one, arr->GetScalar(1));
+  ASSERT_OK_AND_ASSIGN(auto two, arr->GetScalar(2));
+  ASSERT_TRUE(null->Equals(*null_value));
+  ASSERT_TRUE(one->Equals(ScalarType(1)));
+  ASSERT_TRUE(two->Equals(ScalarType(2)));

Review comment:
   Not yet, adding check for it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


kszucs commented on a change in pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#discussion_r450344074



##
File path: cpp/src/arrow/scalar_test.cc
##
@@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) {
 
   ASSERT_RAISES(Invalid, abc.field(5).status());
   ASSERT_RAISES(Invalid, abc.field("c").status());
+
+  ASSERT_OK_AND_ASSIGN(auto d, abc.field("d"));
+  ASSERT_TRUE(d->Equals(MakeNullScalar(int64(;
+}
+
+TEST(TestDictionaryScalar, Basics) {
+  auto ty = dictionary(int8(), utf8());
+  auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]");
+
+  DictionaryScalar::ValueType alpha;
+  ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0));
+  alpha.dictionary = dict;
+
+  DictionaryScalar::ValueType gamma;
+  ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2));
+  gamma.dictionary = dict;
+
+  auto scalar_null = MakeNullScalar(ty);
+  auto scalar_alpha = DictionaryScalar(alpha, ty);
+  auto scalar_gamma = DictionaryScalar(gamma, ty);
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_null,
+  checked_cast(*scalar_null).GetEncodedValue());
+  ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(;
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_alpha,
+  checked_cast(scalar_alpha).GetEncodedValue());
+  ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha")));
+
+  ASSERT_OK_AND_ASSIGN(
+  auto encoded_gamma,
+  checked_cast(scalar_gamma).GetEncodedValue());
+  ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma")));
+
+  // test Array.GetScalar
+  DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict);
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3));
+
+  ASSERT_TRUE(first->Equals(scalar_gamma));
+  ASSERT_TRUE(second->Equals(scalar_alpha));
+  ASSERT_TRUE(last->Equals(scalar_null));
+}
+
+TEST(TestSparseUnionScalar, Basics) {
+  auto ty = sparse_union({field("string", utf8()), field("number", uint64())});
+
+  auto alpha = MakeScalar("alpha");
+  auto beta = MakeScalar("beta");
+  ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2));
+
+  auto scalar_alpha = SparseUnionScalar(alpha, ty);
+  auto scalar_beta = SparseUnionScalar(beta, ty);
+  auto scalar_two = SparseUnionScalar(two, ty);
+
+  // test Array.GetScalar
+  std::vector> children{
+  ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"),
+  ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")};
+
+  auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]");
+  SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]);
+  ASSERT_OK(arr.ValidateFull());
+
+  ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0));
+  ASSERT_TRUE(first->Equals(scalar_alpha));
+
+  ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1));
+  ASSERT_TRUE(second->Equals(scalar_two));
+
+  ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2));
+  ASSERT_TRUE(third->Equals(scalar_beta));
+
+  ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3));
+  ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty)));

Review comment:
   So for UnionScalar we have two `is_valid` flags, une for the union 
scalar and one for the underlying value scalar. Currently `MakeNullScalar()` 
sets the validity flag for the union scalar and leaves the value scalar 
uninitialized. 
   
   What do you think, shall we initialize the value scalar as well?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rok commented on pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-07-06 Thread GitBox


rok commented on pull request #7477:
URL: https://github.com/apache/arrow/pull/7477#issuecomment-654347134


   @mrkn - sorry I couldn't spend time on this earlier. I think pyarrow part is 
ok (except for the already going discussions).
   
   Would using the canonical flag make value assignment in 
[`MakeTensorFromSparseCOOTensor`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/tensor/coo_converter.cc#L224)
 faster?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects

2020-07-06 Thread GitBox


kszucs commented on a change in pull request #7631:
URL: https://github.com/apache/arrow/pull/7631#discussion_r450361126



##
File path: python/pyarrow/tests/test_dataset.py
##
@@ -635,6 +635,37 @@ def test_make_fragment_from_buffer():
 assert pickled.to_table().equals(fragment.to_table())
 
 
+@pytest.mark.parquet
+def test_make_parquet_fragment_from_buffer():
+import pyarrow.parquet as pq
+
+table = pa.table([['a', 'b', 'c'],
+  [12, 11, 10],
+  ['dog', 'cat', 'rabbit']],
+ names=['alpha', 'num', 'animal'])
+
+out = pa.BufferOutputStream()
+pq.write_table(table, out)
+
+buffer = out.getvalue()
+
+formats = [
+ds.ParquetFileFormat(),
+ds.ParquetFileFormat(
+read_options=ds.ParquetReadOptions(
+use_buffered_stream=True,
+buffer_size=4096,

Review comment:
   @jorisvandenbossche updated.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7642: ARROW-9329: [C++][Gandiva] Implement castTimestampToDate function in gandiva

2020-07-06 Thread GitBox


github-actions[bot] commented on pull request #7642:
URL: https://github.com/apache/arrow/pull/7642#issuecomment-654192802


   https://issues.apache.org/jira/browse/ARROW-9329



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7635: ARROW-1587: [C++] implement fill null

2020-07-06 Thread GitBox


wesm commented on a change in pull request #7635:
URL: https://github.com/apache/arrow/pull/7635#discussion_r449802099



##
File path: cpp/src/arrow/compute/api_scalar.h
##
@@ -259,6 +259,27 @@ Result IsValid(const Datum& values, ExecContext* 
ctx = NULLPTR);
 ARROW_EXPORT
 Result IsNull(const Datum& values, ExecContext* ctx = NULLPTR);
 
+struct ARROW_EXPORT FillNullOptions : public FunctionOptions {
+  explicit FillNullOptions(Datum fill_value) : 
fill_value(std::move(fill_value)) {}
+
+  Datum fill_value;
+};

Review comment:
   See comments above. I think this should be implemented as a binary 
function since the "fill values" could be provided by an array. This also will 
allow the execution layer to (in the near future) insert implicit casts where 
needed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou opened a new pull request #7644: ARROW-9330: [C++] Fix crash and undefined behaviour on corrupt IPC input

2020-07-06 Thread GitBox


pitrou opened a new pull request #7644:
URL: https://github.com/apache/arrow/pull/7644


   Should fix the following issues:
   * https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=23910
   * https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=23916
   * https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=23921



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion

2020-07-06 Thread GitBox


jorisvandenbossche commented on pull request #7604:
URL: https://github.com/apache/arrow/pull/7604#issuecomment-654267493


   @github-actions crossbow submit test-conda-python-3.7-spark-master



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #7648: ARROW-8301: [R] Handle ChunkedArray and Table in C data interface

2020-07-06 Thread GitBox


nealrichardson opened a new pull request #7648:
URL: https://github.com/apache/arrow/pull/7648


   In terms of number of lines of code, this wasn't bad, though I don't know 
how efficient these methods are. Maybe there's a better way
   
   The one thing that would be lost is any metadata attached to the Table 
schema because the Table is reconstructed from its ChunkedArrays without 
schema. I wonder if we could export the Schema on its own--the existing 
`_export_to_c`/`_import_from_c` methods take both an array pointer and a schema 
pointer.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7620: ARROW-9013: [C++] Validate CMake options

2020-07-06 Thread GitBox


kou commented on pull request #7620:
URL: https://github.com/apache/arrow/pull/7620#issuecomment-654445290


   Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou closed pull request #7620: ARROW-9013: [C++] Validate CMake options

2020-07-06 Thread GitBox


kou closed pull request #7620:
URL: https://github.com/apache/arrow/pull/7620


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rok commented on pull request #7044: ARROW-6485: [Format][C++] Support the format of a COO sparse matrix that has separated row and column indices

2020-07-06 Thread GitBox


rok commented on pull request #7044:
URL: https://github.com/apache/arrow/pull/7044#issuecomment-654457797


   Hey @mrkn - sorry I didn't have capacity to reply for a while.
   
   If I remember correctly only SciPy has this architecture of COO index having 
two vectors instead of one 2D matrix.
   
   Question: could we handle this as a special case of COO tensor rather than a 
new type? Could we serialize `(row, col)` data as a single row major tensor of 
COO type and only deserialize it into the SciPy layout if desired? (I'm asking 
because I'm not sure if such approach is feasible)
   
   If we go for a new type - could I propose a name SparseCOOMatrix (as opposed 
to n-dimensional SparseCOOTensor). It could perhaps be shortened to `COOM`?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] BryanCutler commented on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion

2020-07-06 Thread GitBox


BryanCutler commented on pull request #7604:
URL: https://github.com/apache/arrow/pull/7604#issuecomment-654462400


   Yeah, it seems possible to fix on the Spark side, but I haven't been able to 
take a close look at this yet. I'll try to do that as soon as possible and 
report back.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7648: ARROW-8301: [R] Handle ChunkedArray and Table in C data interface

2020-07-06 Thread GitBox


github-actions[bot] commented on pull request #7648:
URL: https://github.com/apache/arrow/pull/7648#issuecomment-654431173


   https://issues.apache.org/jira/browse/ARROW-8301



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7637: ARROW-9137: [GLib] Add gparquet_arrow_file_reader_read_row_group()

2020-07-06 Thread GitBox


kou commented on pull request #7637:
URL: https://github.com/apache/arrow/pull/7637#issuecomment-654434362


   +1



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou closed pull request #7637: ARROW-9137: [GLib] Add gparquet_arrow_file_reader_read_row_group()

2020-07-06 Thread GitBox


kou closed pull request #7637:
URL: https://github.com/apache/arrow/pull/7637


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7639: ARROW-4600: [Ruby] Arrow::DictionaryArray#[] returns dictionary value

2020-07-06 Thread GitBox


kou commented on pull request #7639:
URL: https://github.com/apache/arrow/pull/7639#issuecomment-654434646


   +1



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7647: ARROW-9337: [R] On C++ library build failure, give an unambiguous message

2020-07-06 Thread GitBox


nealrichardson closed pull request #7647:
URL: https://github.com/apache/arrow/pull/7647


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou closed pull request #7639: ARROW-4600: [Ruby] Arrow::DictionaryArray#[] returns dictionary value

2020-07-06 Thread GitBox


kou closed pull request #7639:
URL: https://github.com/apache/arrow/pull/7639


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou closed pull request #7646: ARROW-7237: [C++] Use Result in arrow/json APIs

2020-07-06 Thread GitBox


kou closed pull request #7646:
URL: https://github.com/apache/arrow/pull/7646


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm opened a new pull request #7651: [C++][MINOR] Enable clang-format in a place where it was accidentally disabled

2020-07-06 Thread GitBox


wesm opened a new pull request #7651:
URL: https://github.com/apache/arrow/pull/7651


   I discovered a place where clang-format was left disabled by accident 
relatively early in a file and so doing the reformatting as a separate PR to 
not have the diff noise in another patch. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7596: ARROW-9163: [C++] Validate UTF8 contents of a StringArray

2020-07-06 Thread GitBox


wesm closed pull request #7596:
URL: https://github.com/apache/arrow/pull/7596


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7596: ARROW-9163: [C++] Validate UTF8 contents of a StringArray

2020-07-06 Thread GitBox


wesm commented on a change in pull request #7596:
URL: https://github.com/apache/arrow/pull/7596#discussion_r450548152



##
File path: cpp/src/arrow/array/array_binary.cc
##
@@ -24,11 +24,27 @@
 #include "arrow/type.h"
 #include "arrow/util/checked_cast.h"
 #include "arrow/util/logging.h"
+#include "arrow/util/utf8.h"
 
 namespace arrow {
 
 using internal::checked_cast;
 
+namespace {
+
+template 
+Status ValidateStringData(const StringArrayType& array) {
+  util::InitializeUTF8();
+  for (int64_t i = 0; i < array.length(); ++i) {
+if (!array.IsNull(i) && !util::ValidateUTF8(array.GetView(i))) {
+  return Status::Invalid("Invalid UTF8 sequence at string index ", i);
+}
+  }
+  return Status::OK();
+}

Review comment:
   If this became a perf bottleneck, this would be pretty easy to speed up





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7651: [C++][MINOR] Enable clang-format in a place where it was accidentally disabled

2020-07-06 Thread GitBox


github-actions[bot] commented on pull request #7651:
URL: https://github.com/apache/arrow/pull/7651#issuecomment-654532558


   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
   ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/arrow/pulls/)
 * [Contribution Guidelines - How to contribute 
patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm merged pull request #7651: [C++][MINOR] Enable clang-format in a place where it was accidentally disabled

2020-07-06 Thread GitBox


wesm merged pull request #7651:
URL: https://github.com/apache/arrow/pull/7651


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou opened a new pull request #7652: ARROW-9341: [GLib] Use arrow::Datum version Take()

2020-07-06 Thread GitBox


kou opened a new pull request #7652:
URL: https://github.com/apache/arrow/pull/7652


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] shiro615 commented on pull request #7639: ARROW-4600: [Ruby] Arrow::DictionaryArray#[] returns dictionary value

2020-07-06 Thread GitBox


shiro615 commented on pull request #7639:
URL: https://github.com/apache/arrow/pull/7639#issuecomment-654531677


   Thank you for working on this.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7651: [C++][MINOR] Enable clang-format in a place where it was accidentally disabled

2020-07-06 Thread GitBox


wesm commented on pull request #7651:
URL: https://github.com/apache/arrow/pull/7651#issuecomment-654532749


   +1



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7652: ARROW-9341: [GLib] Use arrow::Datum version Take()

2020-07-06 Thread GitBox


github-actions[bot] commented on pull request #7652:
URL: https://github.com/apache/arrow/pull/7652#issuecomment-654557935


   https://issues.apache.org/jira/browse/ARROW-9341



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7649: ARROW-9336: [Ruby] Add support for missing keys in StructArrayBuilder

2020-07-06 Thread GitBox


github-actions[bot] commented on pull request #7649:
URL: https://github.com/apache/arrow/pull/7649#issuecomment-654476061


   https://issues.apache.org/jira/browse/ARROW-9336



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7632: ARROW-6775: [C++][Python] Implement list_value_lengths and list_parent_indices functions

2020-07-06 Thread GitBox


wesm commented on pull request #7632:
URL: https://github.com/apache/arrow/pull/7632#issuecomment-654491519


   Any opinions about these new APIs? @xhochy or @pitrou could you take a quick 
look?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #7650: ARROW-9340: [R] Use CRAN version of decor package

2020-07-06 Thread GitBox


nealrichardson opened a new pull request #7650:
URL: https://github.com/apache/arrow/pull/7650


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7650: ARROW-9340: [R] Use CRAN version of decor package

2020-07-06 Thread GitBox


github-actions[bot] commented on pull request #7650:
URL: https://github.com/apache/arrow/pull/7650#issuecomment-654501588


   https://issues.apache.org/jira/browse/ARROW-9340



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou opened a new pull request #7649: ARROW-9336: [Ruby] Add support for missing keys in StructArrayBuilder

2020-07-06 Thread GitBox


kou opened a new pull request #7649:
URL: https://github.com/apache/arrow/pull/7649


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


wesm commented on pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#issuecomment-654492131


   +1. Thanks @kszucs for this work!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-06 Thread GitBox


wesm closed pull request #7519:
URL: https://github.com/apache/arrow/pull/7519


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7623: ARROW-9108: [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion

2020-07-06 Thread GitBox


nealrichardson commented on pull request #7623:
URL: https://github.com/apache/arrow/pull/7623#issuecomment-654497318


   @jorisvandenbossche #7519 has merged so I just rebased this



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn commented on pull request #7044: ARROW-6485: [Format][C++] Support the format of a COO sparse matrix that has separated row and column indices

2020-07-06 Thread GitBox


mrkn commented on pull request #7044:
URL: https://github.com/apache/arrow/pull/7044#issuecomment-654582143


   >> If we go for a new type - could I propose a name SparseCOOMatrix (as 
opposed to n-dimensional SparseCOOTensor). It could perhaps be shortened to 
COOM?
   >
   > The implementation in this pull-request can handle more than 2-dimension.
   
   Oh, I forgot to change the title of this pull-request. Sorry.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn commented on pull request #7044: ARROW-6485: [Format][C++] Support the format of a COO sparse matrix that has separated row and column indices

2020-07-06 Thread GitBox


mrkn commented on pull request #7044:
URL: https://github.com/apache/arrow/pull/7044#issuecomment-654581941


   > Hey @mrkn - sorry I didn't have capacity to reply for a while.
   
   No problem. Thank you for your cooperation!
   
   > If I remember correctly only SciPy has this architecture of COO index 
having two vectors instead of one 2D matrix.
   
   Not only scipy, but also 
[SuiteSparse](https://github.com/DrTimothyAldenDavis/SuiteSparse) employs the 
split format.
   
   > Question: could we handle this as a special case of COO tensor rather than 
a new type? Could we serialize `(row, col)` data as a single row major tensor 
of COO type and only deserialize it into the SciPy layout if desired? (I'm 
asking because I'm not sure if such approach is feasible)
   > 
   
   Although we can handle the split-format as an internal variation of 
SparseCOOIndex, we still need to introduce the new flatbuffer type.
   
   > If we go for a new type - could I propose a name SparseCOOMatrix (as 
opposed to n-dimensional SparseCOOTensor). It could perhaps be shortened to 
`COOM`?
   
   The implementation in this pull-request can handle more than 2-dimension.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org