[GitHub] [arrow] westonpace commented on a diff in pull request #33676: GH-33673: [C++] Standardize as-of-join convention for past and future tolerance

2023-01-20 Thread GitBox


westonpace commented on code in PR #33676:
URL: https://github.com/apache/arrow/pull/33676#discussion_r1082894510


##
cpp/src/arrow/compute/exec/options.h:
##
@@ -523,7 +523,8 @@ class ARROW_EXPORT AsofJoinNodeOptions : public 
ExecNodeOptions {
   ///
   /// \see `Keys` for details.
   std::vector input_keys;
-  /// \brief Tolerance for inexact "on" key matching.  Must be non-negative.
+  /// \brief Tolerance for inexact "on" key matching. Positive (resp. 
negative) tolerance
+  /// is interpreted as a future (resp. past) as-of-join.

Review Comment:
   The `resp.` abbreviation is unknown to me.  Perhaps we can be more explicit 
here.
   
   ```suggestion
 /// \brief Tolerance for inexact "on" key matching.  Two rows are only 
considered a match
 ///   if `left.on - right.on < tolerance`.  `tolerance` may be 
negative and, if so, right side
 ///   rows may even have a greater `on` value than the left side.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] westonpace commented on pull request #33770: GH-33760: [R][C++] Handle nested field refs in scanner

2023-01-20 Thread GitBox


westonpace commented on PR #33770:
URL: https://github.com/apache/arrow/pull/33770#issuecomment-1398740288

   > I haven't run C++ unit tests in forever, so figured I'd get some feedback 
before diving in there.
   
   Sorry, I was thinking of R e2e tests.  I would hope the C++ change is 
covered by existing tests.  Although I think we've found in the past that it is 
easy to accidentally load too much from the disk and still pass the tests.
   
   > @jorisvandenbossche mentioned this in my previous PR, and that's why I 
wanted to send nested refs instead of top-level columns. So why aren't I 
hitting that code?
   
   I don't know sadly.  I will try and investigate later today.  I could tell 
you how it works in the new scan node :laughing: but I don't think that will be 
too useful to you yet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] nealrichardson commented on issue #33702: [R] Package Arrow 11.0.0 for R/CRAN

2023-01-20 Thread GitBox


nealrichardson commented on issue #33702:
URL: https://github.com/apache/arrow/issues/33702#issuecomment-1398734889

   gcc13 no longer shows up on 
https://cran.r-project.org/web/checks/check_results_arrow.html so we're good 
there. Just have to deal with clang 16 now :/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] nealrichardson commented on issue #33635: R package may be failing to compile on gcc13

2023-01-20 Thread GitBox


nealrichardson commented on issue #33635:
URL: https://github.com/apache/arrow/issues/33635#issuecomment-1398733956

   https://cran.r-project.org/web/checks/check_results_arrow.html no longer 
shows a gcc13 issue, so perhaps BDR rebuilt latest and the issue resolved.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] ursabot commented on pull request #3554: Fix final page row count in parquet-index binary

2023-01-20 Thread GitBox


ursabot commented on PR #3554:
URL: https://github.com/apache/arrow-rs/pull/3554#issuecomment-1398731968

   Benchmark runs are scheduled for baseline = 
19e3e8c8314f87d8c2acf3a7b69538fdec6f793c and contender = 
0ec5f72e6d21556d5677b74dd5d45d93c5af0b38. 
0ec5f72e6d21556d5677b74dd5d45d93c5af0b38 is a master commit associated with 
this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on 
ec2-t3-xlarge-us-east-2] 
[ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/bc66302d805c45659a86655be45b50b8...d276d660444948cf85e9c43d707e5451/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on 
test-mac-arm] 
[test-mac-arm](https://conbench.ursa.dev/compare/runs/de80f85ba35f4de2a10eb3aaea32809e...119c985f270e4efeba561436da7b8972/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on 
ursa-i9-9960x] 
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/8fee90c4b17244ed99def45b5c3b11a9...017ee177ee0849369ca235525a0906bb/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on 
ursa-thinkcentre-m75q] 
[ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/34e43387f85245669e655f898434425e...76aed7437bbe4f3ca6b634e044d90d92/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only 
benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] charlesbluca opened a new issue, #5004: `LogicalPlan.schema()` returns incorrect schema for `CreateMemoryTable` and `CreateView`

2023-01-20 Thread GitBox


charlesbluca opened a new issue, #5004:
URL: https://github.com/apache/arrow-datafusion/issues/5004

   **Describe the bug**
   For `LogicalPlan::CreateMemoryTable` and `CreateView`, `schema()` returns 
the schema of the input plan, rather than the schema of the newly created 
table/view:
   
   
https://github.com/apache/arrow-datafusion/blob/92d0a054c23e5fba91718db32ccd933ce86dd2b6/datafusion/expr/src/logical_plan/plan.rs#L155-L156
   
   **Expected behavior**
   I would expect `LogicalPlan.schema()` to return the schema of the newly 
created table or view.
   
   **Additional context**
   This came up in discussion around fetching the schema of a created memory 
table or view in 
https://github.com/dask-contrib/dask-sql/pull/854#discussion_r1081723309


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a diff in pull request #3572: Packing array into dictionary of generic byte array

2023-01-20 Thread GitBox


viirya commented on code in PR #3572:
URL: https://github.com/apache/arrow-rs/pull/3572#discussion_r1082878810


##
arrow-cast/src/cast.rs:
##
@@ -3344,42 +3350,23 @@ where
 Ok(Arc::new(b.finish()))
 }
 
-// Packs the data as a StringDictionaryArray, if possible, with the
-// key types of K
-fn pack_string_to_dictionary(
-array: ,
-cast_options: ,
-) -> Result
-where
-K: ArrowDictionaryKeyType,
-{
-let cast_values = cast_with_options(array, ::Utf8, cast_options)?;
-let values = cast_values.as_any().downcast_ref::().unwrap();
-let mut b = StringDictionaryBuilderwith_capacity(values.len(), 
1024, 1024);
-
-// copy each element one at a time
-for i in 0..values.len() {
-if values.is_null(i) {
-b.append_null();
-} else {
-b.append(values.value(i))?;
-}
-}
-Ok(Arc::new(b.finish()))
-}
-
-// Packs the data as a BinaryDictionaryArray, if possible, with the
+// Packs the data as a GenericByteDictionaryBuilder, if possible, with the
 // key types of K
-fn pack_binary_to_dictionary(
+fn pack_byte_to_dictionary(
 array: ,
 cast_options: ,
 ) -> Result
 where
 K: ArrowDictionaryKeyType,
+T: ByteArrayType,
 {
-let cast_values = cast_with_options(array, ::Binary, 
cast_options)?;
-let values = cast_values.as_any().downcast_ref::().unwrap();
-let mut b = BinaryDictionaryBuilderwith_capacity(values.len(), 
1024, 1024);
+let cast_values = cast_with_options(array, ::DATA_TYPE, cast_options)?;

Review Comment:
   It's necessary as this supports that the source type is different than the 
value type of dictionary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a diff in pull request #3572: Packing array into dictionary of generic byte array

2023-01-20 Thread GitBox


viirya commented on code in PR #3572:
URL: https://github.com/apache/arrow-rs/pull/3572#discussion_r1082878810


##
arrow-cast/src/cast.rs:
##
@@ -3344,42 +3350,23 @@ where
 Ok(Arc::new(b.finish()))
 }
 
-// Packs the data as a StringDictionaryArray, if possible, with the
-// key types of K
-fn pack_string_to_dictionary(
-array: ,
-cast_options: ,
-) -> Result
-where
-K: ArrowDictionaryKeyType,
-{
-let cast_values = cast_with_options(array, ::Utf8, cast_options)?;
-let values = cast_values.as_any().downcast_ref::().unwrap();
-let mut b = StringDictionaryBuilderwith_capacity(values.len(), 
1024, 1024);
-
-// copy each element one at a time
-for i in 0..values.len() {
-if values.is_null(i) {
-b.append_null();
-} else {
-b.append(values.value(i))?;
-}
-}
-Ok(Arc::new(b.finish()))
-}
-
-// Packs the data as a BinaryDictionaryArray, if possible, with the
+// Packs the data as a GenericByteDictionaryBuilder, if possible, with the
 // key types of K
-fn pack_binary_to_dictionary(
+fn pack_byte_to_dictionary(
 array: ,
 cast_options: ,
 ) -> Result
 where
 K: ArrowDictionaryKeyType,
+T: ByteArrayType,
 {
-let cast_values = cast_with_options(array, ::Binary, 
cast_options)?;
-let values = cast_values.as_any().downcast_ref::().unwrap();
-let mut b = BinaryDictionaryBuilderwith_capacity(values.len(), 
1024, 1024);
+let cast_values = cast_with_options(array, ::DATA_TYPE, cast_options)?;

Review Comment:
   It's necessary as this supports some casting like Binary -> Dictionary(Int8, 
Utf8), the source type is different than the value type of dictionary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] viirya merged pull request #3554: Fix final page row count in parquet-index binary

2023-01-20 Thread GitBox


viirya merged PR #3554:
URL: https://github.com/apache/arrow-rs/pull/3554


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #33792: GH-33789: [Go] Add Err() to RecordReader

2023-01-20 Thread GitBox


ursabot commented on PR #33792:
URL: https://github.com/apache/arrow/pull/33792#issuecomment-1398720935

   Benchmark runs are scheduled for baseline = 
bf8780d0ff794c50312d799a9e877430e99dcf8b and contender = 
f744bab97fb6e10663b0b414855534e24383056b. 
f744bab97fb6e10663b0b414855534e24383056b is a master commit associated with 
this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] 
[ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/94c9b4a163f14b0491eb2840f4e77e0b...375b72996a844a5d9fbfd079f1a5fdd5/)
   [Failed] 
[test-mac-arm](https://conbench.ursa.dev/compare/runs/f01eb06ee1c847878cb2a0ea5203c9ff...372b6fe6b9c84cfba5330710212ce77f/)
   [Failed] 
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/7c6838d9b82846e1b17e51b91cff38d4...c9da915e711741ce87b477d995b54a84/)
   [Failed] 
[ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/0f44a1e877014d468db822b036b8a44c...217b34a316f04236b48445b5af7b242a/)
   Buildkite builds:
   [Finished] [`f744bab9` 
ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2231)
   [Failed] [`f744bab9` 
test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2256)
   [Failed] [`f744bab9` 
ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2226)
   [Failed] [`f744bab9` 
ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2249)
   [Finished] [`bf8780d0` 
ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2230)
   [Failed] [`bf8780d0` 
test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2255)
   [Failed] [`bf8780d0` 
ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2225)
   [Failed] [`bf8780d0` 
ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2248)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only 
benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] comphead commented on a diff in pull request #3570: Remove unwrap on datetime cast for CSV writer

2023-01-20 Thread GitBox


comphead commented on code in PR #3570:
URL: https://github.com/apache/arrow-rs/pull/3570#discussion_r1082867652


##
arrow-csv/src/writer.rs:
##
@@ -672,4 +710,26 @@ sed do eiusmod 
tempor,-556132.25,1,,2019-04-18T02:45:55.55500,23:46:03,foo
 let expected = nanoseconds.into_iter().map(Some).collect::>();
 assert_eq!(actual, expected);
 }
+
+#[test]
+fn test_write_csv_invalid_cast() {
+let schema = Schema::new(vec![
+Field::new("c0", DataType::UInt32, false),
+Field::new("c1", DataType::Date64, false),
+]);
+
+let c0 = UInt32Array::from(vec![Some(123), Some(234)]);
+let c1 = Date64Array::from(vec![Some(1926632005177), 
Some(1926632005177685347)]);
+let batch =
+RecordBatch::try_new(Arc::new(schema), vec![Arc::new(c0), 
Arc::new(c1)])
+.unwrap();
+
+let mut file = tempfile::tempfile().unwrap();
+let mut writer = Writer::new( file);
+let batches = vec![, ];
+for batch in batches {
+writer.write(batch).map_err(|e| { dbg!(e.to_string()); 
assert!(e.to_string().ends_with(invalid_cast_error("arrow_array::array::primitive_array::PrimitiveArray".to_owned(),
 1, 1).to_string().as_str()))}).unwrap_err();

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3574: Add external variant to ParquetError (#3285)

2023-01-20 Thread GitBox


tustvold commented on code in PR #3574:
URL: https://github.com/apache/arrow-rs/pull/3574#discussion_r1082867599


##
parquet/src/errors.rs:
##
@@ -17,12 +17,13 @@
 
 //! Common Parquet errors and macros.
 
+use std::error::Error;
 use std::{cell, io, result, str};
 
 #[cfg(feature = "arrow")]
 use arrow_schema::ArrowError;
 
-#[derive(Debug, PartialEq, Clone, Eq)]

Review Comment:
   Hence the scream test :smile: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a diff in pull request #3574: Add external variant to ParquetError (#3285)

2023-01-20 Thread GitBox


viirya commented on code in PR #3574:
URL: https://github.com/apache/arrow-rs/pull/3574#discussion_r1082866719


##
parquet/src/errors.rs:
##
@@ -17,12 +17,13 @@
 
 //! Common Parquet errors and macros.
 
+use std::error::Error;
 use std::{cell, io, result, str};
 
 #[cfg(feature = "arrow")]
 use arrow_schema::ArrowError;
 
-#[derive(Debug, PartialEq, Clone, Eq)]

Review Comment:
   I guess @alamb means a downstream project may use `ParquetError` embedded in 
another error type? For that we cannot detect in our test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] westonpace commented on a diff in pull request #14596: ARROW-18258: [Docker] Substrait Integration Testing

2023-01-20 Thread GitBox


westonpace commented on code in PR #14596:
URL: https://github.com/apache/arrow/pull/14596#discussion_r1082859333


##
ci/scripts/integration_substrait.sh:
##
@@ -0,0 +1,30 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -e
+
+# check that optional pyarrow modules are available
+# because pytest would just skip the substrait tests
+echo "Substrait Integration Tests";
+python -c "from substrait_consumer.consumers import AceroConsumer"
+python -c "import pyarrow.orc"

Review Comment:
   I don't think we're enabling `orc` and I'm unclear how it is related.



##
ci/scripts/integration_substrait.sh:
##
@@ -0,0 +1,30 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -e
+
+# check that optional pyarrow modules are available
+# because pytest would just skip the substrait tests
+echo "Substrait Integration Tests";
+python -c "from substrait_consumer.consumers import AceroConsumer"
+python -c "import pyarrow.orc"
+python -c "import pyarrow.substrait"

Review Comment:
   I see the comment but I'm still confused what is happening here.  Is the 
problem that the consumer testing suite would silently skip all the tests and 
look like a pass if these commands fail?  I'm pretty sure `pytest` fails if no 
tests were found.



##
ci/scripts/integration_substrait.sh:
##
@@ -0,0 +1,30 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -e
+
+# check that optional pyarrow modules are available
+# because pytest would just skip the substrait tests
+echo "Substrait Integration Tests";
+python -c "from substrait_consumer.consumers import AceroConsumer"
+python -c "import pyarrow.orc"
+python -c "import pyarrow.substrait"
+
+pytest 
substrait_consumer/tests/functional/extension_functions/test_boolean_functions.py
 --producer IsthmusProducer --consumer AceroConsumer

Review Comment:
   It's a start :)



##
ci/scripts/install_substrait_consumer.sh:
##
@@ -0,0 +1,38 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either 

[GitHub] [arrow-rs] ursabot commented on pull request #3563: Implement Extend for ArrayBuilder (#1841)

2023-01-20 Thread GitBox


ursabot commented on PR #3563:
URL: https://github.com/apache/arrow-rs/pull/3563#issuecomment-1398686577

   Benchmark runs are scheduled for baseline = 
a1cedb4fdfb561eda4e836a6c8fcb898d7a37029 and contender = 
19e3e8c8314f87d8c2acf3a7b69538fdec6f793c. 
19e3e8c8314f87d8c2acf3a7b69538fdec6f793c is a master commit associated with 
this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on 
ec2-t3-xlarge-us-east-2] 
[ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/39c6792cb9314558aeca361712dcca65...bc66302d805c45659a86655be45b50b8/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on 
test-mac-arm] 
[test-mac-arm](https://conbench.ursa.dev/compare/runs/05af447fe65b4023835bf20e9b67708e...de80f85ba35f4de2a10eb3aaea32809e/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on 
ursa-i9-9960x] 
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/51dd5d80b4fe48159493ad8f8fad0a0e...8fee90c4b17244ed99def45b5c3b11a9/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on 
ursa-thinkcentre-m75q] 
[ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/00472df3e5d04a58b1819d0ff4b00bca...34e43387f85245669e655f898434425e/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only 
benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] tustvold closed issue #3562: Panic on Key Overflow in Dictionary Builders

2023-01-20 Thread GitBox


tustvold closed issue #3562: Panic on Key Overflow in Dictionary Builders
URL: https://github.com/apache/arrow-rs/issues/3562


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] tustvold merged pull request #3563: Implement Extend for ArrayBuilder (#1841)

2023-01-20 Thread GitBox


tustvold merged PR #3563:
URL: https://github.com/apache/arrow-rs/pull/3563


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] tustvold closed issue #1841: Implement Extend for Builder

2023-01-20 Thread GitBox


tustvold closed issue #1841: Implement Extend for Builder
URL: https://github.com/apache/arrow-rs/issues/1841


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #33768: GH-33767: [Go] Clear out parameter in ArrowArrayStream.get_next

2023-01-20 Thread GitBox


ursabot commented on PR #33768:
URL: https://github.com/apache/arrow/pull/33768#issuecomment-1398677269

   Benchmark runs are scheduled for baseline = 
a4236abd3b88fb1d4db55ec82afcaf7f50183639 and contender = 
bf8780d0ff794c50312d799a9e877430e99dcf8b. 
bf8780d0ff794c50312d799a9e877430e99dcf8b is a master commit associated with 
this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] 
[ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/7591adc65d434307a8abc257aa6627fb...94c9b4a163f14b0491eb2840f4e77e0b/)
   [Failed] 
[test-mac-arm](https://conbench.ursa.dev/compare/runs/401cb18adbd5447281de8b14c6c232c5...f01eb06ee1c847878cb2a0ea5203c9ff/)
   [Failed] 
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/6043273b90b74674a5a490fff2acb8ab...7c6838d9b82846e1b17e51b91cff38d4/)
   [Failed] 
[ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/65d5a0a4e4534397b64e37ebf67470b6...0f44a1e877014d468db822b036b8a44c/)
   Buildkite builds:
   [Finished] [`bf8780d0` 
ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2230)
   [Failed] [`bf8780d0` 
test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2255)
   [Failed] [`bf8780d0` 
ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2225)
   [Failed] [`bf8780d0` 
ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2248)
   [Finished] [`a4236abd` 
ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2229)
   [Finished] [`a4236abd` 
test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2254)
   [Failed] [`a4236abd` 
ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2224)
   [Failed] [`a4236abd` 
ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2247)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only 
benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3570: Remove unwrap on datetime cast for CSV writer

2023-01-20 Thread GitBox


tustvold commented on code in PR #3570:
URL: https://github.com/apache/arrow-rs/pull/3570#discussion_r1082820993


##
arrow-csv/src/writer.rs:
##
@@ -88,6 +88,35 @@ where
 lexical_to_string(c.value(i))
 }
 
+fn invalid_cast_error(dt: String, col_index: usize, row_index: usize) -> 
ArrowError {
+let mut s = String::new();
+s.push_str("Cannot cast to ");
+s.push_str();
+s.push_str(" at col index: ");
+s.push_str(col_index.to_string().as_str());
+s.push_str(" row index: ");
+s.push_str(row_index.to_string().as_str());
+ArrowError::CastError(s)
+}
+
+macro_rules! write_temporal_value {

Review Comment:
   Aah I missed the $f



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #14596: ARROW-18258: [Docker] Substrait Integration Testing

2023-01-20 Thread GitBox


github-actions[bot] commented on PR #14596:
URL: https://github.com/apache/arrow/pull/14596#issuecomment-1398662412

   Revision: 8dafc82cd91e88070a5f55a7f0e3966f5a845682
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-d5e698c020](https://github.com/ursacomputing/crossbow/branches/all?query=actions-d5e698c020)
   
   |Task|Status|
   ||--|
   |test-conda-python-3.9-substrait|[![Github 
Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-d5e698c020-github-test-conda-python-3.9-substrait)](https://github.com/ursacomputing/crossbow/actions/runs/3969623890/jobs/6804363220)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #14596: ARROW-18258: [Docker] Substrait Integration Testing

2023-01-20 Thread GitBox


github-actions[bot] commented on PR #14596:
URL: https://github.com/apache/arrow/pull/14596#issuecomment-1398660480

   Revision: 85554d34a7553d21cd91eb19b5ac797293752336
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-ad25329806](https://github.com/ursacomputing/crossbow/branches/all?query=actions-ad25329806)
   
   |Task|Status|
   ||--|
   |test-conda-python-3.9-substrait|[![Github 
Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-ad25329806-github-test-conda-python-3.9-substrait)](https://github.com/ursacomputing/crossbow/actions/runs/3969610939/jobs/6804332898)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] comphead commented on a diff in pull request #3570: Remove unwrap on datetime cast for CSV writer

2023-01-20 Thread GitBox


comphead commented on code in PR #3570:
URL: https://github.com/apache/arrow-rs/pull/3570#discussion_r1082816995


##
arrow-csv/src/writer.rs:
##
@@ -88,6 +88,35 @@ where
 lexical_to_string(c.value(i))
 }
 
+fn invalid_cast_error(dt: String, col_index: usize, row_index: usize) -> 
ArrowError {
+let mut s = String::new();
+s.push_str("Cannot cast to ");
+s.push_str();
+s.push_str(" at col index: ");
+s.push_str(col_index.to_string().as_str());
+s.push_str(" row index: ");
+s.push_str(row_index.to_string().as_str());
+ArrowError::CastError(s)
+}
+
+macro_rules! write_temporal_value {

Review Comment:
   Generic was too complex, as we need to use different conversion functions 
depending on the type, I have tried generics first and then falled back to 
macros



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] vibhatha commented on pull request #14596: ARROW-18258: [Docker] Substrait Integration Testing

2023-01-20 Thread GitBox


vibhatha commented on PR #14596:
URL: https://github.com/apache/arrow/pull/14596#issuecomment-1398659669

   @github-actions crossbow submit test-conda-python-3.9-substrait


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] comphead commented on a diff in pull request #3570: Remove unwrap on datetime cast for CSV writer

2023-01-20 Thread GitBox


comphead commented on code in PR #3570:
URL: https://github.com/apache/arrow-rs/pull/3570#discussion_r1082815909


##
arrow-csv/src/writer.rs:
##
@@ -88,6 +88,35 @@ where
 lexical_to_string(c.value(i))
 }
 
+fn invalid_cast_error(dt: String, col_index: usize, row_index: usize) -> 
ArrowError {
+let mut s = String::new();

Review Comment:
   Thanks @tustvold 
   Reg to https://github.com/hoodie/concatenation_benchmarks-rs the format 
macro is 5x slower than string push. But you right, the exception is evauluated 
in `ok_or_else` whic is lazy and won't affect performance on millions of rows, 
Will fix it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] vibhatha commented on pull request #14596: ARROW-18258: [Docker] Substrait Integration Testing

2023-01-20 Thread GitBox


vibhatha commented on PR #14596:
URL: https://github.com/apache/arrow/pull/14596#issuecomment-1398656902

   @github-actions crossbow submit test-conda-python-3.9-substrait


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] ozankabak commented on pull request #5003: Support for bounded execution when window frame involves UNBOUNDED PRECEDING

2023-01-20 Thread GitBox


ozankabak commented on PR #5003:
URL: 
https://github.com/apache/arrow-datafusion/pull/5003#issuecomment-1398656384

   A quick summary to help reviews: If all you are doing is something like a 
running sum, you can get the job done with bounded memory even if your frame is 
ever-growing. This PR paves the way for Datafusion to support these kinds of 
use cases with low memory usage and without breaking the pipeline.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-ballista] thinkharderdev opened a new issue, #619: Prune unneccessary data from task definition

2023-01-20 Thread GitBox


thinkharderdev opened a new issue, #619:
URL: https://github.com/apache/arrow-ballista/issues/619

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   A clear and concise description of what the problem is. Ex. I'm always 
frustrated when [...] 
   (This section helps Arrow developers understand the context and *why* for 
this feature, in addition to  the *what*)
   
   When the scheduler sends a task to the executor it has to send the 
serialized `ExecutionPlan`. For very large plans (for isntance, scanning 10s of 
thousands of files) the plan can be very large and the cost to 
serialize/deserialize to protobuf is significant. 
   
   **Describe the solution you'd like**
   A clear and concise description of what you want to happen.
   
   Since each task is only executing a single partition, we can prune all the 
`FileScanConfig` `file_groups` for other partitions. This can eliminate most of 
the bulk of the serialized plan. 
   
   **Describe alternatives you've considered**
   A clear and concise description of any alternative solutions or features 
you've considered.
   
   1. When preparing a task definition, prune the `ExecutionPlan` prior to 
serialization. This can be done pretty straightforwardly as a 
`PhysicalOptimizerRule` to handle the standard cases (`ParquetExec`, `CsvExec`, 
etc).
   2. For custom cases such as user-defined `ExecutionPlan` impls, add an 
argument to `PhysicalExtensionCodec::try_encode`:
   
   ```
   pub trait PhysicalExtensionCodec: Debug + Send + Sync {
   fn try_encode(
   ,
   node: Arc,
   partitions: &[usize],
   buf:  Vec,
   ) -> Result<(), BallistaError>;
   
   ...
   
   }
   ```
   
   **Additional context**
   Add any other context or screenshots about the feature request here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #4989: Add support for linear range calculation in WINDOW functions

2023-01-20 Thread GitBox


ozankabak commented on code in PR #4989:
URL: https://github.com/apache/arrow-datafusion/pull/4989#discussion_r1082791126


##
datafusion/common/src/utils.rs:
##
@@ -103,6 +111,53 @@ where
 Ok(low)
 }
 
+/// This function searches for a tuple of given values (`target`) among the 
given
+/// rows (`item_columns`) via a linear scan. It assumes that `item_columns` is 
sorted
+/// according to `sort_options` and returns the insertion index of `target`.
+/// Template argument `SIDE` being `true`/`false` means left/right insertion.
+pub fn linear_search(
+item_columns: &[ArrayRef],
+target: &[ScalarValue],
+sort_options: &[SortOptions],
+) -> Result {
+let low: usize = 0;
+let high: usize = item_columns
+.get(0)
+.ok_or_else(|| {
+DataFusionError::Internal("Column array shouldn't be 
empty".to_string())
+})?
+.len();
+let compare_fn = |current: &[ScalarValue], target: &[ScalarValue]| {
+let cmp = compare_rows(current, target, sort_options)?;
+Ok(if SIDE { cmp.is_lt() } else { cmp.is_le() })
+};
+search_in_slice(item_columns, target, compare_fn, low, high)
+}
+
+/// This function searches for a tuple of given values (`target`) among a 
slice of
+/// the given rows (`item_columns`) via a linear scan. The slice starts at the 
index
+/// `low` and ends at the index `high`. The boolean-valued function 
`compare_fn`
+/// specifies the stopping criterion.
+pub fn search_in_slice(
+item_columns: &[ArrayRef],
+target: &[ScalarValue],
+compare_fn: F,
+mut low: usize,
+high: usize,
+) -> Result
+where
+F: Fn(&[ScalarValue], &[ScalarValue]) -> Result,
+{
+while low < high {

Review Comment:
   I think you mean something like this:
   ```rust
   Ok((low..high).find(|| {
   let val = get_row_at_idx(item_columns, idx)?;
   !compare_fn(, target)?
   }).unwrap_or(high))
   ```
   
   The problem is with the `?` operators, we would need to change them to 
`unwrap` calls for this to work. The code would look nicer, but we would be 
incurring the downside of panicking in case something goes wrong. In general, I 
prefer to err on the side of being a little more verbose than necessary but 
retain control over errors, but I don't have a strong opinion on this specific 
case. What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #33772: GH-15137: [C++][CI] Fix ASAN error in streaming JSON reader tests

2023-01-20 Thread GitBox


ursabot commented on PR #33772:
URL: https://github.com/apache/arrow/pull/33772#issuecomment-1398646800

   ['Python', 'R'] benchmarks have high level of regressions.
   
[test-mac-arm](https://conbench.ursa.dev/compare/runs/83ac5871fe62452a9c95ecf98c4fa293...401cb18adbd5447281de8b14c6c232c5/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #33772: GH-15137: [C++][CI] Fix ASAN error in streaming JSON reader tests

2023-01-20 Thread GitBox


ursabot commented on PR #33772:
URL: https://github.com/apache/arrow/pull/33772#issuecomment-1398646540

   Benchmark runs are scheduled for baseline = 
a1a587b1d1415a96edbb358cdf363241064a6d64 and contender = 
a4236abd3b88fb1d4db55ec82afcaf7f50183639. 
a4236abd3b88fb1d4db55ec82afcaf7f50183639 is a master commit associated with 
this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] 
[ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/1de7097bb03645b69e60d0281ded9a5d...7591adc65d434307a8abc257aa6627fb/)
   [Finished :arrow_down:1.17% :arrow_up:0.03%] 
[test-mac-arm](https://conbench.ursa.dev/compare/runs/83ac5871fe62452a9c95ecf98c4fa293...401cb18adbd5447281de8b14c6c232c5/)
   [Failed] 
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/1d9273efc6a643549b46a4a7bd51c474...6043273b90b74674a5a490fff2acb8ab/)
   [Failed] 
[ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/2b3b47b8313a4139848884db6ed8b809...65d5a0a4e4534397b64e37ebf67470b6/)
   Buildkite builds:
   [Finished] [`a4236abd` 
ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2229)
   [Finished] [`a4236abd` 
test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2254)
   [Failed] [`a4236abd` 
ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2224)
   [Failed] [`a4236abd` 
ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2247)
   [Finished] [`a1a587b1` 
ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2228)
   [Finished] [`a1a587b1` 
test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2253)
   [Failed] [`a1a587b1` 
ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2223)
   [Failed] [`a1a587b1` 
ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2246)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only 
benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] wjones127 commented on a diff in pull request #33694: MINOR: [C++][Parquet] Rephrase decimal annotation

2023-01-20 Thread GitBox


wjones127 commented on code in PR #33694:
URL: https://github.com/apache/arrow/pull/33694#discussion_r1082720103


##
cpp/src/parquet/properties.h:
##
@@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties {
   return this->disable_statistics(path->ToDotString());
 }
 
-/// Enable integer type to annotate decimal type as below:
-///   int32: 1 <= precision <= 9
-///   int64: 10 <= precision <= 18
-/// Default disabled.
-Builder* enable_integer_annotate_decimal() {
-  integer_annotate_decimal_ = true;
+/// Enable decimal logical type with 1 <= precision <= 18 to be stored as
+/// integer physical type.
+///
+/// According to the specs, DECIMAL can be used to annotate the following 
types:
+/// - int32: for 1 <= precision <= 9.
+/// - int64: for 1 <= precision <= 18; precision < 10 will produce a 
warning.
+/// - fixed_len_byte_array: precision is limited by the array size.
+///   Length n can store <= floor(log_10(2^(8*n - 1) - 1)) base-10 digits.
+/// - binary: precision is not limited, but is required. precision is not 
limited,
+///   but is required. The minimum number of bytes to store the unscaled 
value
+///   should be used.

Review Comment:
   I'd rather write about what our function does than what is in the spec. The 
spec is documentation for us (the developers), but this documentation is for 
our users. That's why, for example, I suggest changing "should be used" to "is 
used".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] ozankabak commented on pull request #4989: Add support for linear range calculation in WINDOW functions

2023-01-20 Thread GitBox


ozankabak commented on PR #4989:
URL: 
https://github.com/apache/arrow-datafusion/pull/4989#issuecomment-1398641302

   Thank you for carefully reviewing @alamb. We will consider further 
optimizing by leveraging `RowFormat` in a follow-on PR. As @mustafasrepo 
mentions, it is not obvious to use how we can utilize directly 
`LexicographicalComparator`, but if we figure out a way, we will make another 
follow-on PR for that too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #4989: Add support for linear range calculation in WINDOW functions

2023-01-20 Thread GitBox


ozankabak commented on code in PR #4989:
URL: https://github.com/apache/arrow-datafusion/pull/4989#discussion_r1082791126


##
datafusion/common/src/utils.rs:
##
@@ -103,6 +111,53 @@ where
 Ok(low)
 }
 
+/// This function searches for a tuple of given values (`target`) among the 
given
+/// rows (`item_columns`) via a linear scan. It assumes that `item_columns` is 
sorted
+/// according to `sort_options` and returns the insertion index of `target`.
+/// Template argument `SIDE` being `true`/`false` means left/right insertion.
+pub fn linear_search(
+item_columns: &[ArrayRef],
+target: &[ScalarValue],
+sort_options: &[SortOptions],
+) -> Result {
+let low: usize = 0;
+let high: usize = item_columns
+.get(0)
+.ok_or_else(|| {
+DataFusionError::Internal("Column array shouldn't be 
empty".to_string())
+})?
+.len();
+let compare_fn = |current: &[ScalarValue], target: &[ScalarValue]| {
+let cmp = compare_rows(current, target, sort_options)?;
+Ok(if SIDE { cmp.is_lt() } else { cmp.is_le() })
+};
+search_in_slice(item_columns, target, compare_fn, low, high)
+}
+
+/// This function searches for a tuple of given values (`target`) among a 
slice of
+/// the given rows (`item_columns`) via a linear scan. The slice starts at the 
index
+/// `low` and ends at the index `high`. The boolean-valued function 
`compare_fn`
+/// specifies the stopping criterion.
+pub fn search_in_slice(
+item_columns: &[ArrayRef],
+target: &[ScalarValue],
+compare_fn: F,
+mut low: usize,
+high: usize,
+) -> Result
+where
+F: Fn(&[ScalarValue], &[ScalarValue]) -> Result,
+{
+while low < high {

Review Comment:
   I think you mean something like this:
   ```rust
   Ok((low..high).find(|| {
   let val = get_row_at_idx(item_columns, idx)?;
   !compare_fn(, target)?
   }).unwrap_or(high))
   ```
   
   The problem is with the `?` operators, we would need to change them to 
`unwrap` calls for this to work. The code would look nicer, but we would 
incurring the downside of panicking in case something goes wrong. In general, I 
prefer to err on the side of being a little more verbose than necessary but 
retain control over errors, but I don't have a strong opinion on this specific 
case. What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-adbc] lidavidm commented on pull request #356: feat(go/adbc/driver/pkg/cmake): cmake build for Go shared library drivers

2023-01-20 Thread GitBox


lidavidm commented on PR #356:
URL: https://github.com/apache/arrow-adbc/pull/356#issuecomment-1398624356

   Weird - it actually fails on SQlite here now.
   
   ```
   === Building driver/sqlite ===
   + mkdir -p /adbc/build/x64/driver/sqlite
   + pushd /adbc/build/x64/driver/sqlite
   /adbc/build/x64/driver/sqlite /
   + cmake -G Ninja -DADBC_BUILD_SHARED=ON -DADBC_BUILD_STATIC=OFF 
-DCMAKE_BUILD_WITH_INSTALL_RPATH=ON -DCMAKE_INSTALL_LIBDIR=lib 
-DCMAKE_INSTALL_PREFIX=/adbc/build/x64 
-DCMAKE_TOOLCHAIN_FILE=/opt/vcpkg/scripts/buildsystems/vcpkg.cmake 
-DCMAKE_UNITY_BUILD=ON -DVCPKG_OVERLAY_TRIPLETS=/adbc/ci/vcpkg/triplets/ 
-DVCPKG_TARGET_TRIPLET=x64-linux-static-release /adbc/c/driver/sqlite
   -- The C compiler identification is GNU 10.2.1
   -- The CXX compiler identification is GNU 10.2.1
   -- Detecting C compiler ABI info
   -- Detecting C compiler ABI info - failed
   -- Check for working C compiler: /opt/rh/devtoolset-10/root/usr/bin/cc
   CMake Error: The source directory 
"/adbc/build/x64/driver/sqlite/CMakeFiles/CMakeTmp" does not appear to contain 
CMakeLists.txt.
   Specify --help for usage, or press the help button on the CMake GUI.
   CMake Error at 
/usr/local/share/cmake-3.21/Modules/CMakeTestCCompiler.cmake:56 (try_compile):
 Failed to configure test project build system.
   Call Stack (most recent call first):
 /adbc/c/cmake_modules/AdbcDefines.cmake:21 (enable_language)
 CMakeLists.txt:21 (include)
   
   
   -- Configuring incomplete, errors occurred!
   See also "/adbc/build/x64/driver/sqlite/CMakeFiles/CMakeOutput.log".
   See also "/adbc/build/x64/driver/sqlite/CMakeFiles/CMakeError.log".
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] zeroshade commented on a diff in pull request #33795: GH-33794: [Go] Add SetRecordReader to PreparedStatement

2023-01-20 Thread GitBox


zeroshade commented on code in PR #33795:
URL: https://github.com/apache/arrow/pull/33795#discussion_r1082761204


##
go/arrow/flight/flightsql/client.go:
##
@@ -518,22 +544,44 @@ func (p *PreparedStatement) GetSchema(ctx 
context.Context) (*flight.SchemaResult
return p.client.getSchema(ctx, desc, p.opts...)
 }
 
-// SetParameters takes a record batch to send as the parameter bindings when
-// executing. It should match the schema from ParameterSchema.
-//
-// This will call Retain on the record to ensure it doesn't get released
-// out from under the statement. Release will be called on a previous
-// binding record if it existed, and will be called upon calling Close
-// on the PreparedStatement.
-func (p *PreparedStatement) SetParameters(binding arrow.Record) {
+func (p *PreparedStatement) clearParameters() {
if p.paramBinding != nil {
p.paramBinding.Release()
p.paramBinding = nil
}
+   if p.streamBinding != nil {
+   p.streamBinding.Release()
+   p.streamBinding = nil
+   }
+}
+
+// SetParameters takes a record batch to send as the parameter bindings when
+// executing. It should match the schema from ParameterSchema.
+//
+// This will call Retain on the record to ensure it doesn't get released out
+// from under the statement. Release will be called on a previous binding
+// record or reader if it existed, and will be called upon calling Close on the
+// PreparedStatement.
+func (p *PreparedStatement) SetParameters(binding arrow.Record) {
+   p.clearParameters()
p.paramBinding = binding
p.paramBinding.Retain()
 }

Review Comment:
   nvm just saw you have `Release will be called on a previous binding record 
or reader`, ignore this :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] zeroshade commented on a diff in pull request #33795: GH-33794: [Go] Add SetRecordReader to PreparedStatement

2023-01-20 Thread GitBox


zeroshade commented on code in PR #33795:
URL: https://github.com/apache/arrow/pull/33795#discussion_r1082759927


##
go/arrow/flight/flightsql/client.go:
##
@@ -518,22 +544,44 @@ func (p *PreparedStatement) GetSchema(ctx 
context.Context) (*flight.SchemaResult
return p.client.getSchema(ctx, desc, p.opts...)
 }
 
-// SetParameters takes a record batch to send as the parameter bindings when
-// executing. It should match the schema from ParameterSchema.
-//
-// This will call Retain on the record to ensure it doesn't get released
-// out from under the statement. Release will be called on a previous
-// binding record if it existed, and will be called upon calling Close
-// on the PreparedStatement.
-func (p *PreparedStatement) SetParameters(binding arrow.Record) {
+func (p *PreparedStatement) clearParameters() {
if p.paramBinding != nil {
p.paramBinding.Release()
p.paramBinding = nil
}
+   if p.streamBinding != nil {
+   p.streamBinding.Release()
+   p.streamBinding = nil
+   }
+}
+
+// SetParameters takes a record batch to send as the parameter bindings when
+// executing. It should match the schema from ParameterSchema.
+//
+// This will call Retain on the record to ensure it doesn't get released out
+// from under the statement. Release will be called on a previous binding
+// record or reader if it existed, and will be called upon calling Close on the
+// PreparedStatement.
+func (p *PreparedStatement) SetParameters(binding arrow.Record) {
+   p.clearParameters()
p.paramBinding = binding
p.paramBinding.Retain()
 }

Review Comment:
   should probably mention in the docstring that it will clear any existing 
bound params



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] zeroshade commented on a diff in pull request #33795: GH-33794: [Go] Add SetRecordReader to PreparedStatement

2023-01-20 Thread GitBox


zeroshade commented on code in PR #33795:
URL: https://github.com/apache/arrow/pull/33795#discussion_r1082757863


##
go/arrow/flight/flightsql/client.go:
##
@@ -491,6 +490,33 @@ func (p *PreparedStatement) ExecuteUpdate(ctx 
context.Context) (nrecords int64,
return updateResult.GetRecordCount(), nil
 }
 
+func (p *PreparedStatement) hasBindParameters() bool {
+   return (p.paramBinding != nil && p.paramBinding.NumRows() > 0) || 
(p.streamBinding != nil)
+}
+
+func (p *PreparedStatement) writeBindParameters(pstream 
pb.FlightService_DoPutClient, desc *pb.FlightDescriptor) (*flight.Writer, 
error) {
+   if p.paramBinding != nil {
+   wr := flight.NewRecordWriter(pstream, 
ipc.WithSchema(p.paramBinding.Schema()))
+   wr.SetFlightDescriptor(desc)
+   if err := wr.Write(p.paramBinding); err != nil {
+   return nil, err
+   }
+   return wr, nil
+   } else {
+   wr := flight.NewRecordWriter(pstream, 
ipc.WithSchema(p.streamBinding.Schema()))
+   wr.SetFlightDescriptor(desc)
+   for p.streamBinding.Next() {
+   if err := wr.Write(p.streamBinding.Record()); err != 
nil {
+   return nil, err
+   }
+   }

Review Comment:
   `arrio.Copy` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #33778: GH-33777: [R] Nightly builds failing due to dataset test not being skipped on builds without datasets module

2023-01-20 Thread GitBox


ursabot commented on PR #33778:
URL: https://github.com/apache/arrow/pull/33778#issuecomment-1398613044

   ['Python', 'R'] benchmarks have high level of regressions.
   
[test-mac-arm](https://conbench.ursa.dev/compare/runs/2eb76bfb924947cb97a14cbb8822eecf...83ac5871fe62452a9c95ecf98c4fa293/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #33778: GH-33777: [R] Nightly builds failing due to dataset test not being skipped on builds without datasets module

2023-01-20 Thread GitBox


ursabot commented on PR #33778:
URL: https://github.com/apache/arrow/pull/33778#issuecomment-1398612792

   Benchmark runs are scheduled for baseline = 
fc1f9ebbc4c3ae77d5cfc2f9322f4373d3d19b8a and contender = 
a1a587b1d1415a96edbb358cdf363241064a6d64. 
a1a587b1d1415a96edbb358cdf363241064a6d64 is a master commit associated with 
this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] 
[ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/378e78bf15b54d219d75e56e70f77f03...1de7097bb03645b69e60d0281ded9a5d/)
   [Finished :arrow_down:0.87% :arrow_up:0.0%] 
[test-mac-arm](https://conbench.ursa.dev/compare/runs/2eb76bfb924947cb97a14cbb8822eecf...83ac5871fe62452a9c95ecf98c4fa293/)
   [Failed] 
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/8241a532f9e44142b1ca6ae7dd27d316...1d9273efc6a643549b46a4a7bd51c474/)
   [Failed] 
[ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/653a4a727c1c4535bf0e7c3e7ee4693c...2b3b47b8313a4139848884db6ed8b809/)
   Buildkite builds:
   [Finished] [`a1a587b1` 
ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2228)
   [Finished] [`a1a587b1` 
test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2253)
   [Failed] [`a1a587b1` 
ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2223)
   [Failed] [`a1a587b1` 
ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2246)
   [Finished] [`fc1f9ebb` 
ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2227)
   [Finished] [`fc1f9ebb` 
test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2252)
   [Failed] [`fc1f9ebb` 
ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/)
   [Failed] [`fc1f9ebb` 
ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2245)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only 
benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] nealrichardson commented on pull request #33770: GH-33760: [R][C++] Handle nested field refs in scanner

2023-01-20 Thread GitBox


nealrichardson commented on PR #33770:
URL: https://github.com/apache/arrow/pull/33770#issuecomment-1398606967

   > Do you want some unit tests?
   
   Of course, this needs some. The tests that were added for this function when 
it was introduced 
(https://github.com/apache/arrow/commit/8972ebd81202cdfb2e59c77eb0475120525e6230#diff-273815dd1d6770a7ef790980d9039adddf0ef8efa88c0745234906ab16ac09dfR2784-R2838)
 are more e2e than unit tests of the function, but I could try to tack on 
something there with a struct column and a nested ref. Happy to take a 
different approach if you have suggestions though. I haven't run C++ unit tests 
in forever, so figured I'd get some feedback before diving in there. 
   
   > 
   > This seems like it will load the entire column into memory (since you're 
using the top-level name). 
   
   True, and this is how it currently works from R anyway. The effect of this 
change is to not fail when other languages/libraries try to scan with nested 
refs.
   
   > For formats that support nested load (e.g. parquet) I thought we had a 
better implementation that already worked. For example:
   > 
   > 
https://github.com/apache/arrow/blob/9a1373452ff5b4cf41cc371e0585d8dda91ffd36/cpp/src/arrow/dataset/file_parquet.cc#L252
   > 
   > seems to be expecting nested refs in the projection. But I could be 
misunderstanding. The expression-based projection of the old scanner always 
confused me a little.
   
   @jorisvandenbossche mentioned this in my previous PR, and that's why I 
wanted to send nested refs instead of top-level columns. So why aren't I 
hitting that code? I'm creating a ScanNode for an ExecPlan. Or am I only 
hitting this code because I'm testing with an InMemoryDataset? Clearly I'm not 
hitting file_parquet.cc with that, but does a parquet FileSystemDataset avoid 
this scanner code entirely? I'm skeptical about that (though it should be 
easily verified) since this function was added to fix an issue with unbound 
schemas that IIRC people were experiencing with actual parquet datasets.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] avantgardnerio commented on pull request #4834: (#4462) Postgres compatibility tests using sqllogictest

2023-01-20 Thread GitBox


avantgardnerio commented on PR #4834:
URL: 
https://github.com/apache/arrow-datafusion/pull/4834#issuecomment-1398604791

   > 1. Don't orchestrate the postgres containers with rust test code
   
   Good catch... I :100: % agree with this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] mustafasrepo opened a new pull request, #5003: Support for bounded execution when window frame involves UNBOUNDED PRECEDING

2023-01-20 Thread GitBox


mustafasrepo opened a new pull request, #5003:
URL: https://github.com/apache/arrow-datafusion/pull/5003

   # Which issue does this PR close?
   
   
   
   Closes [#4978](https://github.com/apache/arrow-datafusion/issues/4978)
   
   # Rationale for this change
   
   
   Currently, queries that contain `UNBOUNDED PRECEDING` as in their window 
frame bounds, like the one below
   ```sql
   SELECT
   SUM(c1) OVER (ORDER BY c3 RANGE BETWEEN UNBOUNDED PRECEDING AND 11 
FOLLOWING) as a1
   FROM aggregate_test_100
   ```
   run with `WindowAggExec`. However, many aggregators do not require the whole 
range in memory to calculate their results -- the above query can actually run 
with `BoundedWindowAggExec`.
   
   # What changes are included in this PR?
   
   
   This PR adds support for bounded-memory execution of suitable window 
functions even when the start bound is `UNBOUNDED PRECEDING`.
   
   # Are these changes tested?
   
   
   We added new tests that verify the updated (i.e. optimized) physical plan. 
We also added fuzzy window tests to generate window frame bounds with 
`UNBOUNDED PRECEDING`. Fuzzy tests can now generate window frame bounds in the 
form `RANGE BETWEEN N PRECEDING AND M PRECEDING` or `RANGE BETWEEN M FOLLOWING 
AND N FOLLOWING`, which increases effective coverage.
   
   # Are there any user-facing changes?
   
   
   
   No.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] LucyMcGowan commented on issue #14826: write_dataset is crashing on my machine

2023-01-20 Thread GitBox


LucyMcGowan commented on issue #14826:
URL: https://github.com/apache/arrow/issues/14826#issuecomment-1398595442

   Installing the most recent version of R fixed this! Thank you @assignUser -- 
do you want me to close the issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #33808: GH-20272: [C++] Bump version of bundled AWS SDK

2023-01-20 Thread GitBox


github-actions[bot] commented on PR #33808:
URL: https://github.com/apache/arrow/pull/33808#issuecomment-1398593959

   * Closes: #20272


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] js8544 opened a new pull request, #33808: GH-20272: [C++] Bump version of bundled AWS SDK

2023-01-20 Thread GitBox


js8544 opened a new pull request, #33808:
URL: https://github.com/apache/arrow/pull/33808

   
   
   ### Rationale for this change
   
   
   
   Bump AWS SDK version to 1.10.55.
   
   ### What changes are included in this PR?
   
   
   Bump AWS SDK version to 1.10.55.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] mustafasrepo commented on a diff in pull request #4989: Add support for linear range calculation in WINDOW functions

2023-01-20 Thread GitBox


mustafasrepo commented on code in PR #4989:
URL: https://github.com/apache/arrow-datafusion/pull/4989#discussion_r1082729694


##
datafusion/common/src/utils.rs:
##
@@ -22,8 +22,16 @@ use arrow::array::ArrayRef;
 use arrow::compute::SortOptions;
 use std::cmp::Ordering;
 
+/// Given column vectors, returns row at `idx`.
+pub fn get_row_at_idx(columns: &[ArrayRef], idx: usize) -> 
Result> {
+columns
+.iter()
+.map(|arr| ScalarValue::try_from_array(arr, idx))
+.collect()
+}
+
 /// This function compares two tuples depending on the given sort options.
-fn compare(
+pub fn compare_rows(

Review Comment:
   This function is used both in `datafusion/common` and 
`datafusion/physical_expr.` Hence unfortunately `pub(crate)` wouldn't work.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] ablack3 commented on issue #33807: Using dplyr::tally with an Arrow FileSystemDataset crashes R

2023-01-20 Thread GitBox


ablack3 commented on issue #33807:
URL: https://github.com/apache/arrow/issues/33807#issuecomment-1398589729

   This might be a clue
   ```
*** caught illegal operation ***
  address 0x13d7349a8, cause 'illegal opcode'
  
  Traceback:
   1: Array__GetScalar(Array$create(x, type = type), 0)
   2: Scalar$create(x)
   3: compute___expr__scalar(Scalar$create(x))
   4: Expression$scalar(1L)
   5: n()
   6: eval_tidy(expr, mask)
   7: doTryCatch(return(expr), name, parentenv, handler)
   8: tryCatchOne(expr, names, parentenv, handlers[[1L]])
   9: tryCatchList(expr, classes, parentenv, handlers)
  10: tryCatch(eval_tidy(expr, mask), error = function(e) {msg <- 
conditionMessage(e)if (getOption("arrow.debug", FALSE)) print(msg)  
  patterns <- .cache$i18ized_error_patternif (is.null(patterns)) {
patterns <- i18ize_error_messages().cache$i18ized_error_pattern <- 
patterns}if (grepl(patterns, msg)) {stop(e)}out <- 
structure(msg, class = "try-error", condition = e)if (grepl("not 
supported.*Arrow", msg) || getOption("arrow.debug", FALSE)) {
class(out) <- c("arrow-try-error", class(out))}invisible(out)})
  11: arrow_eval(expr, mask)
  12: arrow_eval_or_stop(as_quosure(expr, ctx$quo_env), ctx$mask)
  13: summarize_eval(names(exprs)[i], exprs[[i]], ctx, 
length(.data$group_by_vars) > 0)
  14: do_arrow_summarize(.data, !!!exprs, .groups = .groups)
  15: doTryCatch(return(expr), name, parentenv, handler)
  16: tryCatchOne(expr, names, parentenv, handlers[[1L]])
  17: tryCatchList(expr, classes, parentenv, handlers)
  18: tryCatch(expr, error = function(e) {call <- conditionCall(e)
if (!is.null(call)) {if (identical(call[[1L]], quote(doTryCatch)))  
   call <- sys.call(-4L)dcall <- deparse(call, nlines = 1L)
prefix <- paste("Error in", dcall, ": ")LONG <- 75Lsm <- 
strsplit(conditionMessage(e), "\n")[[1L]]w <- 14L + nchar(dcall, type = 
"w") + nchar(sm[1L], type = "w")if (is.na(w)) w <- 14L + 
nchar(dcall, type = "b") + nchar(sm[1L], type = "b")if 
(w > LONG) prefix <- paste0(prefix, "\n  ")}else prefix <- 
"Error : "msg <- paste0(prefix, conditionMessage(e), "\n")
.Internal(seterrmessage(msg[1L]))if (!silent && 
isTRUE(getOption("show.error.messages"))) {cat(msg, file = outFile) 
   .Internal(printDeferredWarnings())}invisible(structure(msg, class = 
"try-error", condition = e))})
  19: try(do_arrow_summarize(.data, !!!exprs, .groups = .groups), silent = 
TRUE)
  20: summarise.ArrowTabular(x, `:=`(!!name, n()))
  21: dplyr::summarize(x, `:=`(!!name, n()))
  22: tally.ArrowTabular(.)
  23: tally(.)
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] wjones127 commented on a diff in pull request #33694: MINOR: [C++][Parquet] Rephrase decimal annotation

2023-01-20 Thread GitBox


wjones127 commented on code in PR #33694:
URL: https://github.com/apache/arrow/pull/33694#discussion_r1082720103


##
cpp/src/parquet/properties.h:
##
@@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties {
   return this->disable_statistics(path->ToDotString());
 }
 
-/// Enable integer type to annotate decimal type as below:
-///   int32: 1 <= precision <= 9
-///   int64: 10 <= precision <= 18
-/// Default disabled.
-Builder* enable_integer_annotate_decimal() {
-  integer_annotate_decimal_ = true;
+/// Enable decimal logical type with 1 <= precision <= 18 to be stored as
+/// integer physical type.
+///
+/// According to the specs, DECIMAL can be used to annotate the following 
types:
+/// - int32: for 1 <= precision <= 9.
+/// - int64: for 1 <= precision <= 18; precision < 10 will produce a 
warning.
+/// - fixed_len_byte_array: precision is limited by the array size.
+///   Length n can store <= floor(log_10(2^(8*n - 1) - 1)) base-10 digits.
+/// - binary: precision is not limited, but is required. precision is not 
limited,
+///   but is required. The minimum number of bytes to store the unscaled 
value
+///   should be used.

Review Comment:
   I'd rather write about what our function does than what is in the spec. The 
spec is documentation for us (the developers), but this documentation is for 
our users.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #33764: GH-15109: [Python] Allow creation of non empty struct array with zero field

2023-01-20 Thread GitBox


ursabot commented on PR #33764:
URL: https://github.com/apache/arrow/pull/33764#issuecomment-1398577405

   ['Python', 'R'] benchmarks have high level of regressions.
   
[test-mac-arm](https://conbench.ursa.dev/compare/runs/2e36b6e440484302ad20c5b43dc9a58c...2eb76bfb924947cb97a14cbb8822eecf/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] ursabot commented on pull request #5002: Bump sqllogictest to v0.11.1

2023-01-20 Thread GitBox


ursabot commented on PR #5002:
URL: 
https://github.com/apache/arrow-datafusion/pull/5002#issuecomment-1398576985

   Benchmark runs are scheduled for baseline = 
03601bee545599a8be3ef982bc98f7b3a71fb3df and contender = 
92d0a054c23e5fba91718db32ccd933ce86dd2b6. 
92d0a054c23e5fba91718db32ccd933ce86dd2b6 is a master commit associated with 
this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on ec2-t3-xlarge-us-east-2] 
[ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/f32b6073aae943efbbe53ac612c647aa...b7602ae58f5d4b7ba3324d1ffe741657/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on test-mac-arm] 
[test-mac-arm](https://conbench.ursa.dev/compare/runs/54268e5d17024e729965b15290cf8ff6...a4af3422df424d45b8c593bf88fcdb2e/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on ursa-i9-9960x] 
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/0c9df41bc0ef4700a2ea5c0de9ddc081...62bf27b1bb66413091df940f3d3c4b2c/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on ursa-thinkcentre-m75q] 
[ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ffa25a55a4db4b2591cc4a4a764e6386...dcbd2be5fd934bd7957e5699f4165edd/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only 
benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #33764: GH-15109: [Python] Allow creation of non empty struct array with zero field

2023-01-20 Thread GitBox


ursabot commented on PR #33764:
URL: https://github.com/apache/arrow/pull/33764#issuecomment-1398576941

   Benchmark runs are scheduled for baseline = 
e920474d7f1dbc7702c08117481db0cd4297b581 and contender = 
fc1f9ebbc4c3ae77d5cfc2f9322f4373d3d19b8a. 
fc1f9ebbc4c3ae77d5cfc2f9322f4373d3d19b8a is a master commit associated with 
this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] 
[ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/dc563f53283540688a486ea172c910c9...378e78bf15b54d219d75e56e70f77f03/)
   [Finished :arrow_down:1.11% :arrow_up:0.0%] 
[test-mac-arm](https://conbench.ursa.dev/compare/runs/2e36b6e440484302ad20c5b43dc9a58c...2eb76bfb924947cb97a14cbb8822eecf/)
   [Failed] 
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/a94f84bc58924aeabf85c79bed0011ae...8241a532f9e44142b1ca6ae7dd27d316/)
   [Failed] 
[ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/a0af231eecf746c8b60891054b1cf84c...653a4a727c1c4535bf0e7c3e7ee4693c/)
   Buildkite builds:
   [Finished] [`fc1f9ebb` 
ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2227)
   [Finished] [`fc1f9ebb` 
test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2252)
   [Failed] [`fc1f9ebb` 
ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/)
   [Failed] [`fc1f9ebb` 
ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2245)
   [Finished] [`e920474d` 
ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2226)
   [Finished] [`e920474d` 
test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2251)
   [Failed] [`e920474d` 
ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2221)
   [Finished] [`e920474d` 
ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2244)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only 
benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 merged pull request #5002: Bump sqllogictest to v0.11.1

2023-01-20 Thread GitBox


xudong963 merged PR #5002:
URL: https://github.com/apache/arrow-datafusion/pull/5002


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] tustvold closed issue #3159: Support Nested Types in Row Format

2023-01-20 Thread GitBox


tustvold closed issue #3159: Support Nested Types in Row Format
URL: https://github.com/apache/arrow-rs/issues/3159


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] wjones127 commented on a diff in pull request #33694: MINOR: [C++][Parquet] Rephrase decimal annotation

2023-01-20 Thread GitBox


wjones127 commented on code in PR #33694:
URL: https://github.com/apache/arrow/pull/33694#discussion_r1082703943


##
cpp/src/parquet/properties.h:
##
@@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties {
   return this->disable_statistics(path->ToDotString());
 }
 
-/// Enable integer type to annotate decimal type as below:
-///   int32: 1 <= precision <= 9
-///   int64: 10 <= precision <= 18
-/// Default disabled.
-Builder* enable_integer_annotate_decimal() {
-  integer_annotate_decimal_ = true;
+/// Enable decimal logical type with 1 <= precision <= 18 to be stored as
+/// integer physical type.

Review Comment:
   Do negative precision decimals exist?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-adbc] paleolimbot commented on issue #366: [Discuss] Is the conventional commit format working?

2023-01-20 Thread GitBox


paleolimbot commented on issue #366:
URL: https://github.com/apache/arrow-adbc/issues/366#issuecomment-1398565056

   I rather like reading the conventional commit PR notifications...it's not 
perfectly consistent but it's *more* consistent than Arrow's "language-only" 
component. I'd like to adopt whatever you do here in nanoarrow, too, and I 
found it rather awkward when developing the IPC extension to figure out what 
the component was (Extension-IPC? IPC? C/IPC C-IPC?). The subdirectory makes a 
lot of sense (`feat(extension/nanoarrow_ipc): foo`). For R stuff, 
`feat(r/adbcdrivermanager)` I think is kind of nice (again, not perfect).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] DDtKey commented on a diff in pull request #3365: Add csv-core based reader (#3338)

2023-01-20 Thread GitBox


DDtKey commented on code in PR #3365:
URL: https://github.com/apache/arrow-rs/pull/3365#discussion_r1081030906


##
arrow-csv/src/reader/records.rs:
##
@@ -0,0 +1,266 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow_schema::ArrowError;
+use csv_core::{ReadRecordResult, Reader};
+use std::io::BufRead;
+
+/// The estimated length of a field in bytes
+const AVERAGE_FIELD_SIZE: usize = 8;
+
+/// The minimum amount of data in a single read
+const MIN_CAPACITY: usize = 1024;
+
+pub struct RecordReader {
+reader: R,
+delimiter: Reader,
+
+num_columns: usize,
+
+num_rows: usize,
+offsets: Vec,
+data: Vec,
+}
+
+impl RecordReader {
+pub fn new(reader: R, delimiter: Reader, num_columns: usize) -> Self {
+Self {
+reader,
+delimiter,
+num_columns,
+num_rows: 0,
+offsets: vec![],
+data: vec![],
+}
+}
+
+fn fill_buf( self, to_read: usize) -> Result<(), ArrowError> {
+// Reserve sufficient capacity in offsets
+self.offsets.resize(to_read * self.num_columns + 1, 0);
+self.num_rows = 0;
+
+if to_read == 0 {
+return Ok(());
+}
+
+// The current offset into `self.data`
+let mut output_offset = 0;
+// The current offset into `input`
+let mut input_offset = 0;
+// The current offset into `self.offsets`
+let mut field_offset = 1;
+// The number of fields read for the current row
+let mut field_count = 0;
+
+'outer: loop {
+let input = self.reader.fill_buf()?;
+
+'input: loop {
+// Reserve necessary space in output data based on best 
estimate
+let remaining_rows = to_read - self.num_rows;
+let capacity = remaining_rows * self.num_columns * 
AVERAGE_FIELD_SIZE;
+let estimated_data = capacity.max(MIN_CAPACITY);
+self.data.resize(output_offset + estimated_data, 0);
+
+loop {
+let (result, bytes_read, bytes_written, end_positions) =
+self.delimiter.read_record(
+[input_offset..],
+ self.data[output_offset..],
+ self.offsets[field_offset..],
+);
+
+field_count += end_positions;
+field_offset += end_positions;
+input_offset += bytes_read;
+output_offset += bytes_written;
+
+match result {
+ReadRecordResult::End => break 'outer, // Reached end 
of file
+ReadRecordResult::InputEmpty => break 'input, // Input 
exhausted, need to read more
+ReadRecordResult::OutputFull => break, // Need to 
allocate more capacity
+ReadRecordResult::OutputEndsFull => {
+return Err(ArrowError::CsvError(format!("incorrect 
number of fields, expected {} got more than {}", self.num_columns, 
field_count)))

Review Comment:
   @tustvold yes, I did & it prints. But the line in files are correct, it 
always refer to first one in my case. 樂  As I said, I'll try to create a MRE, 
otherwise it's hard to explain



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-adbc] lidavidm merged pull request #364: ci: download arch-specific golang

2023-01-20 Thread GitBox


lidavidm merged PR #364:
URL: https://github.com/apache/arrow-adbc/pull/364


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #33780: GH-33779: [R] Nightly builds (R 3.5 and 3.6) failing due to field refs test

2023-01-20 Thread GitBox


ursabot commented on PR #33780:
URL: https://github.com/apache/arrow/pull/33780#issuecomment-1398538971

   ['Python', 'R'] benchmarks have high level of regressions.
   
[test-mac-arm](https://conbench.ursa.dev/compare/runs/ad7d2fade4df48c7b3718a3d97031fd1...2e36b6e440484302ad20c5b43dc9a58c/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #33780: GH-33779: [R] Nightly builds (R 3.5 and 3.6) failing due to field refs test

2023-01-20 Thread GitBox


ursabot commented on PR #33780:
URL: https://github.com/apache/arrow/pull/33780#issuecomment-1398538353

   Benchmark runs are scheduled for baseline = 
4c698fb3c2a2b4ee046c6ad6e992e81ed90c7b0e and contender = 
e920474d7f1dbc7702c08117481db0cd4297b581. 
e920474d7f1dbc7702c08117481db0cd4297b581 is a master commit associated with 
this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] 
[ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/79a041199540455e8843a59ad958c9d2...dc563f53283540688a486ea172c910c9/)
   [Failed :arrow_down:1.41% :arrow_up:0.0%] 
[test-mac-arm](https://conbench.ursa.dev/compare/runs/ad7d2fade4df48c7b3718a3d97031fd1...2e36b6e440484302ad20c5b43dc9a58c/)
   [Failed] 
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c1895394d29540dcb97b7a69120716f5...a94f84bc58924aeabf85c79bed0011ae/)
   [Finished :arrow_down:0.28% :arrow_up:0.0%] 
[ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/8f36663cfd3f44e49405b9fe83811b7c...a0af231eecf746c8b60891054b1cf84c/)
   Buildkite builds:
   [Finished] [`e920474d` 
ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2226)
   [Finished] [`e920474d` 
test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2251)
   [Failed] [`e920474d` 
ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2221)
   [Finished] [`e920474d` 
ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2244)
   [Finished] [`4c698fb3` 
ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2225)
   [Failed] [`4c698fb3` 
test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2250)
   [Finished] [`4c698fb3` 
ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2220)
   [Finished] [`4c698fb3` 
ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2243)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only 
benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #3578: Use native types in PageIndex (#3575)

2023-01-20 Thread GitBox


alamb commented on PR #3578:
URL: https://github.com/apache/arrow-rs/pull/3578#issuecomment-1398536300

   cc @Ted-Jiang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-adbc] lidavidm commented on pull request #365: feat(r): Add R Driver Manager

2023-01-20 Thread GitBox


lidavidm commented on PR #365:
URL: https://github.com/apache/arrow-adbc/pull/365#issuecomment-1398533476

   For those cpplint failures, you might need something like this:
   
   
https://github.com/apache/arrow-adbc/blob/1568815791594d6cd2e4cf1299d4d33e6aded78b/c/driver/sqlite/statement_reader.c#L83
   
   cpplint apparently gets confused when it sees `struct Foo` in a parameter 
list.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3578: Use native types in PageIndex (#3575)

2023-01-20 Thread GitBox


tustvold commented on code in PR #3578:
URL: https://github.com/apache/arrow-rs/pull/3578#discussion_r1082678855


##
parquet/src/util/bit_util.rs:
##
@@ -17,76 +17,104 @@
 
 use std::{cmp, mem::size_of};
 
-use crate::data_type::AsBytes;
+use crate::data_type::{AsBytes, ByteArray, FixedLenByteArray, Int96};
+use crate::errors::{ParquetError, Result};
 use crate::util::bit_pack::{unpack16, unpack32, unpack64, unpack8};
 use crate::util::memory::ByteBufferPtr;
 
 #[inline]
-pub fn from_ne_slice(bs: &[u8]) -> T {
-let mut b = T::Buffer::default();
-{
-let b = b.as_mut();
-let bs = [..b.len()];
-b.copy_from_slice(bs);
-}
-T::from_ne_bytes(b)
+pub fn from_le_slice(bs: &[u8]) -> T {
+// TODO: propagate the error (#3577)
+T::try_from_le_slice(bs).unwrap()
 }
 
 #[inline]
-pub fn from_le_slice(bs: &[u8]) -> T {
-let mut b = T::Buffer::default();
-{
-let b = b.as_mut();
-let bs = [..b.len()];
-b.copy_from_slice(bs);
+fn array_from_slice(bs: &[u8]) -> Result<[u8; N]> {
+// Need to slice as may be called with zero-padded values

Review Comment:
   I've confirmed this optimised as you would hope
   
   https://rust.godbolt.org/z/4WPdn4j8j



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] ursabot commented on pull request #4984: minor: Update data type support documentation

2023-01-20 Thread GitBox


ursabot commented on PR #4984:
URL: 
https://github.com/apache/arrow-datafusion/pull/4984#issuecomment-1398518494

   Benchmark runs are scheduled for baseline = 
5dd5ffd5ea84d843b9ef34d0eaa9ac992618f6e2 and contender = 
03601bee545599a8be3ef982bc98f7b3a71fb3df. 
03601bee545599a8be3ef982bc98f7b3a71fb3df is a master commit associated with 
this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on ec2-t3-xlarge-us-east-2] 
[ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/56a8bcee1cd643ce80865ba3a710a119...f32b6073aae943efbbe53ac612c647aa/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on test-mac-arm] 
[test-mac-arm](https://conbench.ursa.dev/compare/runs/5e2ba9d6ebf94d83bcc8944cc96a2cbf...54268e5d17024e729965b15290cf8ff6/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on ursa-i9-9960x] 
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/622391ae626e47fb978bce2975b24d3f...0c9df41bc0ef4700a2ea5c0de9ddc081/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on ursa-thinkcentre-m75q] 
[ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/906ec70e752e49e0bc4258521395f2ec...ffa25a55a4db4b2591cc4a4a764e6386/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only 
benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-adbc] lidavidm commented on pull request #364: ci: download arch-specific golang

2023-01-20 Thread GitBox


lidavidm commented on PR #364:
URL: https://github.com/apache/arrow-adbc/pull/364#issuecomment-1398515942

   Ok, it works now. One of the Go builds is a little flaky. 
https://github.com/lidavidm/arrow-adbc/actions/runs/3968565812/jobs/6801899981


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3578: Use native types in PageIndex (#3575)

2023-01-20 Thread GitBox


tustvold commented on code in PR #3578:
URL: https://github.com/apache/arrow-rs/pull/3578#discussion_r1082672997


##
parquet/src/bin/parquet-index.rs:
##
@@ -132,7 +132,7 @@ fn compute_row_counts(offset_index: &[PageLocation], rows: 
i64) -> Vec {
 }
 
 /// Prints index information for a single column chunk
-fn print_index(
+fn print_index(

Review Comment:
   I would be willing to review a PR that altered the display implementation



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3578: Use native types in PageIndex (#3575)

2023-01-20 Thread GitBox


tustvold commented on code in PR #3578:
URL: https://github.com/apache/arrow-rs/pull/3578#discussion_r1082672397


##
parquet/src/file/page_index/index.rs:
##
@@ -53,14 +53,14 @@ pub enum Index {
 /// will only return pageLocations without min_max index,
 /// `NONE` represents this lack of index information
 NONE,
-BOOLEAN(BooleanIndex),
+BOOLEAN(NativeIndex),
 INT32(NativeIndex),
 INT64(NativeIndex),
 INT96(NativeIndex),
 FLOAT(NativeIndex),
 DOUBLE(NativeIndex),
-BYTE_ARRAY(ByteArrayIndex),
-FIXED_LEN_BYTE_ARRAY(ByteArrayIndex),
+BYTE_ARRAY(NativeIndex),
+FIXED_LEN_BYTE_ARRAY(NativeIndex),

Review Comment:
   This is the breaking change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3578: Use native types in PageIndex (#3575)

2023-01-20 Thread GitBox


tustvold commented on code in PR #3578:
URL: https://github.com/apache/arrow-rs/pull/3578#discussion_r1082671978


##
parquet/src/util/bit_util.rs:
##
@@ -17,76 +17,104 @@
 
 use std::{cmp, mem::size_of};
 
-use crate::data_type::AsBytes;
+use crate::data_type::{AsBytes, ByteArray, FixedLenByteArray, Int96};
+use crate::errors::{ParquetError, Result};
 use crate::util::bit_pack::{unpack16, unpack32, unpack64, unpack8};
 use crate::util::memory::ByteBufferPtr;
 
 #[inline]
-pub fn from_ne_slice(bs: &[u8]) -> T {
-let mut b = T::Buffer::default();
-{
-let b = b.as_mut();
-let bs = [..b.len()];
-b.copy_from_slice(bs);
-}
-T::from_ne_bytes(b)
+pub fn from_le_slice(bs: &[u8]) -> T {
+// TODO: propagate the error (#3577)
+T::try_from_le_slice(bs).unwrap()
 }
 
 #[inline]
-pub fn from_le_slice(bs: &[u8]) -> T {
-let mut b = T::Buffer::default();
-{
-let b = b.as_mut();
-let bs = [..b.len()];
-b.copy_from_slice(bs);
+fn array_from_slice(bs: &[u8]) -> Result<[u8; N]> {
+// Need to slice as may be called with zero-padded values
+match bs.get(..N) {
+Some(b) => Ok(b.try_into().unwrap()),
+None => Err(general_err!(
+"error converting value, expected {} bytes got {}",
+N,
+bs.len()
+)),
 }
-T::from_le_bytes(b)
 }
 
 pub trait FromBytes: Sized {

Review Comment:
   Note: this trait is not part of the public API



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3578: Use native types in PageIndex (#3575)

2023-01-20 Thread GitBox


tustvold commented on code in PR #3578:
URL: https://github.com/apache/arrow-rs/pull/3578#discussion_r1082671374


##
parquet/src/file/statistics.rs:
##
@@ -181,11 +181,11 @@ pub fn from_thrift(
 // min/max statistics for INT96 columns.
 let min = min.map(|data| {
 assert_eq!(data.len(), 12);
-from_ne_slice::()
+from_le_slice::()

Review Comment:
   The data is little endian, not native endian - 
https://github.com/apache/parquet-format/blob/master/Encodings.md#plain-plain--0



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion-python] andygrove opened a new pull request, #147: Rename default branch from master to main

2023-01-20 Thread GitBox


andygrove opened a new pull request, #147:
URL: https://github.com/apache/arrow-datafusion-python/pull/147

   # Which issue does this PR close?
   
   
   
   Part of https://github.com/apache/arrow-datafusion-python/issues/144
   
# Rationale for this change
   
   
   See issue
   
   # What changes are included in this PR?
   
   
   Replace `master` with `main` where needed
   
   # Are there any user-facing changes?
   
   
   No
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] mustafasrepo commented on a diff in pull request #4989: Add support for linear range calculation in WINDOW functions

2023-01-20 Thread GitBox


mustafasrepo commented on code in PR #4989:
URL: https://github.com/apache/arrow-datafusion/pull/4989#discussion_r1082669525


##
datafusion/common/src/utils.rs:
##
@@ -103,6 +111,53 @@ where
 Ok(low)
 }
 
+/// This function searches for a tuple of given values (`target`) among the 
given
+/// rows (`item_columns`) via a linear scan. It assumes that `item_columns` is 
sorted
+/// according to `sort_options` and returns the insertion index of `target`.
+/// Template argument `SIDE` being `true`/`false` means left/right insertion.
+pub fn linear_search(
+item_columns: &[ArrayRef],
+target: &[ScalarValue],
+sort_options: &[SortOptions],
+) -> Result {
+let low: usize = 0;
+let high: usize = item_columns
+.get(0)
+.ok_or_else(|| {
+DataFusionError::Internal("Column array shouldn't be 
empty".to_string())
+})?
+.len();
+let compare_fn = |current: &[ScalarValue], target: &[ScalarValue]| {

Review Comment:
   `LexicographicalComparator` API, compares values at the two indices and 
returns their ordering. This useful to find change detection, or partition 
boundaries. However, in our case we need to search for specific value inside 
Array (possibly not existing in the array.). However, maybe with some kind of 
tweak, we maybe able to use `LexicographicalComparator` for our use case. I 
will think about it in detail.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3578: Use native types in PageIndex (#3575)

2023-01-20 Thread GitBox


tustvold commented on code in PR #3578:
URL: https://github.com/apache/arrow-rs/pull/3578#discussion_r1082668780


##
parquet/src/util/bit_util.rs:
##
@@ -17,76 +17,104 @@
 
 use std::{cmp, mem::size_of};
 
-use crate::data_type::AsBytes;
+use crate::data_type::{AsBytes, ByteArray, FixedLenByteArray, Int96};
+use crate::errors::{ParquetError, Result};
 use crate::util::bit_pack::{unpack16, unpack32, unpack64, unpack8};
 use crate::util::memory::ByteBufferPtr;
 
 #[inline]
-pub fn from_ne_slice(bs: &[u8]) -> T {
-let mut b = T::Buffer::default();
-{
-let b = b.as_mut();
-let bs = [..b.len()];
-b.copy_from_slice(bs);
-}
-T::from_ne_bytes(b)
+pub fn from_le_slice(bs: &[u8]) -> T {
+// TODO: propagate the error (#3577)
+T::try_from_le_slice(bs).unwrap()
 }
 
 #[inline]
-pub fn from_le_slice(bs: &[u8]) -> T {
-let mut b = T::Buffer::default();
-{
-let b = b.as_mut();
-let bs = [..b.len()];
-b.copy_from_slice(bs);
+fn array_from_slice(bs: &[u8]) -> Result<[u8; N]> {
+// Need to slice as may be called with zero-padded values
+match bs.get(..N) {
+Some(b) => Ok(b.try_into().unwrap()),
+None => Err(general_err!(
+"error converting value, expected {} bytes got {}",
+N,
+bs.len()
+)),
 }
-T::from_le_bytes(b)
 }
 
 pub trait FromBytes: Sized {
 type Buffer: AsMut<[u8]> + Default;
+fn try_from_le_slice(b: &[u8]) -> Result;
 fn from_le_bytes(bs: Self::Buffer) -> Self;
-fn from_be_bytes(bs: Self::Buffer) -> Self;
-fn from_ne_bytes(bs: Self::Buffer) -> Self;
 }
 
 macro_rules! from_le_bytes {
 ($($ty: ty),*) => {
 $(
 impl FromBytes for $ty {
 type Buffer = [u8; size_of::()];
+fn try_from_le_slice(b: &[u8]) -> Result {
+Ok(Self::from_le_bytes(array_from_slice(b)?))
+}
 fn from_le_bytes(bs: Self::Buffer) -> Self {
 <$ty>::from_le_bytes(bs)
 }
-fn from_be_bytes(bs: Self::Buffer) -> Self {
-<$ty>::from_be_bytes(bs)
-}
-fn from_ne_bytes(bs: Self::Buffer) -> Self {
-<$ty>::from_ne_bytes(bs)
-}

Review Comment:
   Data in parquet is always stored as little endian - 
https://github.com/apache/parquet-format/blob/master/Encodings.md#plain-plain--0
   
   The use of ne_bytes was actually incorrect



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] thisisnic merged pull request #33748: GH-33746: [R] Update NEWS.md for 11.0.0

2023-01-20 Thread GitBox


thisisnic merged PR #33748:
URL: https://github.com/apache/arrow/pull/33748


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] westonpace commented on issue #33699: [CI][C++] Nightly tests for valgrind have been failing for the last

2023-01-20 Thread GitBox


westonpace commented on issue #33699:
URL: https://github.com/apache/arrow/issues/33699#issuecomment-1398508810

   Alternatively, we could try reducing the runtime of these tests when 
valgrind is enabled.  `parquet-arrow-test` for example tries many different 
type variations (8 different combinations of decimal) and we could probably 
trim that down when valgrind is enabled.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on a diff in pull request #5002: Bump sqllogictest to v0.11.1

2023-01-20 Thread GitBox


xudong963 commented on code in PR #5002:
URL: https://github.com/apache/arrow-datafusion/pull/5002#discussion_r1082666973


##
datafusion/core/tests/sqllogictests/src/main.rs:
##
@@ -109,7 +109,7 @@ pub async fn main() -> Result<()> {
 info!("Using complete mode to complete {}", path.display());
 let col_separator = " ";
 let validator = default_validator;
-update_test_file(path, runner, col_separator, validator)
+update_test_file(path,  runner, col_separator, validator)

Review Comment:
   Yes, I changed it, lol



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5002: Bump sqllogictest to v0.11.1

2023-01-20 Thread GitBox


alamb commented on code in PR #5002:
URL: https://github.com/apache/arrow-datafusion/pull/5002#discussion_r1082665326


##
datafusion/core/tests/sqllogictests/src/main.rs:
##
@@ -109,7 +109,7 @@ pub async fn main() -> Result<()> {
 info!("Using complete mode to complete {}", path.display());
 let col_separator = " ";
 let validator = default_validator;
-update_test_file(path, runner, col_separator, validator)
+update_test_file(path,  runner, col_separator, validator)

Review Comment:
   needed due to 
https://github.com/risinglightdb/sqllogictest-rs/commit/879204b9b2dd5a81db4a9781c2e6c68d0713f5d3



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] alamb closed pull request #4922: Update sqllogictest requirement from 0.10.0 to 0.11.1

2023-01-20 Thread GitBox


alamb closed pull request #4922: Update sqllogictest requirement from 0.10.0 to 
0.11.1
URL: https://github.com/apache/arrow-datafusion/pull/4922


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] dependabot[bot] commented on pull request #4922: Update sqllogictest requirement from 0.10.0 to 0.11.1

2023-01-20 Thread GitBox


dependabot[bot] commented on PR #4922:
URL: 
https://github.com/apache/arrow-datafusion/pull/4922#issuecomment-1398506919

   OK, I won't notify you again about this release, but will get in touch when 
a new version is available. If you'd rather skip all updates until the next 
major or minor version, let me know by commenting `@dependabot ignore this 
major version` or `@dependabot ignore this minor version`. You can also ignore 
all major, minor, or patch releases for a dependency by adding an [`ignore` 
condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore)
 with the desired `update_types` to your config file.
   
   If you change your mind, just re-open this PR and I'll resolve any conflicts 
on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #4922: Update sqllogictest requirement from 0.10.0 to 0.11.1

2023-01-20 Thread GitBox


alamb commented on PR #4922:
URL: 
https://github.com/apache/arrow-datafusion/pull/4922#issuecomment-1398506861

   Dupe of https://github.com/apache/arrow-datafusion/pull/4922


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #4960: Update pyo3 requirement from 0.17.1 to 0.18.0

2023-01-20 Thread GitBox


alamb commented on PR #4960:
URL: 
https://github.com/apache/arrow-datafusion/pull/4960#issuecomment-1398505810

   This needs to wait for arrow to update pyo3, which conveniently @viirya did 
in https://github.com/apache/arrow-rs/pull/3557


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 merged pull request #4984: minor: Update data type support documentation

2023-01-20 Thread GitBox


xudong963 merged PR #4984:
URL: https://github.com/apache/arrow-datafusion/pull/4984


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] westonpace commented on issue #33699: [CI][C++] Nightly tests for valgrind have been failing for the last

2023-01-20 Thread GitBox


westonpace commented on issue #33699:
URL: https://github.com/apache/arrow/issues/33699#issuecomment-1398505553

   I tried looking into this a bit more today.  I ran the `parquet-reader-test` 
on master, on the same commit that last passed (df4cb9588) and on a really old 
commit (54ff2d817ea5bb811f3653deeb12fc93452e) and finally on the 8.0.0 
release build (from May).  All  runs came very close to the 5 minute mark 
(actually the third run went over).  However, I didn't notice any significant 
differences. The most recent build performed best.
   
   If anything, the oddity is that the valgrind job passed.  The 
`test-conda-cpp-valgrind` job has been failing regularly all the way back to 
September and only passed for a few days in December.
   
   I recommend increasing the timeout similar to what we did with the TSAN 
build and then revisit in a few weeks.  @pitrou any thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #4922: Update sqllogictest requirement from 0.10.0 to 0.11.1

2023-01-20 Thread GitBox


alamb commented on PR #4922:
URL: 
https://github.com/apache/arrow-datafusion/pull/4922#issuecomment-1398501978

   Pushed 48e3681 for updated API


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion-python] andygrove commented on issue #144: Change default branch name from master to main

2023-01-20 Thread GitBox


andygrove commented on issue #144:
URL: 
https://github.com/apache/arrow-datafusion-python/issues/144#issuecomment-1398500876

   INFRA issue: https://issues.apache.org/jira/browse/INFRA-24106


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] bmmeijers commented on a diff in pull request #3578: Use native types in PageIndex (#3575)

2023-01-20 Thread GitBox


bmmeijers commented on code in PR #3578:
URL: https://github.com/apache/arrow-rs/pull/3578#discussion_r1082651910


##
parquet/src/bin/parquet-index.rs:
##
@@ -132,7 +132,7 @@ fn compute_row_counts(offset_index: &[PageLocation], rows: 
i64) -> Vec {
 }
 
 /// Prints index information for a single column chunk
-fn print_index(
+fn print_index(

Review Comment:
   Would it make sense to format the types for bytearray/fixedlenbytearray with 
hex layout (using `:x?`) by default?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 opened a new pull request, #5002: Bump sqllogictest to v0.11.1

2023-01-20 Thread GitBox


xudong963 opened a new pull request, #5002:
URL: https://github.com/apache/arrow-datafusion/pull/5002

   # Which issue does this PR close?
   
   
   
   Closes #.
   
   # Rationale for this change
   
   
   
   # What changes are included in this PR?
   
   
   
   # Are these changes tested?
   
   
   
   # Are there any user-facing changes?
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4995: [Feature] support describe file

2023-01-20 Thread GitBox


alamb commented on code in PR #4995:
URL: https://github.com/apache/arrow-datafusion/pull/4995#discussion_r1082633781


##
datafusion/core/src/datasource/listing/table.rs:
##
@@ -67,6 +67,10 @@ pub struct ListingTableConfig {
 pub file_schema: Option,
 /// Optional `ListingOptions` for the to be created `ListingTable`.
 pub options: Option,
+/// Optional default is false, if temporary is true, then it means that
+/// we will create a temporary table for select * from `xxx.file`
+/// or describe 'xxx.file', the temporary table will be registered to the 
schema
+pub temporary_file: bool,

Review Comment:
     for the comments. Thank you @xiaoyong-z 
   
   I think the term `temporary_files` often means something different -- 
specifically files in `/tmp` or similar -- so it may be confusing to use the 
same term. 
   
   What would you think about using a different term than `temporary`? Perhaps 
`ephemeral` would be appropriate. 



##
datafusion/core/tests/sqllogictests/test_files/describe.slt:
##
@@ -0,0 +1,43 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+
+#   http://www.apache.org/licenses/LICENSE-2.0
+
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+##
+# Describe internal tables
+##
+
+statement ok
+set datafusion.catalog.information_schema = true
+
+statement ok
+CREATE external table aggregate_simple(c1 real, c2 double, c3 boolean) STORED 
as CSV WITH HEADER ROW LOCATION 'tests/data/aggregate_simple.csv';
+
+query C1
+DESCRIBE aggregate_simple;

Review Comment:
   This test works on master -- I don't think it needs the changes in this PR
   
   ```
   (arrow_dev) alamb@MacBook-Pro-8:~/Software/arrow-datafusion/datafusion/core$ 
datafusion-cli
   DataFusion CLI v16.0.0
   ❯ CREATE external table aggregate_simple(c1 real, c2 double, c3 boolean) 
STORED as CSV WITH HEADER ROW LOCATION 'tests/data/aggregate_simple.csv';
   0 rows in set. Query took 0.005 seconds.
   ❯ 
   DESCRIBE aggregate_simple;
   +-+---+-+
   | column_name | data_type | is_nullable |
   +-+---+-+
   | c1  | Float32   | NO  |
   | c2  | Float64   | NO  |
   | c3  | Boolean   | NO  |
   +-+---+-+
   3 rows in set. Query took 0.002 seconds.
   ❯ 
   ```



##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1625,6 +1637,15 @@ pub struct Prepare {
 pub input: Arc,
 }
 
+/// Describe a file

Review Comment:
   What would you think about making this more general `DescribeTable` and 
rather than taking a file_path it would have a `table_provider`? I think then 
we wouldn't have to special case describe for file / table -- instead 
DataFusion could always create `DescribeTable` 
   
   The query in terms of information_schema is sort of a hack in the first 
place. What you have started here with an actual LogicalPlan variant I think is 
much better  



##
datafusion/sql/src/statement.rs:
##
@@ -385,25 +385,32 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
 let DescribeTable { table_name } = statement;
 
 let where_clause = object_name_to_qualifier(_name);
+let table_str_name = object_name_to_table_str_name(_name);
 let table_ref = object_name_to_table_reference(table_name)?;
 
-// check if table_name exists
-let _ = self
+let table_source = self
 .schema_provider
 .get_table_provider((_ref).into())?;
 
-if self.has_table("information_schema", "tables") {
-let sql = format!(
-"SELECT column_name, data_type, is_nullable \
-FROM information_schema.columns WHERE 
{where_clause};"
-);
-let mut rewrite = DFParser::parse_sql()?;
-self.statement_to_plan(rewrite.pop_front().unwrap())
+if !table_source.is_temporary_file() {

Review Comment:
   I wonder if we can figure out a way to avoid special casing here. Perhaps we 
can always create a DescribeTable plan?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: 

[GitHub] [arrow-datafusion-python] jdye64 commented on pull request #145: Substrait bindings

2023-01-20 Thread GitBox


jdye64 commented on PR #145:
URL: 
https://github.com/apache/arrow-datafusion-python/pull/145#issuecomment-1398497972

   Hey @andygrove thanks! However, something is broken with the Github actions 
setup. While this PR showed all the CI passing that was actually a red herring. 
Only a single RAT action ran. I have another open issue for getting that 
resolved [here](https://github.com/apache/arrow-datafusion-python/issues/146) I 
actually think we need to revert this commit or otherwise any further CI jobs 
will always fail until that `maturin-action` is added to the Apache whitelist 
of actions allowed to be used in this repo.
   
   Or we can just expect that to be done today and wait. Up to you but wanted 
to let you know.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] rtpsw commented on pull request #33676: GH-33673: [C++] Standardize as-of-join convention for past and future tolerance

2023-01-20 Thread GitBox


rtpsw commented on PR #33676:
URL: https://github.com/apache/arrow/pull/33676#issuecomment-1398491977

   Ping @westonpace - it would be great if this can be reviewed quickly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #4989: Add support for linear range calculation in WINDOW functions

2023-01-20 Thread GitBox


ozankabak commented on code in PR #4989:
URL: https://github.com/apache/arrow-datafusion/pull/4989#discussion_r1082640117


##
datafusion/common/src/utils.rs:
##
@@ -103,6 +111,53 @@ where
 Ok(low)
 }
 
+/// This function searches for a tuple of given values (`target`) among the 
given
+/// rows (`item_columns`) via a linear scan. It assumes that `item_columns` is 
sorted
+/// according to `sort_options` and returns the insertion index of `target`.
+/// Template argument `SIDE` being `true`/`false` means left/right insertion.
+pub fn linear_search(

Review Comment:
   Yes, so the same logic has two drivers: One with a comparison function, one 
with `SortOptions`. We currently use the former, but also anticipate to use the 
latter in the near future (we plan a follow-up of this PR for GROUPS mode). As 
@mustafasrepo mentions, it also brings both search APIs in line, which is good 
too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] Sach1nAgarwal commented on pull request #3576: Propagate EOF Error from AsyncRead

2023-01-20 Thread GitBox


Sach1nAgarwal commented on PR #3576:
URL: https://github.com/apache/arrow-rs/pull/3576#issuecomment-1398490172

   I will try to write a test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-ballista] thinkharderdev commented on pull request #560: Cluster state refactor part 1

2023-01-20 Thread GitBox


thinkharderdev commented on PR #560:
URL: https://github.com/apache/arrow-ballista/pull/560#issuecomment-1398487737

   I'll plan on merging this tomorrow morning


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-adbc] lidavidm commented on pull request #356: feat(go/adbc/driver/pkg/cmake): cmake build for Go shared library drivers

2023-01-20 Thread GitBox


lidavidm commented on PR #356:
URL: https://github.com/apache/arrow-adbc/pull/356#issuecomment-1398487632

   Something about the Go build in the sdist is flaky, when it tries to remove 
the generated header. Maybe instead of removing it, we just add a .gitignore 
and move on?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #15223: GH-15203: [Java] Implement writing compressed files

2023-01-20 Thread GitBox


ursabot commented on PR #15223:
URL: https://github.com/apache/arrow/pull/15223#issuecomment-1398486539

   ['Python', 'R'] benchmarks have high level of regressions.
   
[test-mac-arm](https://conbench.ursa.dev/compare/runs/200a70d5895c419f9ee6659640af67d5...ad7d2fade4df48c7b3718a3d97031fd1/)
   
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/b7a03f0af28c44ef9bb2e83bfbc80ce3...c1895394d29540dcb97b7a69120716f5/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-rs] ursabot commented on pull request #3576: Propagate EOF Error from AsyncRead

2023-01-20 Thread GitBox


ursabot commented on PR #3576:
URL: https://github.com/apache/arrow-rs/pull/3576#issuecomment-1398486113

   Benchmark runs are scheduled for baseline = 
a61da1e655e76e8676f1cdb021b13551e720b0de and contender = 
a1cedb4fdfb561eda4e836a6c8fcb898d7a37029. 
a1cedb4fdfb561eda4e836a6c8fcb898d7a37029 is a master commit associated with 
this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on 
ec2-t3-xlarge-us-east-2] 
[ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/1569148bb880428983e9516e78f6205c...39c6792cb9314558aeca361712dcca65/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on 
test-mac-arm] 
[test-mac-arm](https://conbench.ursa.dev/compare/runs/7c0c4ca52d8a446bb64791b93131d359...05af447fe65b4023835bf20e9b67708e/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on 
ursa-i9-9960x] 
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/251ba9673d434bbd989eaa890b83e518...51dd5d80b4fe48159493ad8f8fad0a0e/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on 
ursa-thinkcentre-m75q] 
[ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/489f7aba13b14cb5a228f979a47e3bfc...00472df3e5d04a58b1819d0ff4b00bca/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only 
benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] ursabot commented on pull request #5001: Minor: Document how to create `ListingTables`

2023-01-20 Thread GitBox


ursabot commented on PR #5001:
URL: 
https://github.com/apache/arrow-datafusion/pull/5001#issuecomment-1398486164

   Benchmark runs are scheduled for baseline = 
e566bfc4af0ffb53717a784ab423d407473b62a0 and contender = 
6d770ad0d747e9e87752888ddd3dd69d6765. 
6d770ad0d747e9e87752888ddd3dd69d6765 is a master commit associated with 
this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on ec2-t3-xlarge-us-east-2] 
[ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/621768b6e15c4e09a5ba21af06455f7e...c64f7dd7cbb04cbda1e80a036e23c806/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on test-mac-arm] 
[test-mac-arm](https://conbench.ursa.dev/compare/runs/99b304dca9d74ad58a86397457db36c8...a9cc618429c5492c8721628c4c0defc5/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on ursa-i9-9960x] 
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/a95471e9569c453e84999b23406b7407...d9c4a6f8eca74884b8c384cbb0df4474/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on ursa-thinkcentre-m75q] 
[ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/192cdd73a8d14570bf3428a4474acf0e...a8d2c8b8073441b28e96126e96f658b2/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only 
benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] ursabot commented on pull request #5000: Allow overriding error type in DataFusion Result

2023-01-20 Thread GitBox


ursabot commented on PR #5000:
URL: 
https://github.com/apache/arrow-datafusion/pull/5000#issuecomment-1398486195

   Benchmark runs are scheduled for baseline = 
6d770ad0d747e9e87752888ddd3dd69d6765 and contender = 
5dd5ffd5ea84d843b9ef34d0eaa9ac992618f6e2. 
5dd5ffd5ea84d843b9ef34d0eaa9ac992618f6e2 is a master commit associated with 
this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on ec2-t3-xlarge-us-east-2] 
[ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c64f7dd7cbb04cbda1e80a036e23c806...56a8bcee1cd643ce80865ba3a710a119/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on test-mac-arm] 
[test-mac-arm](https://conbench.ursa.dev/compare/runs/a9cc618429c5492c8721628c4c0defc5...5e2ba9d6ebf94d83bcc8944cc96a2cbf/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on ursa-i9-9960x] 
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/d9c4a6f8eca74884b8c384cbb0df4474...622391ae626e47fb978bce2975b24d3f/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on ursa-thinkcentre-m75q] 
[ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/a8d2c8b8073441b28e96126e96f658b2...906ec70e752e49e0bc4258521395f2ec/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only 
benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] ursabot commented on pull request #4944: Only add outer filter once when transforming exists/in subquery to join

2023-01-20 Thread GitBox


ursabot commented on PR #4944:
URL: 
https://github.com/apache/arrow-datafusion/pull/4944#issuecomment-1398486136

   Benchmark runs are scheduled for baseline = 
22d106a6564345a746699cd5eb1fc84b9267ce83 and contender = 
e566bfc4af0ffb53717a784ab423d407473b62a0. 
e566bfc4af0ffb53717a784ab423d407473b62a0 is a master commit associated with 
this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on ec2-t3-xlarge-us-east-2] 
[ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/044241bbf7bd43bbab0fd1d22d9ff5bf...621768b6e15c4e09a5ba21af06455f7e/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on test-mac-arm] 
[test-mac-arm](https://conbench.ursa.dev/compare/runs/f58ae74daf734067afb360384290f899...99b304dca9d74ad58a86397457db36c8/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on ursa-i9-9960x] 
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/51ab53df86d64cc08a9bdb0f318de977...a95471e9569c453e84999b23406b7407/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported 
on ursa-thinkcentre-m75q] 
[ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/70b1345bb0a74810b6c5d6f439ad7ac2...192cdd73a8d14570bf3428a4474acf0e/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only 
benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #15223: GH-15203: [Java] Implement writing compressed files

2023-01-20 Thread GitBox


ursabot commented on PR #15223:
URL: https://github.com/apache/arrow/pull/15223#issuecomment-1398486089

   Benchmark runs are scheduled for baseline = 
e4019add4189a9abe25f8ff6f12099ed19921104 and contender = 
4c698fb3c2a2b4ee046c6ad6e992e81ed90c7b0e. 
4c698fb3c2a2b4ee046c6ad6e992e81ed90c7b0e is a master commit associated with 
this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] 
[ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/07707581d15e4db5a93db947bb177690...79a041199540455e8843a59ad958c9d2/)
   [Failed :arrow_down:1.89% :arrow_up:0.03%] 
[test-mac-arm](https://conbench.ursa.dev/compare/runs/200a70d5895c419f9ee6659640af67d5...ad7d2fade4df48c7b3718a3d97031fd1/)
   [Finished :arrow_down:1.53% :arrow_up:0.0%] 
[ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/b7a03f0af28c44ef9bb2e83bfbc80ce3...c1895394d29540dcb97b7a69120716f5/)
   [Finished :arrow_down:0.5% :arrow_up:0.0%] 
[ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/d8ee250cff604a5cbb970b9f375005a1...8f36663cfd3f44e49405b9fe83811b7c/)
   Buildkite builds:
   [Finished] [`4c698fb3` 
ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2225)
   [Failed] [`4c698fb3` 
test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2250)
   [Finished] [`4c698fb3` 
ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2220)
   [Finished] [`4c698fb3` 
ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2243)
   [Finished] [`e4019add` 
ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2224)
   [Finished] [`e4019add` 
test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2249)
   [Finished] [`e4019add` 
ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2219)
   [Finished] [`e4019add` 
ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2242)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only 
benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4999: Add dictionary_expresions feature (#4386)

2023-01-20 Thread GitBox


tustvold commented on code in PR #4999:
URL: https://github.com/apache/arrow-datafusion/pull/4999#discussion_r1082628665


##
datafusion/physical-expr/Cargo.toml:
##
@@ -35,12 +35,15 @@ path = "src/lib.rs"
 [features]
 crypto_expressions = ["md-5", "sha2", "blake2", "blake3"]
 default = ["crypto_expressions", "regex_expressions", "unicode_expressions"]

Review Comment:
   I fairly strongly disagree, it is pretty esoteric. As a data point, none of 
IOx's integration tests require this, and we use dictionaries a LOT :smile: 
   
   It is important to highlight this isn't "dictionary support" but non-scalar, 
binary dictionary kernels which are pretty unusual in practice



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



  1   2   3   4   5   6   7   8   9   10   >