[jira] [Created] (ARROW-17264) [Go] Function group by on table

2022-07-29 Thread Francisco Garcia (Jira)
Francisco Garcia created ARROW-17264:


 Summary: [Go] Function group by on table
 Key: ARROW-17264
 URL: https://issues.apache.org/jira/browse/ARROW-17264
 Project: Apache Arrow
  Issue Type: Wish
  Components: Go
Affects Versions: 8.0.1
Reporter: Francisco Garcia


I'm trying to find a way to group data in Apache Arrow with Go, but I 
couldn't. Is there a way to do this, or is it only implemented in C++ and 
Python?

Are there plans to implement this in future releases?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17193) [C++] Building GCS and tests on M1 MacOS 12.05 is failing.

2022-07-29 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573201#comment-17573201
 ] 

Kouhei Sutou commented on ARROW-17193:
--

This is ready but I'm not sure whether we should cherry-pick this to 9.0.0 or 
not. (I'm not opposed to it.)
Generally, users don't use {{ARROW_BUILD_TESTS=ON}}. I think that this doesn't 
occur without {{ARROW_BUILD_TESTS=ON}}.

> [C++] Building GCS and tests on M1 MacOS 12.05 is failing.
> --
>
> Key: ARROW-17193
> URL: https://issues.apache.org/jira/browse/ARROW-17193
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 8.0.0
>Reporter: Rok Mihevc
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Building GCS and tests on M1 MacOS 12.05 with dependencies installed with 
> homebrew is failing.
> {code:bash}
> cmake \
>   -GNinja \
>   -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>   -DCMAKE_INSTALL_LIBDIR=lib \
>   -DARROW_PYTHON=ON \
>   -DARROW_COMPUTE=ON \
>   -DARROW_FILESYSTEM=ON \
>   -DARROW_CSV=ON \
>   -DARROW_GCS=ON \
>   -DARROW_INSTALL_NAME_RPATH=OFF \
>   -DARROW_BUILD_TESTS=ON \
>   -DCMAKE_CXX_STANDARD=17 \
>   ..
> {code}
> Env:
> {code:bash}
> PYARROW_WITH_PARQUET=1
> PYARROW_WITH_DATASET=1
> PYARROW_WITH_ORC=1
> PYARROW_WITH_PARQUET_ENCRYPTION=1
> PYARROW_WITH_PLASMA=1
> PYARROW_WITH_GCS=1
> {code}
> Building errors with:
> {noformat}
> Undefined symbols for architecture arm64:
>   "absl::lts_20220623::FormatTime(std::__1::basic_string_view std::__1::char_traits >, absl::lts_20220623::Time, 
> absl::lts_20220623::TimeZone)", referenced from:
>   arrow::fs::(anonymous 
> namespace)::GcsIntegrationTest_OpenInputStreamReadMetadata_Test::TestBody() 
> in gcsfs_test.cc.o
>   
> "absl::lts_20220623::FromChrono(std::__1::chrono::time_point  std::__1::chrono::duration > > 
> const&)", referenced from:
>   arrow::fs::(anonymous 
> namespace)::GcsIntegrationTest_OpenInputStreamReadMetadata_Test::TestBody() 
> in gcsfs_test.cc.o
>   "absl::lts_20220623::RFC3339_full", referenced from:
>   arrow::fs::(anonymous 
> namespace)::GcsFileSystem_ObjectMetadataRoundtrip_Test::TestBody() in 
> gcsfs_test.cc.o
>   arrow::fs::(anonymous 
> namespace)::GcsIntegrationTest_OpenInputStreamReadMetadata_Test::TestBody() 
> in gcsfs_test.cc.o
>   "absl::lts_20220623::time_internal::cctz::utc_time_zone()", referenced from:
>   arrow::fs::(anonymous 
> namespace)::GcsIntegrationTest_OpenInputStreamReadMetadata_Test::TestBody() 
> in gcsfs_test.cc.o
>   "absl::lts_20220623::ToDoubleSeconds(absl::lts_20220623::Duration)", 
> referenced from:
>   arrow::fs::(anonymous 
> namespace)::GcsFileSystem_ObjectMetadataRoundtrip_Test::TestBody() in 
> gcsfs_test.cc.o
>   "absl::lts_20220623::Duration::operator-=(absl::lts_20220623::Duration)", 
> referenced from:
>   arrow::fs::(anonymous 
> namespace)::GcsFileSystem_ObjectMetadataRoundtrip_Test::TestBody() in 
> gcsfs_test.cc.o
>   "absl::lts_20220623::ParseTime(std::__1::basic_string_view std::__1::char_traits >, std::__1::basic_string_view std::__1::char_traits >, absl::lts_20220623::Time*, 
> std::__1::basic_string, 
> std::__1::allocator >*)", referenced from:
>   arrow::fs::(anonymous 
> namespace)::GcsFileSystem_ObjectMetadataRoundtrip_Test::TestBody() in 
> gcsfs_test.cc.o
> {noformat}
> Dependencies  installed with:
> {noformat}
> brew update && brew bundle --file=cpp/Brewfile
> {noformat}
> See https://github.com/apache/arrow/pull/13681#issuecomment-1193241547 and  
> https://github.com/apache/arrow/pull/13407





[jira] [Created] (ARROW-17263) [C++] Utility functions for working with RLE

2022-07-29 Thread Tobias Zagorni (Jira)
Tobias Zagorni created ARROW-17263:
--

 Summary: [C++] Utility functions for working with RLE
 Key: ARROW-17263
 URL: https://issues.apache.org/jira/browse/ARROW-17263
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Tobias Zagorni
Assignee: Tobias Zagorni








[jira] [Created] (ARROW-17262) [C++] Kernel input type matcher for RLE

2022-07-29 Thread Tobias Zagorni (Jira)
Tobias Zagorni created ARROW-17262:
--

 Summary: [C++] Kernel input type matcher for RLE
 Key: ARROW-17262
 URL: https://issues.apache.org/jira/browse/ARROW-17262
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Tobias Zagorni
Assignee: Tobias Zagorni








[jira] [Assigned] (ARROW-17259) [C++] Use shared_ptr less throughout arrow/compute

2022-07-29 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-17259:


Assignee: Wes McKinney

> [C++] Use shared_ptr less throughout arrow/compute
> 
>
> Key: ARROW-17259
> URL: https://issues.apache.org/jira/browse/ARROW-17259
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> It turns out we generate a ton of code just copying and manipulating 
> {{shared_ptr}} throughout arrow/compute, and especially in the 
> configuration of the function/kernels registry. One function, 
> {{RegisterScalarArithmetic}}, generates around 300kb of code, which on looking 
> at disassembly contains a significant amount of inlined shared_ptr template 
> code. I made an attempt at refactoring things to use {{const DataType*}} for 
> function signatures, which removes quite a bit of code bloat and puts us on a 
> path to using fewer shared_ptrs in general.





[jira] [Updated] (ARROW-17261) [C++] Add type ID, Type and Array classes for RLE

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17261:
---
Labels: pull-request-available  (was: )

> [C++] Add type ID, Type and Array classes for RLE
> -
>
> Key: ARROW-17261
> URL: https://issues.apache.org/jira/browse/ARROW-17261
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Tobias Zagorni
>Assignee: Tobias Zagorni
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Mostly picking these parts from ARROW-16772 and ARROW-16781 to create an 
> easier order to merge things





[jira] [Commented] (ARROW-17255) Support JSON logical type in Arrow

2022-07-29 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573184#comment-17573184
 ] 

Rok Mihevc commented on ARROW-17255:


This is one of the threads: 
https://lists.apache.org/thread/3nls3222ggnxlrp0s46rxrcmgbyhgn8t

> Support JSON logical type in Arrow
> --
>
> Key: ARROW-17255
> URL: https://issues.apache.org/jira/browse/ARROW-17255
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Java, Python
>Reporter: Pradeep Gollakota
>Priority: Major
>
> As a BigQuery developer, I would like the Arrow libraries to support the JSON 
> logical Type. This would enable us to use the JSON type in the Arrow format 
> of our ReadAPI. This would also enable us to use the JSON type to export data 
> from BigQuery to Parquet.





[jira] [Created] (ARROW-17261) [C++] Add type ID, Type and Array classes for RLE

2022-07-29 Thread Tobias Zagorni (Jira)
Tobias Zagorni created ARROW-17261:
--

 Summary: [C++] Add type ID, Type and Array classes for RLE
 Key: ARROW-17261
 URL: https://issues.apache.org/jira/browse/ARROW-17261
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Tobias Zagorni
Assignee: Tobias Zagorni


Mostly picking these parts from ARROW-16772 and ARROW-16781 to create an easier 
order to merge things





[jira] [Updated] (ARROW-17259) [C++] Use shared_ptr less throughout arrow/compute

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17259:
---
Labels: pull-request-available  (was: )

> [C++] Use shared_ptr less throughout arrow/compute
> 
>
> Key: ARROW-17259
> URL: https://issues.apache.org/jira/browse/ARROW-17259
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It turns out we generate a ton of code just copying and manipulating 
> {{shared_ptr}} throughout arrow/compute, and especially in the 
> configuration of the function/kernels registry. One function, 
> {{RegisterScalarArithmetic}}, generates around 300kb of code, which on looking 
> at disassembly contains a significant amount of inlined shared_ptr template 
> code. I made an attempt at refactoring things to use {{const DataType*}} for 
> function signatures, which removes quite a bit of code bloat and puts us on a 
> path to using fewer shared_ptrs in general.





[jira] [Commented] (ARROW-17260) [Release] Java jars verification pass despite that nothing has been uploaded

2022-07-29 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573183#comment-17573183
 ] 

Krisztian Szucs commented on ARROW-17260:
-

This is the second submission after I uploaded and closed the Java release on 
the Apache Sonatype repo:

https://github.com/apache/arrow/pull/13749#issuecomment-129881

> [Release] Java jars verification pass despite that nothing has been uploaded
> 
>
> Key: ARROW-17260
> URL: https://issues.apache.org/jira/browse/ARROW-17260
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Krisztian Szucs
>Priority: Major
>
> The build passes even though I forgot to upload the Java binaries: 
> https://github.com/ursacomputing/crossbow/runs/7587084181?check_suite_focus=true
>  
> cc [~assignUser] [~raulcd]





[jira] [Commented] (ARROW-17193) [C++] Building GCS and tests on M1 MacOS 12.05 is failing.

2022-07-29 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573182#comment-17573182
 ] 

Rok Mihevc commented on ARROW-17193:


I think Krisz is [probably open to 
it|https://ursalabs.zulipchat.com/#narrow/stream/180245-dev/topic/Status.20of.20GCS.20support.3F]
 if we have a fix.
[~kou] please let me know if I can help testing or otherwise!

> [C++] Building GCS and tests on M1 MacOS 12.05 is failing.
> --
>
> Key: ARROW-17193
> URL: https://issues.apache.org/jira/browse/ARROW-17193
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 8.0.0
>Reporter: Rok Mihevc
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>





[jira] [Created] (ARROW-17260) [Release] Java jars verification pass despite that nothing has been uploaded

2022-07-29 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-17260:
---

 Summary: [Release] Java jars verification pass despite that 
nothing has been uploaded
 Key: ARROW-17260
 URL: https://issues.apache.org/jira/browse/ARROW-17260
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Reporter: Krisztian Szucs


The build passes even though I forgot to upload the Java binaries: 
https://github.com/ursacomputing/crossbow/runs/7587084181?check_suite_focus=true
 

cc [~assignUser] [~raulcd]





[jira] [Created] (ARROW-17259) [C++] Use shared_ptr less throughout arrow/compute

2022-07-29 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-17259:


 Summary: [C++] Use shared_ptr less throughout 
arrow/compute
 Key: ARROW-17259
 URL: https://issues.apache.org/jira/browse/ARROW-17259
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 10.0.0


It turns out we generate a ton of code just copying and manipulating 
{{shared_ptr}} throughout arrow/compute, and especially in the 
configuration of the function/kernels registry. One function, 
{{RegisterScalarArithmetic}}, generates around 300kb of code, which on looking 
at disassembly contains a significant amount of inlined shared_ptr template 
code. I made an attempt at refactoring things to use {{const DataType*}} for 
function signatures, which removes quite a bit of code bloat and puts us on a 
path to using fewer shared_ptrs in general.





[jira] [Resolved] (ARROW-16929) [C++] Remove ExecBatchIterator and usages thereof

2022-07-29 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-16929.
--
Fix Version/s: 9.0.0
   Resolution: Fixed

Resolved in a related PR

> [C++] Remove ExecBatchIterator and usages thereof
> -
>
> Key: ARROW-16929
> URL: https://issues.apache.org/jira/browse/ARROW-16929
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 9.0.0
>
>
> The only place left using it is in GroupBy in 
> arrow/compute/exec/aggregate.cc. This can be refactored to use ExecSpan. 
> As part of this removal, we should adapt the benchmarks for ExecSpanIterator 
> to demonstrate the performance improvement there 





[jira] [Updated] (ARROW-17258) [C++] Separate VisitTypeInline for types that can exist as a Scalar

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17258:
---
Labels: pull-request-available  (was: )

> [C++] Separate VisitTypeInline for types that can exist as a Scalar
> ---
>
> Key: ARROW-17258
> URL: https://issues.apache.org/jira/browse/ARROW-17258
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Tobias Zagorni
>Assignee: Tobias Zagorni
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-17258) [C++] Separate VisitTypeInline for types that can exist as a Scalar

2022-07-29 Thread Tobias Zagorni (Jira)
Tobias Zagorni created ARROW-17258:
--

 Summary: [C++] Separate VisitTypeInline for types that can exist 
as a Scalar
 Key: ARROW-17258
 URL: https://issues.apache.org/jira/browse/ARROW-17258
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Tobias Zagorni
Assignee: Tobias Zagorni








[jira] [Resolved] (ARROW-17248) [CI][Conan] Enable Zstandard

2022-07-29 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-17248.
--
Fix Version/s: 10.0.0
   Resolution: Fixed

Issue resolved by pull request 13742
[https://github.com/apache/arrow/pull/13742]

> [CI][Conan] Enable Zstandard
> 
>
> Key: ARROW-17248
> URL: https://issues.apache.org/jira/browse/ARROW-17248
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Continuous Integration, Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-17249) [CI][Conan] Enable bzip2

2022-07-29 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-17249.
--
Fix Version/s: 10.0.0
   Resolution: Fixed

Issue resolved by pull request 13743
[https://github.com/apache/arrow/pull/13743]

> [CI][Conan] Enable bzip2
> 
>
> Key: ARROW-17249
> URL: https://issues.apache.org/jira/browse/ARROW-17249
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Continuous Integration, Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Comment Edited] (ARROW-17224) [R][Doc] minor error in Linux installation documentation ('conda' option) for R on CRAN

2022-07-29 Thread Wayne Smith (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573108#comment-17573108
 ] 

Wayne Smith edited comment on ARROW-17224 at 7/29/22 9:18 PM:
--

Jacob, I concur.  And doing conda -y update conda base (or similar) beforehand 
(as suggested quite often on StackOverflow) doesn't help (and also takes a long 
time).

The first suggestion for installing r-arrow on Linux from the docs, i.e., 
upgrading directly from RStudio (now Posit), is the fastest and works.  I just 
hope the link to the binaries isn't brittle or unreliable (you might want to 
check that too).

I've also gotten it to work with the 'nightly' version hosted on Apache.  The 
compilation is much slower than the RStudio (again, now Posit) approach and 
also needs (as the docs say) the libcurl4-openssl-dev package.  
However, my experience is that some (non-sudo) users can't install that package 
on their distro.

One more issue.  The RStudio package pull is actually for Ubuntu 18.04, not 
Ubuntu 20.04 (or even 22.04).  It's not clear to me whether that is a bug or a 
feature over the long run.  And it should be documented by RStudio.  Even if it 
is, we might consider documenting that subtle change in the Arrow/Linux/R docs 
too (just my $.02).

Best,

Wayne

 



> [R][Doc] minor error in Linux installation documentation ('conda' option) for 
> R on CRAN
> ---
>
> Key: ARROW-17224
> URL: https://issues.apache.org/jira/browse/ARROW-17224
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation, R
>Affects Versions: 8.0.1
> Environment: Ubuntu 20.04
>Reporter: Wayne Smith
>Priority: Minor
> Fix For: 8.0.2
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The documentation for the Linux installation for the r-arrow binary for R is 
> at:
>     https://cran.r-project.org/web/packages/arrow/vignettes/install.html
> The documentation indicates that the 'conda' installation syntax should be:
> {code:java}
> conda install -c conda-forge --strict-channel-priority r-arrow{code}
> I can't get that to work.  What works for me is:
> {code:java}
> conda config --set channel_priority strict
> conda install -c conda-forge r-arrow{code}
> I'm wondering if the syntax presented in the documentation is either 
> deprecated or incorrect.





[jira] [Updated] (ARROW-17022) [C++] Add unit tests and documentation for swiss-join

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17022:
---
Labels: pull-request-available  (was: )

> [C++] Add unit tests and documentation for swiss-join 
> --
>
> Key: ARROW-17022
> URL: https://issues.apache.org/jira/browse/ARROW-17022
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Weston Pace
>Assignee: Weston Pace
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The swiss join utilities being added as part of ARROW-14182 are not 
> adequately unit tested at the moment.  They have fairly decent coverage from 
> end-to-end random hash join testing.  However, a set of basic unit tests will 
> help future maintenance by demonstrating basic usage and allowing for more 
> targeted fixes when a refactor breaks something.  I'm doing some of this work 
> as I review ARROW-14182 anyways so that I can better understand it.  Rather 
> than complicate the review I will open this as a separate follow-up PR.





[jira] [Commented] (ARROW-12590) [C++][R] Update copies of Homebrew files to reflect recent updates

2022-07-29 Thread Jonathan Keane (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573158#comment-17573158
 ] 

Jonathan Keane commented on ARROW-12590:


Yeah, that should work until the Homebrew maintainers decide to pull it out

> [C++][R] Update copies of Homebrew files to reflect recent updates
> --
>
> Key: ARROW-12590
> URL: https://issues.apache.org/jira/browse/ARROW-12590
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, R
>Reporter: Ian Cook
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Our copies of the Homebrew formulae at 
> [https://github.com/apache/arrow/tree/master/dev/tasks/homebrew-formulae] 
> have drifted out of sync with what's currently in 
> [https://github.com/Homebrew/homebrew-core/tree/master/Formula] and 
> [https://github.com/autobrew/homebrew-core/blob/master/Formula|https://github.com/autobrew/homebrew-core/blob/master/Formula/].
>  Get them back in sync and consider automating some method of checking that 
> they are in sync, e.g. by failing the {{homebrew-cpp}} and 
>  {{homebrew-r-autobrew}} nightly tests if our copies don't match what's in 
> the Homebrew and autobrew repos (but only if there were changes there that 
> weren't made in our repo, and not the inverse).
> Update the instructions at 
>  
> [https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-UpdatingHomebrewpackages]
>  as needed.
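
The sync check suggested above could start from something like the following. It is a minimal sketch using only the Python standard library; `file_digest` and `out_of_sync` are hypothetical helper names, the two directories are assumed to be local checkouts of our copies and of the upstream formulae, and the comparison is symmetric, so it does not yet distinguish upstream-only changes from changes made only in our repo.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def out_of_sync(ours: Path, upstream: Path) -> List[str]:
    """List formula file names in `ours` that differ from `upstream`.

    Only *.rb files present in `ours` are checked; a file that is
    missing upstream is reported as out of sync.
    """
    stale = []
    for local in sorted(ours.glob("*.rb")):
        remote = upstream / local.name
        if not remote.exists() or file_digest(local) != file_digest(remote):
            stale.append(local.name)
    return stale
```

A nightly job could fail when the returned list is non-empty and print the names so the maintainer knows which formulae to update.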





[jira] [Commented] (ARROW-17193) [C++] Building GCS and tests on M1 MacOS 12.05 is failing.

2022-07-29 Thread Ian Cook (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573157#comment-17573157
 ] 

Ian Cook commented on ARROW-17193:
--

[~kou] [~rokm] do you think we could get the patch for this included in the 
next 9.0.0 release candidate (assuming there will be another release candidate)?

> [C++] Building GCS and tests on M1 MacOS 12.05 is failing.
> --
>
> Key: ARROW-17193
> URL: https://issues.apache.org/jira/browse/ARROW-17193
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 8.0.0
>Reporter: Rok Mihevc
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>





[jira] [Updated] (ARROW-17256) [Python] Can't call combine_chunks on empty ChunkedArray

2022-07-29 Thread Nicola Crane (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicola Crane updated ARROW-17256:
-
Summary: [Python] Can't call combine_chunks on empty ChunkedArray  (was: 
Can't call combine_chunks on empty ChunkedArray)

> [Python] Can't call combine_chunks on empty ChunkedArray
> 
>
> Key: ARROW-17256
> URL: https://issues.apache.org/jira/browse/ARROW-17256
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
> Environment: pyarrow 8.0.0
> python 3.9
>Reporter: &res
>Priority: Minor
>
> When calling:
> {code:java}
> pa.chunked_array([], type=pa.bool_()).combine_chunks(){code}
> I get this error:
> {code:java}
>  pyarrow/table.pxi:700: in pyarrow.lib.ChunkedArray.combine_chunks
>     ???
> pyarrow/array.pxi:2868: in pyarrow.lib.concat_arrays
>     ???
> pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
>     ???
> pyarrow/error.pxi:100: in pyarrow.lib.check_status
>     ???
> E   pyarrow.lib.ArrowInvalid: Must pass at least one array{code}
> While this works:
> {code:java}
> pa.chunked_array([pa.array([], pa.bool_())], type=pa.bool_()) {code}
> In the first case, it should return an empty BoolArray as well.





[jira] [Commented] (ARROW-17216) [C++] Support joining tables with non-key fields as list

2022-07-29 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573143#comment-17573143
 ] 

Weston Pace commented on ARROW-17216:
-

That'd be great.  The starting point would be 
`src/arrow/compute/exec/hash_join_node.cc`.  This is where you'll find the 
check itself that is currently failing, but this is not where most of the join 
logic lives.  Fair warning: the hash-join node has been a bit of a staging 
ground for performance-critical arrow compute and so it relies on a number of 
utilities not used elsewhere.  As such, this node has a pretty high learning 
curve at the moment (though my hope is that is more diffusely spread throughout 
the engine in the future).

As of the 9.0.0 release (still pending) there are two implementations of 
hash-join.  The basic implementation (HashJoinImpl) is backed by 
std::unordered_map and can be found in src/arrow/compute/exec/hash_join.h.  A 
newer version (SwissJoin) extends HashJoinImpl and is backed by a custom hash 
map and is found in src/arrow/compute/exec/swiss_join.h.  I'd recommend testing 
and adding support to the newer version as the work required is going to be 
similar between the two.  Note that the basic version supports dictionary types 
but not the newer version (and we just fall back to the basic version if 
needed) so that is an option if the newer version proves to be trouble.
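
As a rough illustration of the build/probe structure behind a map-backed hash join, here is a toy sketch in plain Python. It is not Arrow's HashJoinImpl or SwissJoin: the function `hash_join`, the dict-of-lists table layout, and the sample data are all assumptions made for the example. It does show why, conceptually, non-key columns are only gathered by row index and never hashed.

```python
from collections import defaultdict


def hash_join(build, probe, keys):
    """Toy inner join over column-major tables (dict: column name -> list).

    Assumes non-key column names do not collide between the two tables.
    """
    build_rows = len(next(iter(build.values())))
    probe_rows = len(next(iter(probe.values())))

    # Build phase: map each key tuple to the build-side row indices.
    index = defaultdict(list)
    for i in range(build_rows):
        index[tuple(build[k][i] for k in keys)].append(i)

    # Probe phase: record (probe_row, build_row) pairs for matching keys.
    pairs = []
    for j in range(probe_rows):
        for i in index.get(tuple(probe[k][j] for k in keys), ()):
            pairs.append((j, i))

    # Materialize: payload columns are gathered by index, whatever their type.
    out = {name: [col[j] for j, _ in pairs] for name, col in probe.items()}
    for name, col in build.items():
        if name not in keys:
            out[name] = [col[i] for _, i in pairs]
    return out


left = {"k1": [1, 2, 3], "a": [[0.1], [0.2, 0.3], []]}  # "a" holds lists
right = {"k1": [3, 1, 1], "b": ["x", "y", "z"]}
joined = hash_join(build=right, probe=left, keys=["k1"])
```

In the real node the payload also passes through the alternate views and row encodings described in the following paragraphs, which is where the type restriction comes from.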

Support for types here is mostly gated by support for some of the alternate 
views/encodings used by the hash join.  One of these is a non-owning arraydata 
view called KeyColumnArray which is in src/arrow/compute/light_array.h.  This 
view does not currently support nested data.  Note that ArraySpan is pretty 
similar (see ARROW-17257) and does support nested types (I think) so maybe it 
makes sense to tackle ARROW-17257 as part of this.

The second significant thing is RowTableImpl in 
src/arrow/compute/row/row_internal.h.  This implements a row-major encoding for 
Arrow data.  During the hash-join operation, the build data is placed into a 
table in this row-major form.  Then, during materialization, it is converted 
back to a column-major form.
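
The round trip between the two layouts can be pictured with a small plain-Python sketch. This is only a conceptual model, not the RowTableImpl encoding (which packs values into byte buffers); the helper names `to_rows` and `to_columns` are hypothetical.

```python
def to_rows(columns):
    """Column-major (dict: name -> list) to row-major (list of tuples)."""
    names = list(columns)
    rows = [tuple(row) for row in zip(*(columns[n] for n in names))]
    return names, rows


def to_columns(names, rows):
    """Row-major back to column-major, as done during materialization."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


build_side = {"k1": [1, 2, 3], "payload": ["a", "b", "c"]}
names, rows = to_rows(build_side)
restored = to_columns(names, rows)
```

Supporting nested types would mean defining how a variable-length list value is laid out inside each encoded row, which this toy model avoids by storing Python objects.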

On top of those two key elements there are a number of other utilities like 
ExecBatchBuilder, RowArray (which should maybe be renamed to RowTable), 
RowArrayAccessor, RowArrayMerge, the hashing utilities themselves (there are 
two versions of this too, I'm pretty sure the older implementation uses 
arrow/util/hashing.h and I know the newer version uses 
arrow/compute/exec/key_hash.h), etc.

So I would probably start by looking at the unit tests that exist for those 
utilities and encodings (this reminded me that I had some unit tests I had 
forgotten to push for ARROW-17022 so I will try and get those up today) and try 
to get these utilities working with nested types.  Some of these utilities 
could probably also use some more unit tests too.  Once the utilities are 
working with nested types you can enable them for the join itself and see what 
breaks.

CC [~michalno] and [~sakras] as they are more knowledgeable in this area and 
might have some additional input / advice.

> [C++] Support joining tables with non-key fields as list
> 
>
> Key: ARROW-17216
> URL: https://issues.apache.org/jira/browse/ARROW-17216
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Jayjeet Chakraborty
>Priority: Major
>  Labels: query-engine
>
> I am trying to join 2 Arrow tables where some columns are of {{list}} 
> data type. Note that my join columns/keys are primitive data types and some 
> of my non-join columns/keys are of {{list}} type. But PyArrow {{join()}} 
> cannot join such a table, although pandas can. It says
> {{ArrowInvalid: Data type list is not supported in join non-key 
> field}}
> when I execute this piece of code
> {{joined_table = table_1.join(table_2, ['k1', 'k2', 'k3'])}}
> A 
> [stackoverflow|https://stackoverflow.com/questions/73071105/listitem-float-not-supported-in-join-non-key-field]
>  response pointed out that Arrow currently cannot handle non-fixed types for 
> joins. Can this be fixed? Or is this intentional?





[jira] [Updated] (ARROW-17214) [C++] Implement Scalar CastTo from list types to String

2022-07-29 Thread David Li (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Li updated ARROW-17214:
-
Summary: [C++] Implement Scalar CastTo from list types to String  (was: 
[C++] Implement Scalar CastTo from all types to String)

> [C++] Implement Scalar CastTo from list types to String
> ---
>
> Key: ARROW-17214
> URL: https://issues.apache.org/jira/browse/ARROW-17214
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: David Li
>Priority: Major
>  Labels: good-second-issue, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As reported on the mailing list: 
> https://lists.apache.org/thread/rp7vpjtt4lgtjxj35oyjyqh9b6on94jf
> Some types, including LIST, LARGE_LIST, and MAP do not implement casts. 
> Ideally we'd implement these (implement all to-string casts?) by leveraging 
> the existing cast for any formattable type.





[jira] [Updated] (ARROW-17214) [C++] Implement Scalar CastTo from all types to String

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17214:
---
Labels: good-second-issue pull-request-available  (was: good-second-issue)

> [C++] Implement Scalar CastTo from all types to String
> --
>
> Key: ARROW-17214
> URL: https://issues.apache.org/jira/browse/ARROW-17214
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: David Li
>Priority: Major
>  Labels: good-second-issue, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As reported on the mailing list: 
> https://lists.apache.org/thread/rp7vpjtt4lgtjxj35oyjyqh9b6on94jf
> Some types, including LIST, LARGE_LIST, and MAP do not implement casts. 
> Ideally we'd implement these (implement all to-string casts?) by leveraging 
> the existing cast for any formattable type.





[jira] [Commented] (ARROW-17224) [R][Doc] minor error in Linux installation documentation ('conda' option) for R on CRAN

2022-07-29 Thread Jacob Wujciak-Jens (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573131#comment-17573131
 ] 

Jacob Wujciak-Jens commented on ARROW-17224:


{quote} I just don't hope the link to the binaries is brittle or unreliable 
(you might want to check that too){quote}

Which link do you mean, the RSPM link 
("https://packagemanager.rstudio.com/all/__linux__/focal/latest")? This will 
always give you the newest version. If you want to pin a certain version you 
can check the RSPM docs on how to create a time stamped link. (But I would 
probably rather use [renv|https://rstudio.github.io/renv/index.html] or 
[conda-lock|https://anaconda.org/conda-forge/conda-lock] if you require a 
reproducible environment.)

{quote} I've also gotten it to work with the 'nightly' version hosted on 
Apache.  The compilation is much slower than the RStudio instructions{quote}

Yes, while we have pre-compiled libarrow binaries and a script that detects 
which one matches your distro best, we still need to compile the actual R 
package, which takes ~5 minutes, whereas RSPM (PPM soon :D) supplies package 
binaries that don't require any compilation. An important note in regards to 
the nightlies: these are 100% brittle as we only ever keep 14 versions/days 
around and delete everything else. So if you require reproducibility I would 
advise against using them.

I have talked to some conda-forge users and they all recommend using 
[mamba|https://github.com/mamba-org/mamba] when using conda-forge packages as 
it has a much faster solver. Something that might need to be added to the docs.

> [R][Doc] minor error in Linux installation documentation ('conda' option) for 
> R on CRAN
> ---
>
> Key: ARROW-17224
> URL: https://issues.apache.org/jira/browse/ARROW-17224
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation, R
>Affects Versions: 8.0.1
> Environment: Ubuntu 20.04
>Reporter: Wayne Smith
>Priority: Minor
> Fix For: 8.0.2
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The documentation for the Linux installation for the r-arrow binary for R is 
> at:
>     https://cran.r-project.org/web/packages/arrow/vignettes/install.html
> The documentation indicates that the 'conda' installation syntax should be:
> {code:java}
> conda install -c conda-forge --strict-channel-priority r-arrow{code}
> I can't get that to work.  What works for me is:
> {code:java}
> conda config --set channel_priority strict
> conda install -c conda-forge r-arrow{code}
> I'm wondering if the syntax presented in the documentation is either 
> deprecated or incorrect.





[jira] [Created] (ARROW-17257) [C++] Unify KeyColumnArray and ArraySpan

2022-07-29 Thread Weston Pace (Jira)
Weston Pace created ARROW-17257:
---

 Summary: [C++] Unify KeyColumnArray and ArraySpan
 Key: ARROW-17257
 URL: https://issues.apache.org/jira/browse/ARROW-17257
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Weston Pace


Both of these are essentially non-owning views into ArrayData.  They were 
developed somewhat independently but share a pretty similar structure.  I don't 
think we need both and we should unify on a common type for simplicity provided 
we can show no real performance difference.





[jira] [Commented] (ARROW-17255) Support JSON logical type in Arrow

2022-07-29 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573115#comment-17573115
 ] 

David Li commented on ARROW-17255:
--

Hey - I made a guess at the components, but you may want to follow up on the 
mailing list (d...@arrow.apache.org) with some more details (e.g. what 
languages you want to support, at least initially, and any capabilities such an 
extension type would have, beyond just wrapping a string). There have been 
other such discussions on 'common' extension types like UUIDs. 

> Support JSON logical type in Arrow
> --
>
> Key: ARROW-17255
> URL: https://issues.apache.org/jira/browse/ARROW-17255
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Java, Python
>Reporter: Pradeep Gollakota
>Priority: Major
>
> As a BigQuery developer, I would like the Arrow libraries to support the JSON 
> logical Type. This would enable us to use the JSON type in the Arrow format 
> of our ReadAPI. This would also enable us to use the JSON type to export data 
> from BigQuery to Parquet.





[jira] [Resolved] (ARROW-17166) [R] [CI] force_tests() cannot return TRUE

2022-07-29 Thread Jonathan Keane (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Keane resolved ARROW-17166.

Resolution: Fixed

Issue resolved by pull request 13680
[https://github.com/apache/arrow/pull/13680]

> [R] [CI] force_tests() cannot return TRUE
> -
>
> Key: ARROW-17166
> URL: https://issues.apache.org/jira/browse/ARROW-17166
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, R
>Reporter: Rok Mihevc
>Assignee: Dragoș Moldovan-Grünfeld
>Priority: Major
>  Labels: CI, pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Update: the OOM has cleared up so the scope of this PR changed.
> Old title: [R] [CI] Exclude large memory tests from the force-tests job on CI
> =
> We have noticed R CI job (AMD64 Ubuntu 20.04 R 4.2 Force-Tests true) failing 
> on master: 
> [1|https://github.com/apache/arrow/runs/7424773120?check_suite_focus=true#step:7:5547],
>  
> [2|https://github.com/apache/arrow/runs/7431821192?check_suite_focus=true#step:7:5804],
>  
> [3|https://github.com/apache/arrow/runs/7445803518?check_suite_focus=true#step:7:16305]
> with:
> {code:java}
> Start test: array uses local timezone for POSIXct without timezone
>   test-Array.R:269:3 [success]
> System has not been booted with systemd as init system (PID 1). Can't operate.
> Failed to create bus connection: Host is down
> {code}





[jira] [Updated] (ARROW-15693) [Dev] Update crossbow templates to use master or main

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-15693:
---
Labels: pull-request-available  (was: )

> [Dev] Update crossbow templates to use master or main
> -
>
> Key: ARROW-15693
> URL: https://issues.apache.org/jira/browse/ARROW-15693
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Developer Tools
>Reporter: Neal Richardson
>Assignee: Kevin Gurney
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-17255) Support JSON logical type in Arrow

2022-07-29 Thread David Li (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Li updated ARROW-17255:
-
Component/s: C++
 Java
 Python
 (was: Archery)

> Support JSON logical type in Arrow
> --
>
> Key: ARROW-17255
> URL: https://issues.apache.org/jira/browse/ARROW-17255
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Java, Python
>Reporter: Pradeep Gollakota
>Priority: Major
>
> As a BigQuery developer, I would like the Arrow libraries to support the JSON 
> logical Type. This would enable us to use the JSON type in the Arrow format 
> of our ReadAPI. This would also enable us to use the JSON type to export data 
> from BigQuery to Parquet.





[jira] [Assigned] (ARROW-12590) [C++][R] Update copies of Homebrew files to reflect recent updates

2022-07-29 Thread Jacob Wujciak-Jens (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-12590:
--

Assignee: Jacob Wujciak-Jens

> [C++][R] Update copies of Homebrew files to reflect recent updates
> --
>
> Key: ARROW-12590
> URL: https://issues.apache.org/jira/browse/ARROW-12590
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, R
>Reporter: Ian Cook
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Our copies of the Homebrew formulae at 
> [https://github.com/apache/arrow/tree/master/dev/tasks/homebrew-formulae] 
> have drifted out of sync with what's currently in 
> [https://github.com/Homebrew/homebrew-core/tree/master/Formula] and 
> [https://github.com/autobrew/homebrew-core/blob/master/Formula|https://github.com/autobrew/homebrew-core/blob/master/Formula/].
>  Get them back in sync and consider automating some method of checking that 
> they are in sync, e.g. by failing the {{homebrew-cpp}} and 
>  {{homebrew-r-autobrew}} nightly tests if our copies don't match what's in 
> the Homebrew and autobrew repos (but only if there were changes there that 
> weren't made in our repo, and not the inverse).
> Update the instructions at 
>  
> [https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-UpdatingHomebrewpackages]
>  as needed.





[jira] [Comment Edited] (ARROW-17224) [R][Doc] minor error in Linux installation documentation ('conda' option) for R on CRAN

2022-07-29 Thread Wayne Smith (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573108#comment-17573108
 ] 

Wayne Smith edited comment on ARROW-17224 at 7/29/22 6:38 PM:
--

Jacob, I concur.  And doing conda -y update conda base (or similar) beforehand 
(as suggested quite often on StackOverflow) doesn't help (and also takes a long 
time).

The first suggestion in the docs for installing r-arrow on Linux, i.e., 
upgrading directly from RStudio (now Posit), is the fastest and works.  I just 
hope the link to the binaries isn't brittle or unreliable (you might want to 
check that too).

I've also gotten it to work with the 'nightly' version hosted on Apache.  
Compilation is much slower than with the RStudio (again, now Posit) approach 
and also needs (as the docs say) the libcurl4-openssl-dev package.  
However, my experience is that some (non-sudo) users can't install that package 
on their distro.

Best,

Wayne

 


was (Author: JIRAUSER293451):
Jacob, I concur.  And doing conda -y update conda base (or similar) beforehand 
(as suggested quite often on StackOverflow) doesn't help (and also takes a long 
time).

The first suggestion for installing r-arrow on Linux from the docs–i.e., 
upgrading directly from Rstudio (now Posit) is the fastest and works.  I just 
don't hope the link is brittle or unreliable.

I've also gotten it to work with the 'nightly' version hosted on Apache.  The 
compilation is much slower than the RStudio instructions (now Posit) approach 
and also needs (as the doc's say) the libcurl-openssl-dev package.  However, my 
experience is that some (non-sudo) users can't install that

Wayne

> [R][Doc] minor error in Linux installation documentation ('conda' option) for 
> R on CRAN
> ---
>
> Key: ARROW-17224
> URL: https://issues.apache.org/jira/browse/ARROW-17224
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation, R
>Affects Versions: 8.0.1
> Environment: Ubuntu 20.04
>Reporter: Wayne Smith
>Priority: Minor
> Fix For: 8.0.2
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The documentation for the Linux installation for the r-arrow binary for R is 
> at:
>     https://cran.r-project.org/web/packages/arrow/vignettes/install.html
> The documentation indicates that the 'conda' installation syntax should be:
> {code:java}
> conda install -c conda-forge --strict-channel-priority r-arrow{code}
> I can't get that to work.  What works for me is:
> {code:java}
> conda config --set channel_priority strict
> conda install -c conda-forge r-arrow{code}
> I'm wondering if the syntax presented in the documentation is either 
> deprecated or incorrect.





[jira] [Closed] (ARROW-14802) [R] [CI] Illegal opcode when installing via autobrew

2022-07-29 Thread Jacob Wujciak-Jens (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens closed ARROW-14802.
--
Resolution: Duplicate

> [R] [CI] Illegal opcode when installing via autobrew
> 
>
> Key: ARROW-14802
> URL: https://issues.apache.org/jira/browse/ARROW-14802
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, R
>Reporter: Jonathan Keane
>Priority: Major
>
> https://github.com/ursacomputing/crossbow/runs/4295761494?check_suite_focus=true#step:7:664
> {code}
> > if (identical(tolower(Sys.getenv("ARROW_R_DEV", "false")), "true")) {
> +   arrow_reporter <- MultiReporter$new(list(CheckReporter$new(), 
> LocationReporter$new()))
> + } else {
> +   arrow_reporter <- check_reporter()
> + }
> > test_check("arrow", reporter = arrow_reporter)
>  *** caught illegal operation ***
> address 0x106462630, cause 'illegal opcode'
> {code}





[jira] [Commented] (ARROW-17224) [R][Doc] minor error in Linux installation documentation ('conda' option) for R on CRAN

2022-07-29 Thread Wayne Smith (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573108#comment-17573108
 ] 

Wayne Smith commented on ARROW-17224:
-

Jacob, I concur.  And doing conda -y update conda base (or similar) beforehand 
(as suggested quite often on StackOverflow) doesn't help (and also takes a long 
time).

The first suggestion in the docs for installing r-arrow on Linux, i.e., 
upgrading directly from RStudio (now Posit), is the fastest and works.  I just 
hope the link isn't brittle or unreliable.

I've also gotten it to work with the 'nightly' version hosted on Apache.  
Compilation is much slower than with the RStudio (now Posit) approach 
and also needs (as the docs say) the libcurl4-openssl-dev package.  However, my 
experience is that some (non-sudo) users can't install that

Wayne

> [R][Doc] minor error in Linux installation documentation ('conda' option) for 
> R on CRAN
> ---
>
> Key: ARROW-17224
> URL: https://issues.apache.org/jira/browse/ARROW-17224
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation, R
>Affects Versions: 8.0.1
> Environment: Ubuntu 20.04
>Reporter: Wayne Smith
>Priority: Minor
> Fix For: 8.0.2
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The documentation for the Linux installation for the r-arrow binary for R is 
> at:
>     https://cran.r-project.org/web/packages/arrow/vignettes/install.html
> The documentation indicates that the 'conda' installation syntax should be:
> {code:java}
> conda install -c conda-forge --strict-channel-priority r-arrow{code}
> I can't get that to work.  What works for me is:
> {code:java}
> conda config --set channel_priority strict
> conda install -c conda-forge r-arrow{code}
> I'm wondering if the syntax presented in the documentation is either 
> deprecated or incorrect.





[jira] [Commented] (ARROW-5890) [C++][Python] Support ExtensionType arrays in more kernels

2022-07-29 Thread Clark Zinzow (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573106#comment-17573106
 ] 

Clark Zinzow commented on ARROW-5890:
-

Does allowing extension type implementers to register a cast function sound 
reasonable? I might be able to take a stab at this (just casting) in the coming 
months.

> [C++][Python] Support ExtensionType arrays in more kernels
> --
>
> Key: ARROW-5890
> URL: https://issues.apache.org/jira/browse/ARROW-5890
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Joris Van den Bossche
>Priority: Major
>
> From a quick test (through Python), it seems that {{slice}} and {{take}} 
> work, but the following not:
> - {{cast}}: it could rely on the casting rules for the storage type. Or do we 
> want that you explicitly have to take the storage array before casting?
> - {{dictionary_encode}} / {{unique}}





[jira] [Commented] (ARROW-15481) [R] [CI] Add a crossbow job that mimics CRAN's old macOS

2022-07-29 Thread Jacob Wujciak-Jens (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573107#comment-17573107
 ] 

Jacob Wujciak-Jens commented on ARROW-15481:


Working on getting self-hosted macOS 10.13 runners matching CRAN, with r-release 
and r-oldrel, for nightlies and other jobs (as-cran?). Open to suggestions.

> [R] [CI] Add a crossbow job that mimics CRAN's old macOS
> 
>
> Key: ARROW-15481
> URL: https://issues.apache.org/jira/browse/ARROW-15481
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Jonathan Keane
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>
> Jeroen's autobrew does this using travis:
> https://github.com/autobrew/homebrew-core/blob/high-sierra/.travis.yml
> It would be good to test this on our own before the release process





[jira] [Assigned] (ARROW-15481) [R] [CI] Add a crossbow job that mimics CRAN's old macOS

2022-07-29 Thread Jacob Wujciak-Jens (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-15481:
--

Assignee: Jacob Wujciak-Jens

> [R] [CI] Add a crossbow job that mimics CRAN's old macOS
> 
>
> Key: ARROW-15481
> URL: https://issues.apache.org/jira/browse/ARROW-15481
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Jonathan Keane
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>
> Jeroen's autobrew does this using travis:
> https://github.com/autobrew/homebrew-core/blob/high-sierra/.travis.yml
> It would be good to test this on our own before the release process





[jira] [Updated] (ARROW-15481) [R] [CI] Add a crossbow job that mimics CRAN's old macOS

2022-07-29 Thread Jacob Wujciak-Jens (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-15481:
---
Priority: Critical  (was: Major)

> [R] [CI] Add a crossbow job that mimics CRAN's old macOS
> 
>
> Key: ARROW-15481
> URL: https://issues.apache.org/jira/browse/ARROW-15481
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Jonathan Keane
>Priority: Critical
>
> Jeroen's autobrew does this using travis:
> https://github.com/autobrew/homebrew-core/blob/high-sierra/.travis.yml
> It would be good to test this on our own before the release process





[jira] [Commented] (ARROW-17224) [R][Doc] minor error in Linux installation documentation ('conda' option) for R on CRAN

2022-07-29 Thread Jacob Wujciak-Jens (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573105#comment-17573105
 ] 

Jacob Wujciak-Jens commented on ARROW-17224:


Hello, thanks for the ticket. I have replicated this on Ubuntu 20.04, and while 
it does solve the environment at some point, it takes a very long time (>1h), 
which is of course not acceptable. I don't know why this happens but will look 
into it; even if there is a fix, we probably want to update the docs...

> [R][Doc] minor error in Linux installation documentation ('conda' option) for 
> R on CRAN
> ---
>
> Key: ARROW-17224
> URL: https://issues.apache.org/jira/browse/ARROW-17224
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation, R
>Affects Versions: 8.0.1
> Environment: Ubuntu 20.04
>Reporter: Wayne Smith
>Priority: Minor
> Fix For: 8.0.2
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The documentation for the Linux installation for the r-arrow binary for R is 
> at:
>     https://cran.r-project.org/web/packages/arrow/vignettes/install.html
> The documentation indicates that the 'conda' installation syntax should be:
> {code:java}
> conda install -c conda-forge --strict-channel-priority r-arrow{code}
> I can't get that to work.  What works for me is:
> {code:java}
> conda config --set channel_priority strict
> conda install -c conda-forge r-arrow{code}
> I'm wondering if the syntax presented in the documentation is either 
> deprecated or incorrect.





[jira] [Created] (ARROW-17256) Can't call combine_chunks on empty ChunkedArray

2022-07-29 Thread &res (Jira)
&res created ARROW-17256:


 Summary: Can't call combine_chunks on empty ChunkedArray
 Key: ARROW-17256
 URL: https://issues.apache.org/jira/browse/ARROW-17256
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
 Environment: pyarrow 8.0.0
python 3.9
Reporter: &res


When calling:
{code:java}
pa.chunked_array([], type=pa.bool_()).combine_chunks(){code}
I get this error:
{code:java}
 pyarrow/table.pxi:700: in pyarrow.lib.ChunkedArray.combine_chunks
    ???
pyarrow/array.pxi:2868: in pyarrow.lib.concat_arrays
    ???
pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
    ???
pyarrow/error.pxi:100: in pyarrow.lib.check_status
    ???
E   pyarrow.lib.ArrowInvalid: Must pass at least one array{code}
While this works:
{code:java}
pa.chunked_array([pa.array([], pa.bool_())], type=pa.bool_()) {code}
In the first case, it should return an empty BoolArray as well.





[jira] [Created] (ARROW-17255) Support JSON logical type in Arrow

2022-07-29 Thread Pradeep Gollakota (Jira)
Pradeep Gollakota created ARROW-17255:
-

 Summary: Support JSON logical type in Arrow
 Key: ARROW-17255
 URL: https://issues.apache.org/jira/browse/ARROW-17255
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Archery
Reporter: Pradeep Gollakota


As a BigQuery developer, I would like the Arrow libraries to support the JSON 
logical Type. This would enable us to use the JSON type in the Arrow format of 
our ReadAPI. This would also enable us to use the JSON type to export data from 
BigQuery to Parquet.





[jira] [Comment Edited] (ARROW-17252) [R] Intermittent valgrind failure

2022-07-29 Thread Dewey Dunnington (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573068#comment-17573068
 ] 

Dewey Dunnington edited comment on ARROW-17252 at 7/29/22 5:17 PM:
---

I can get a similar leak locally too, using a Dockerfile:

{noformat}
FROM ubuntu:20.04
ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=America/Halifax
RUN apt-get update && apt-get install -y valgrind r-base cmake git libxml2-dev 
libcurl4-openssl-dev libssl-dev libgit2-dev libfontconfig1-dev libfreetype6-dev 
libharfbuzz-dev libfribidi-dev libpng-dev libtiff5-dev libjpeg-dev
RUN git clone https://github.com/apache/arrow.git /arrow && mkdir /arrow-build 
&& cd /arrow-build && cmake /arrow/cpp -DARROW_CSV=ON -DARROW_FILESYSTEM=ON 
-DARROW_COMPUTE=ON -DBoost_SOURCE=BUNDLED && cmake --build . && cmake --install 
. --prefix /arrow-dist
RUN R -e 'install.packages(c("devtools", "cpp11", "R6", "assertthat", "bit64", 
"bit", "cli", "ellipsis", "glue", "magrittr", "purrr", "rlang", "tidyselect", 
"vctrs", "lubridate", "dplyr", "hms"), repos = "https://cloud.r-project.org")'
ENV ARROW_HOME /arrow-dist
ENV LD_LIBRARY_PATH /arrow-dist/lib
RUN cd /arrow/r && R CMD INSTALL .
{noformat}

Launching R with valgrind:

{noformat}
R -d "valgrind --tool=memcheck --leak-check=full"
{noformat}

...and I get this leak:


{noformat}
==387== 2,608 (72 direct, 2,536 indirect) bytes in 1 blocks are definitely lost 
in loss record 625 of 4,108
==387==at 0x484A3C4: operator new(unsigned long) (in 
/usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
==387==by 0x1566648F: 
arrow::Table::FromRecordBatches(std::shared_ptr, 
std::vector, 
std::allocator > > const&) (in 
/arrow-dist/lib/libarrow.so.900.0.0)
==387==by 0x15629FB7: arrow::RecordBatchReader::ToTable() (in 
/arrow-dist/lib/libarrow.so.900.0.0)
==387==by 0x1501C503: operator() (compute-exec.cpp:147)
==387==by 0x1501C503: 
std::_Function_handler > (), 
ExecPlan_read_table(std::shared_ptr const&, 
std::shared_ptr const&, cpp11::r_vector, 
cpp11::r_vector, 
long)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (std_function.h:286)
==387==by 0x15023427: 
std::function > ()>::operator()() 
const (std_function.h:688)
==387==by 0x1502352F: 
operator() >()>&> 
(future.h:150)
==387==by 0x1502352F: __invoke_impl >&, 
std::function >()>&> (invoke.h:60)
==387==by 0x1502352F: __invoke >&, 
std::function >()>&> (invoke.h:95)
==387==by 0x1502352F: __call (functional:400)
==387==by 0x1502352F: operator()<> (functional:484)
==387==by 0x1502352F: arrow::internal::FnOnce::FnImpl >, 
std::function > ()>)> >::invoke() 
(functional.h:152)
==387==by 0x1579636B: 
std::thread::_State_impl
 > >::_M_run() (in /arrow-dist/lib/libarrow.so.900.0.0)
==387==by 0x71F4FAB: ??? (in /usr/lib/aarch64-linux-gnu/libstdc++.so.6.0.28)
==387==by 0x55F1623: start_thread (pthread_create.c:477)
==387==by 0x4DA949B: thread_start (clone.S:78)
{noformat}

(Although this dockerfile doesn't use r-devel...it uses R 3.6 which is a bit 
old).


was (Author: paleolimbot):
I can get a similar leak locally too, using a Dockerfile:

{noformat}
FROM ubuntu:20.04
ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=America/Halifax
RUN apt-get update && apt-get install -y valgrind r-base cmake git libxml2-dev \
    libcurl4-openssl-dev libssl-dev libgit2-dev libfontconfig1-dev libfreetype6-dev \
    libharfbuzz-dev libfribidi-dev libpng-dev libtiff5-dev libjpeg-dev
RUN git clone https://github.com/apache/arrow.git /arrow && mkdir /arrow-build \
    && cd /arrow-build && cmake /arrow/cpp -DARROW_CSV=ON -DARROW_DATASET=ON \
    -DARROW_FILESYSTEM=ON -DARROW_COMPUTE=ON -DBoost_SOURCE=BUNDLED && cmake \
    --build . && cmake --install . --prefix /arrow-dist
RUN R -e 'install.packages(c("devtools", "cpp11", "R6", "assertthat", "bit64", \
    "bit", "cli", "ellipsis", "glue", "magrittr", "purrr", "rlang", "tidyselect", \
    "vctrs", "lubridate", "dplyr", "hms"), repos = "https://cloud.r-project.org")'
ENV ARROW_HOME /arrow-dist
ENV LD_LIBRARY_PATH /arrow-dist/lib
RUN cd /arrow/r && R CMD INSTALL .
{noformat}

Launching R with valgrind:

{noformat}
R -d "valgrind --tool=memcheck --leak-check=full"
{noformat}

...and I get this leak:


{noformat}
==387== 2,608 (72 direct, 2,536 indirect) bytes in 1 blocks are definitely lost 
in loss record 625 of 4,108
==387==at 0x484A3C4: operator new(unsigned long) (in 
/usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
==387==by 0x1566648F: 
arrow::Table::FromRecordBatches(std::shared_ptr, 
std::vector, 
std::allocator > > const&) (in 
/arrow-dist/lib/libarrow.so.900.0.0)
==387==by 0x15629FB7: arrow::RecordBatchReader::ToTable() (in 
/arrow-dist/lib/libarrow.so.900.0.0)
==387==by 0x1501C503: operator() (compute-exec.cpp:147)
==387==by 0x1501C503: 
std::_Function_handler > (), 
ExecPlan_read_table(std::shared_ptr const&, 
std::s

[jira] [Commented] (ARROW-17252) [R] Intermittent valgrind failure

2022-07-29 Thread Dewey Dunnington (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573068#comment-17573068
 ] 

Dewey Dunnington commented on ARROW-17252:
--

I can get a similar leak locally too, using a Dockerfile:

{noformat}
FROM ubuntu:20.04
ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=America/Halifax
RUN apt-get update && apt-get install -y valgrind r-base cmake git libxml2-dev \
    libcurl4-openssl-dev libssl-dev libgit2-dev libfontconfig1-dev libfreetype6-dev \
    libharfbuzz-dev libfribidi-dev libpng-dev libtiff5-dev libjpeg-dev
RUN git clone https://github.com/apache/arrow.git /arrow && mkdir /arrow-build \
    && cd /arrow-build && cmake /arrow/cpp -DARROW_CSV=ON -DARROW_DATASET=ON \
    -DARROW_FILESYSTEM=ON -DARROW_COMPUTE=ON -DBoost_SOURCE=BUNDLED && cmake \
    --build . && cmake --install . --prefix /arrow-dist
RUN R -e 'install.packages(c("devtools", "cpp11", "R6", "assertthat", "bit64", \
    "bit", "cli", "ellipsis", "glue", "magrittr", "purrr", "rlang", "tidyselect", \
    "vctrs", "lubridate", "dplyr", "hms"), repos = "https://cloud.r-project.org")'
ENV ARROW_HOME /arrow-dist
ENV LD_LIBRARY_PATH /arrow-dist/lib
RUN cd /arrow/r && R CMD INSTALL .
{noformat}

Launching R with valgrind:

{noformat}
R -d "valgrind --tool=memcheck --leak-check=full"
{noformat}

...and I get this leak:


{noformat}
==387== 2,608 (72 direct, 2,536 indirect) bytes in 1 blocks are definitely lost 
in loss record 625 of 4,108
==387==at 0x484A3C4: operator new(unsigned long) (in 
/usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
==387==by 0x1566648F: 
arrow::Table::FromRecordBatches(std::shared_ptr, 
std::vector, 
std::allocator > > const&) (in 
/arrow-dist/lib/libarrow.so.900.0.0)
==387==by 0x15629FB7: arrow::RecordBatchReader::ToTable() (in 
/arrow-dist/lib/libarrow.so.900.0.0)
==387==by 0x1501C503: operator() (compute-exec.cpp:147)
==387==by 0x1501C503: 
std::_Function_handler > (), 
ExecPlan_read_table(std::shared_ptr const&, 
std::shared_ptr const&, cpp11::r_vector, 
cpp11::r_vector, 
long)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (std_function.h:286)
==387==by 0x15023427: 
std::function > ()>::operator()() 
const (std_function.h:688)
==387==by 0x1502352F: 
operator() >()>&> 
(future.h:150)
==387==by 0x1502352F: __invoke_impl >&, 
std::function >()>&> (invoke.h:60)
==387==by 0x1502352F: __invoke >&, 
std::function >()>&> (invoke.h:95)
==387==by 0x1502352F: __call (functional:400)
==387==by 0x1502352F: operator()<> (functional:484)
==387==by 0x1502352F: arrow::internal::FnOnce::FnImpl >, 
std::function > ()>)> >::invoke() 
(functional.h:152)
==387==by 0x1579636B: 
std::thread::_State_impl
 > >::_M_run() (in /arrow-dist/lib/libarrow.so.900.0.0)
==387==by 0x71F4FAB: ??? (in /usr/lib/aarch64-linux-gnu/libstdc++.so.6.0.28)
==387==by 0x55F1623: start_thread (pthread_create.c:477)
==387==by 0x4DA949B: thread_start (clone.S:78)
{noformat}

(Although this dockerfile doesn't use r-devel...it uses R 3.6 which is a bit 
old).

> [R] Intermittent valgrind failure
> -
>
> Key: ARROW-17252
> URL: https://issues.apache.org/jira/browse/ARROW-17252
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Dewey Dunnington
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> A number of recent nightly builds have intermittent failures with valgrind, 
> which fails because of possibly leaked memory around an exec plan. This seems 
> related to a change in XXX that separated {{ExecPlan_prepare()}} from 
> {{ExecPlan_run()}} and added a {{ExecPlan_read_table()}} that uses 
> {{RunWithCapturedR()}}. The reported leaks vary but include ExecPlans and 
> ExecNodes and fields of those objects.
> A failed run: 
> https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=30310&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181&l=24980
> Some example output:
> {noformat}
> ==5249== 14,112 (384 direct, 13,728 indirect) bytes in 1 blocks are 
> definitely lost in loss record 1,988 of 3,883
> ==5249==at 0x4849013: operator new(unsigned long) (in 
> /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5249==by 0x10B2902B: 
> std::_Function_handler 
> (arrow::compute::ExecPlan*, std::vector std::allocator >, arrow::compute::ExecNodeOptions 
> const&), 
> arrow::compute::internal::RegisterAggregateNode(arrow::compute::ExecFactoryRegistry*)::{lambda(arrow::compute::ExecPlan*,
>  std::vector std::allocator >, arrow::compute::ExecNodeOptions 
> const&)#1}>::_M_invoke(std::_Any_data const&, arrow::compute::ExecPlan*&&, 
> std::vector std::allocator >&&, 
> arrow::compute::ExecNodeOptions const&) (exec_plan.h:60)
> ==5249==   

[jira] [Updated] (ARROW-17067) Implement Substring_Index

2022-07-29 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-17067:

Fix Version/s: (was: 9.0.0)

> Implement Substring_Index
> -
>
> Key: ARROW-17067
> URL: https://issues.apache.org/jira/browse/ARROW-17067
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Sahaj Gupta
>Assignee: Sahaj Gupta
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Adding Substring_index Function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17216) [C++] Support joining tables with non-key fields as list

2022-07-29 Thread Carlos Maltzahn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573057#comment-17573057
 ] 

Carlos Maltzahn commented on ARROW-17216:
-

[~heyjc] and I are willing to help implement support for joining tables with 
list-typed non-key fields, but we might need some pointers on where to start.

> [C++] Support joining tables with non-key fields as list
> 
>
> Key: ARROW-17216
> URL: https://issues.apache.org/jira/browse/ARROW-17216
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Jayjeet Chakraborty
>Priority: Major
>  Labels: query-engine
>
> I am trying to join 2 Arrow tables where some columns are of {{list}} 
> data type. Note that my join columns/keys are primitive data types and some 
> of my non-join columns are of {{list}} type. However, PyArrow {{join()}} 
> cannot join such a table, although pandas can. It says
> {{ArrowInvalid: Data type list is not supported in join non-key 
> field}}
> when I execute this piece of code
> {{joined_table = table_1.join(table_2, ['k1', 'k2', 'k3'])}}
> A 
> [stackoverflow|https://stackoverflow.com/questions/73071105/listitem-float-not-supported-in-join-non-key-field]
>  response pointed out that Arrow currently cannot handle non-fixed types for 
> joins. Can this be fixed? Or is this intentional?
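
One possible workaround, until list columns are accepted as non-key fields, is to join on the key columns only and gather the remaining columns by row index afterwards. The sketch below illustrates that idea in plain Python on dict-of-lists "tables". The helper name {{join_with_list_payload}} and the sample data are hypothetical, and this is not how PyArrow implements {{join()}}; it also assumes the non-key column names do not collide.

```python
def join_with_list_payload(left, right, keys):
    """Inner-join two dict-of-lists tables on `keys`.

    Only the key columns are hashed; every other column (including
    list-valued ones) is gathered afterwards by row index.
    """
    n_left = len(next(iter(left.values())))
    n_right = len(next(iter(right.values())))

    # Build a hash index over the right table's key tuples.
    index = {}
    for j in range(n_right):
        index.setdefault(tuple(right[k][j] for k in keys), []).append(j)

    # Probe with the left table and record matching (left_row, right_row) pairs.
    pairs = []
    for i in range(n_left):
        for j in index.get(tuple(left[k][i] for k in keys), []):
            pairs.append((i, j))

    # Gather all columns by index; list columns are treated as opaque values.
    out = {name: [col[i] for i, _ in pairs] for name, col in left.items()}
    for name, col in right.items():
        if name not in keys:
            out[name] = [col[j] for _, j in pairs]
    return out


# Hypothetical sample data: one primitive key column, one list column per table.
table_1 = {"k1": [1, 2, 3], "v": [[0.1, 0.2], [], [0.3]]}
table_2 = {"k1": [3, 1], "w": [["a"], ["b", "c"]]}
joined = join_with_list_payload(table_1, table_2, ["k1"])
print(joined)
```

With PyArrow itself, the same shape might be achieved by joining only the primitive columns plus an added row-index column and then selecting the list columns by that index, but that is an untested assumption here.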



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17067) Implement Substring_Index

2022-07-29 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-17067:

Fix Version/s: 9.0.0

> Implement Substring_Index
> -
>
> Key: ARROW-17067
> URL: https://issues.apache.org/jira/browse/ARROW-17067
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Sahaj Gupta
>Assignee: Sahaj Gupta
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Adding Substring_index Function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ARROW-17246) [Packaging][deb][RPM] Don't use system jemalloc

2022-07-29 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-17246.
-
Resolution: Fixed

Issue resolved by pull request 13739
[https://github.com/apache/arrow/pull/13739]

> [Packaging][deb][RPM] Don't use system jemalloc
> ---
>
> Key: ARROW-17246
> URL: https://issues.apache.org/jira/browse/ARROW-17246
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We shouldn't use system jemalloc because it can't be used with {{dlopen()}}. 
> In that case, our shared libraries can't be loaded as 
> bindings for scripting languages such as Ruby:
> {noformat}
> + ruby -r gi -e 'p GI.load('\''Arrow'\'')'
> (null)-WARNING **: Failed to load shared library 'libarrow-glib.so.900' 
> referenced by the typelib: /lib64/libjemalloc.so.2: cannot allocate memory in 
> static TLS block
> {noformat}
> This happens because system jemalloc isn't built with 
> {{--disable-initial-exec-tls}}. See also:
> * https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=951704
> * https://github.com/jemalloc/jemalloc/issues/1237



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17254) [C++][FlightRPC] Flight SQL server does not implement GetSchema

2022-07-29 Thread David Li (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Li updated ARROW-17254:
-
Issue Type: Bug  (was: Improvement)

> [C++][FlightRPC] Flight SQL server does not implement GetSchema
> ---
>
> Key: ARROW-17254
> URL: https://issues.apache.org/jira/browse/ARROW-17254
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, FlightRPC
>Reporter: David Li
>Priority: Major
>
> This is specified, but not actually implemented!
> It needs to be covered in integration tests, too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17254) [C++][FlightRPC] Flight SQL server does not implement GetSchema

2022-07-29 Thread David Li (Jira)
David Li created ARROW-17254:


 Summary: [C++][FlightRPC] Flight SQL server does not implement 
GetSchema
 Key: ARROW-17254
 URL: https://issues.apache.org/jira/browse/ARROW-17254
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: David Li


This is specified, but not actually implemented!

It needs to be covered in integration tests, too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ARROW-17219) [Go] [IPC] Endianness Conversion for Non-native endianness

2022-07-29 Thread Matthew Topol (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Topol resolved ARROW-17219.
---
Resolution: Fixed

Issue resolved by pull request 13716
[https://github.com/apache/arrow/pull/13716]

> [Go] [IPC] Endianness Conversion for Non-native endianness
> --
>
> Key: ARROW-17219
> URL: https://issues.apache.org/jira/browse/ARROW-17219
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Go, Integration
>Reporter: Matthew Topol
>Assignee: Matthew Topol
>Priority: Major
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ARROW-15733) array.String offsets int32 overflow

2022-07-29 Thread Matthew Topol (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Topol resolved ARROW-15733.
---
Fix Version/s: 10.0.0
 Assignee: Matthew Topol
   Resolution: Resolved

The LargeBinary implementation also added LargeString, which allows int64 
offsets for string arrays.

> array.String offsets int32 overflow
> ---
>
> Key: ARROW-15733
> URL: https://issues.apache.org/jira/browse/ARROW-15733
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Go
>Affects Versions: 7.0.0
>Reporter: Andrew Strelsky
>Assignee: Matthew Topol
>Priority: Minor
> Fix For: 10.0.0
>
>
> {panel}
> panic: runtime error: slice bounds out of range [:-1352393031]
> goroutine 1 [running]:
> github.com/apache/arrow/go/v7/arrow/array.(*String).ValueBytes(...)
>         C:/Users/astre/Documents/go/pkg/mod/github.com/apache/arrow/go/v7@v7.0.0/arrow/array/string.go:74
> github.com/apache/arrow/go/v7/arrow/ipc.(*recordEncoder).visit(0xc193b85c80, 0xc193b9e060, {0x10b5490, 0xc50820})
>         C:/Users/astre/Documents/go/pkg/mod/github.com/apache/arrow/go/v7@v7.0.0/arrow/ipc/writer.go:435 +0x2194
> github.com/apache/arrow/go/v7/arrow/ipc.(*recordEncoder).visit(0xc193b85c80, 0xc193b9e060, {0x10b5288, 0xc50730})
>         C:/Users/astre/Documents/go/pkg/mod/github.com/apache/arrow/go/v7@v7.0.0/arrow/ipc/writer.go:533 +0x1431
> github.com/apache/arrow/go/v7/arrow/ipc.(*recordEncoder).Encode(0xc193b85c80, 0xc193b9e060, {0x10b5838, 0xc193b8bc80})
>         C:/Users/astre/Documents/go/pkg/mod/github.com/apache/arrow/go/v7@v7.0.0/arrow/ipc/writer.go:267 +0x98
> github.com/apache/arrow/go/v7/arrow/ipc.(*FileWriter).Write(0xc4e480, {0x10b5838, 0xc193b8bc80})
>         C:/Users/astre/Documents/go/pkg/mod/github.com/apache/arrow/go/v7@v7.0.0/arrow/ipc/file_writer.go:342 +0x20d
> main.main()
> {panel}
> I have *a lot* of strings. The offsets should not only be unsigned but should 
> also be larger than 4 bytes. Changing the offsets to a slice of uint32 was 
> sufficient in my case but may not be for others.
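
The negative bound in the panic is consistent with a byte offset that exceeded the int32 range and wrapped around. As a worked illustration (the total of 2,942,574,265 bytes is an assumption: it is the value below 2^32 that wraps to the number in the message, and the real total is not stated in the report), the arithmetic looks like this:

```python
import ctypes

INT32_MAX = 2**31 - 1


def as_int32(n):
    """Reinterpret an integer as a signed 32-bit value (wraps on overflow)."""
    return ctypes.c_int32(n).value


def as_uint32(n):
    """Reinterpret an integer as an unsigned 32-bit value."""
    return ctypes.c_uint32(n).value


# Assumed total size of the string data, chosen to match the panic message.
total_bytes = 2_942_574_265

print(total_bytes > INT32_MAX)   # the total does not fit in int32
print(as_int32(total_bytes))     # wrapped, negative offset
print(as_uint32(total_bytes))    # still representable as uint32
```

Under that assumption the total fits in uint32 but not in int32, which would match the reporter's observation that switching to uint32 was enough in this case, while anything past 4 GiB would need 64-bit offsets.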



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (ARROW-17253) Pyarrow array crashes the interpreter when encounter 0 division error

2022-07-29 Thread Li Jin (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573021#comment-17573021
 ] 

Li Jin edited comment on ARROW-17253 at 7/29/22 3:02 PM:
-

I think in general, any exception raised by the generator would crash the 
python interpreter when passed to pa.array


was (Author: icexelloss):
I think in general, any exception raised by the generator would crash the 
python interpreter when passing to pa.array

> Pyarrow array crashes the interpreter when encounter 0 division error  
> ---
>
> Key: ARROW-17253
> URL: https://issues.apache.org/jira/browse/ARROW-17253
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Li Jin
>Priority: Major
>
> {code:java}
> pa.array((1 // 0 for x in range(10)), size=10){code}
> This would crash the python interpreter 
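
Until this is fixed, one defensive pattern is to consume the generator in plain Python first, so that any exception is raised before the values reach {{pa.array}}. This is a minimal sketch, not a confirmed fix: the helper name {{materialize}} is hypothetical, and the final {{pa.array(values)}} call is only indicated in a comment so the snippet runs without PyArrow installed.

```python
from itertools import islice


def materialize(gen, size):
    """Pull at most `size` items from `gen` into a list.

    Any exception raised by the generator propagates here as an ordinary
    Python exception.
    """
    return list(islice(gen, size))


try:
    values = materialize((1 // 0 for x in range(10)), 10)
except ZeroDivisionError as exc:
    values = None
    print(f"generator failed: {exc}")

# If `values` is not None, it can be handed over as a plain list,
# e.g. pa.array(values) (assuming `import pyarrow as pa`).
```

The trade-off is that the whole sequence is held in a Python list before conversion, which gives up the streaming behaviour of passing a generator with {{size}}.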



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17253) Pyarrow array crashes the interpreter when encounter 0 division error

2022-07-29 Thread Li Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-17253:
---
Description: 
{code:java}
pa.array((1 // 0 for x in range(10)), size=10){code}
This would crash the python interpreter 

  was:
{code:java}
pa.array(1 // 0 for x in range(10), size=10){code}
This would crash the python interpreter 


> Pyarrow array crashes the interpreter when encounter 0 division error  
> ---
>
> Key: ARROW-17253
> URL: https://issues.apache.org/jira/browse/ARROW-17253
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Li Jin
>Priority: Major
>
> {code:java}
> pa.array((1 // 0 for x in range(10)), size=10){code}
> This would crash the python interpreter 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17253) Pyarrow array crashes the interpreter when encounter 0 division error

2022-07-29 Thread Li Jin (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573021#comment-17573021
 ] 

Li Jin commented on ARROW-17253:


I think in general, any exception raised by the generator would crash the 
python interpreter when passing to pa.array

> Pyarrow array crashes the interpreter when encounter 0 division error  
> ---
>
> Key: ARROW-17253
> URL: https://issues.apache.org/jira/browse/ARROW-17253
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Li Jin
>Priority: Major
>
> {code:java}
> pa.array(1 // 0 for x in range(10), size=10){code}
> This would crash the python interpreter 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17253) Pyarrow array crashes the interpreter when encounter 0 division error

2022-07-29 Thread Li Jin (Jira)
Li Jin created ARROW-17253:
--

 Summary: Pyarrow array crashes the interpreter when encounter 0 
division error  
 Key: ARROW-17253
 URL: https://issues.apache.org/jira/browse/ARROW-17253
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Li Jin


{code:java}
pa.array(1 // 0 for x in range(10), size=10){code}
This would crash the python interpreter 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-12590) [C++][R] Update copies of Homebrew files to reflect recent updates

2022-07-29 Thread Jacob Wujciak-Jens (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573006#comment-17573006
 ] 

Jacob Wujciak-Jens commented on ARROW-12590:


Ok.

Though I thought we had a workaround in place for [ARROW-15678] (setting -O2) 
that should prevent the segfault?

> [C++][R] Update copies of Homebrew files to reflect recent updates
> --
>
> Key: ARROW-12590
> URL: https://issues.apache.org/jira/browse/ARROW-12590
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, R
>Reporter: Ian Cook
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Our copies of the Homebrew formulae at 
> [https://github.com/apache/arrow/tree/master/dev/tasks/homebrew-formulae] 
> have drifted out of sync with what's currently in 
> [https://github.com/Homebrew/homebrew-core/tree/master/Formula] and 
> [https://github.com/autobrew/homebrew-core/blob/master/Formula|https://github.com/autobrew/homebrew-core/blob/master/Formula/].
>  Get them back in sync and consider automating some method of checking that 
> they are in sync, e.g. by failing the {{homebrew-cpp}} and 
>  {{homebrew-r-autobrew}} nightly tests if our copies don't match what's in 
> the Homebrew and autobrew repos (but only if there were changes there that 
> weren't made in our repo, and not the inverse).
> Update the instructions at 
>  
> [https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-UpdatingHomebrewpackages]
>  as needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-12590) [C++][R] Update copies of Homebrew files to reflect recent updates

2022-07-29 Thread Jonathan Keane (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573003#comment-17573003
 ] 

Jonathan Keane commented on ARROW-12590:


Agreed with syncing (and the original intent of this ticket was basically to 
find a way to detect if and when this happens in order to alert us about it). 
It is ok that the autobrew and the homebrew formulae are different (since in 
the newest versions of the autobrew setup, if we are on a modern enough system 
we _just use brew_).

If I'm remembering correctly, 
https://github.com/apache/arrow/pull/12157/files#diff-4b112dbca2ece7c78e15eb8aff3218e21dd6f4b1fab7cfc9182830488f68ca58R22-R30
 was basically the operative code that fixes this. If I were you, I would take 
the commits on my branch there and create a new branch and push forward with 
that since it will let you run it in CI. Though the R tests will probably 
segfault with the simd issue in ARROW-15678. Maybe that's fine (since it's 
"only" a limited number of computers that this happens on — just so happens the 
GH runners are one of those, apparently) or maybe we'll need to actually 
resolve ARROW-15678? 

> [C++][R] Update copies of Homebrew files to reflect recent updates
> --
>
> Key: ARROW-12590
> URL: https://issues.apache.org/jira/browse/ARROW-12590
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, R
>Reporter: Ian Cook
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Our copies of the Homebrew formulae at 
> [https://github.com/apache/arrow/tree/master/dev/tasks/homebrew-formulae] 
> have drifted out of sync with what's currently in 
> [https://github.com/Homebrew/homebrew-core/tree/master/Formula] and 
> [https://github.com/autobrew/homebrew-core/blob/master/Formula|https://github.com/autobrew/homebrew-core/blob/master/Formula/].
>  Get them back in sync and consider automating some method of checking that 
> they are in sync, e.g. by failing the {{homebrew-cpp}} and 
>  {{homebrew-r-autobrew}} nightly tests if our copies don't match what's in 
> the Homebrew and autobrew repos (but only if there were changes there that 
> weren't made in our repo, and not the inverse).
> Update the instructions at 
>  
> [https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-UpdatingHomebrewpackages]
>  as needed.
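
The one-directional check described in the ticket (fail only when upstream has changes we have not taken, not when we have changed our own copy) could be sketched roughly as below. It assumes that a digest of the upstream formula is recorded whenever our copy is synced; that recorded digest, the function names, and the sample formula text are all hypothetical, and fetching the upstream file is left out.

```python
import hashlib


def digest(text):
    """Return the SHA-256 hex digest of a formula's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def upstream_changed(upstream_text, last_synced_digest):
    """True if upstream differs from the version we last synced against.

    Edits made only to our own copy never trigger this, because our copy
    is not part of the comparison.
    """
    return digest(upstream_text) != last_synced_digest


# Hypothetical formula contents, for illustration only.
synced_version = "class ApacheArrow < Formula\n  revision 1\nend\n"
recorded = digest(synced_version)  # stored alongside our copy at sync time

unchanged = upstream_changed(synced_version, recorded)
changed = upstream_changed(synced_version.replace("revision 1", "revision 2"), recorded)
print(unchanged, changed)
```

A nightly job could call something like {{upstream_changed}} for each formula and fail when it returns true.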



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ARROW-8226) [Go] Add binary builder that uses 64 bit offsets and make binary builders resettable

2022-07-29 Thread Matthew Topol (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Topol resolved ARROW-8226.
--
Fix Version/s: 10.0.0
   Resolution: Fixed

Issue resolved by pull request 13719
[https://github.com/apache/arrow/pull/13719]

> [Go] Add binary builder that uses 64 bit offsets and make binary builders 
> resettable
> 
>
> Key: ARROW-8226
> URL: https://issues.apache.org/jira/browse/ARROW-8226
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Go
>Reporter: Richard
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> I ran into some overflow issues with the existing 32 bit binary builder. My 
> changes add a new binary builder that uses 64-bit offsets + tests.
> I also added a panic for when the 32-bit offset binary builder overflows.
> Finally I made both binary builders resettable.
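
To illustrate the three changes described above (a wider offset type, an explicit failure on overflow, and resettable builders), here is a toy model in Python. It is not the Go API from the pull request; the class name, the {{max_offset}} parameter, and the use of {{OverflowError}} in place of a Go panic are assumptions made for the sketch.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


class BinaryBuilder:
    """Toy variable-length binary builder with a configurable offset limit."""

    def __init__(self, max_offset=INT32_MAX):
        self.max_offset = max_offset
        self.reset()

    def reset(self):
        """Drop all accumulated values so the builder can be reused."""
        self.offsets = [0]
        self.data = bytearray()

    def append(self, value):
        end = len(self.data) + len(value)
        if end > self.max_offset:
            # Fail loudly instead of letting the offset wrap around.
            raise OverflowError(f"offset {end} exceeds limit {self.max_offset}")
        self.data += value
        self.offsets.append(end)


# A tiny limit stands in for INT32_MAX so the overflow is easy to trigger.
small = BinaryBuilder(max_offset=8)
small.append(b"hello")
try:
    small.append(b"world")
    overflowed = False
except OverflowError:
    overflowed = True

offsets_before_reset = list(small.offsets)
small.reset()

# The same appends succeed once the limit is the 64-bit maximum.
large = BinaryBuilder(max_offset=INT64_MAX)
large.append(b"hello")
large.append(b"world")
```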



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17224) [R][Doc] minor error in Linux installation documentation ('conda' option) for R on CRAN

2022-07-29 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-17224:

Component/s: R

> [R][Doc] minor error in Linux installation documentation ('conda' option) for 
> R on CRAN
> ---
>
> Key: ARROW-17224
> URL: https://issues.apache.org/jira/browse/ARROW-17224
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation, R
>Affects Versions: 8.0.1
> Environment: Ubuntu 20.04
>Reporter: Wayne Smith
>Priority: Minor
> Fix For: 8.0.2
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The documentation for the Linux installation for the r-arrow binary for R is 
> at:
>     https://cran.r-project.org/web/packages/arrow/vignettes/install.html
> The documentation indicates that the 'conda' installation syntax should be:
> {code:java}
> conda install -c conda-forge --strict-channel-priority r-arrow{code}
> I can't get that to work.  What works for me is:
> {code:java}
> conda config --set channel_priority strict
> conda install -c conda-forge r-arrow{code}
> I'm wondering if the syntax presented in the documentation is either 
> deprecated or incorrect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17224) [R][Doc] minor error in Linux installation documentation ('conda' option) for R on CRAN

2022-07-29 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-17224:

Summary: [R][Doc] minor error in Linux installation documentation ('conda' 
option) for R on CRAN  (was: minor error in Linux installation documentation 
('conda' option) for R on CRAN)

> [R][Doc] minor error in Linux installation documentation ('conda' option) for 
> R on CRAN
> ---
>
> Key: ARROW-17224
> URL: https://issues.apache.org/jira/browse/ARROW-17224
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 8.0.1
> Environment: Ubuntu 20.04
>Reporter: Wayne Smith
>Priority: Minor
> Fix For: 8.0.2
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The documentation for the Linux installation for the r-arrow binary for R is 
> at:
>     https://cran.r-project.org/web/packages/arrow/vignettes/install.html
> The documentation indicates that the 'conda' installation syntax should be:
> {code:java}
> conda install -c conda-forge --strict-channel-priority r-arrow{code}
> I can't get that to work.  What works for me is:
> {code:java}
> conda config --set channel_priority strict
> conda install -c conda-forge r-arrow{code}
> I'm wondering if the syntax presented in the documentation is either 
> deprecated or incorrect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17252) [R] Intermittent valgrind failure

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17252:
---
Labels: pull-request-available  (was: )

> [R] Intermittent valgrind failure
> -
>
> Key: ARROW-17252
> URL: https://issues.apache.org/jira/browse/ARROW-17252
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Dewey Dunnington
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A number of recent nightly builds have intermittent failures with valgrind, 
> which fails because of possibly leaked memory around an exec plan. This seems 
> related to a change in XXX that separated {{ExecPlan_prepare()}} from 
> {{ExecPlan_run()}} and added a {{ExecPlan_read_table()}} that uses 
> {{RunWithCapturedR()}}. The reported leaks vary but include ExecPlans and 
> ExecNodes and fields of those objects.
> A failed run: 
> https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=30310&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181&l=24980
> Some example output:
> {noformat}
> ==5249== 14,112 (384 direct, 13,728 indirect) bytes in 1 blocks are 
> definitely lost in loss record 1,988 of 3,883
> ==5249==at 0x4849013: operator new(unsigned long) (in 
> /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5249==by 0x10B2902B: 
> std::_Function_handler 
> (arrow::compute::ExecPlan*, std::vector std::allocator >, arrow::compute::ExecNodeOptions 
> const&), 
> arrow::compute::internal::RegisterAggregateNode(arrow::compute::ExecFactoryRegistry*)::{lambda(arrow::compute::ExecPlan*,
>  std::vector std::allocator >, arrow::compute::ExecNodeOptions 
> const&)#1}>::_M_invoke(std::_Any_data const&, arrow::compute::ExecPlan*&&, 
> std::vector std::allocator >&&, 
> arrow::compute::ExecNodeOptions const&) (exec_plan.h:60)
> ==5249==by 0xFA83A0C: 
> std::function 
> (arrow::compute::ExecPlan*, std::vector std::allocator >, arrow::compute::ExecNodeOptions 
> const&)>::operator()(arrow::compute::ExecPlan*, 
> std::vector std::allocator >, arrow::compute::ExecNodeOptions 
> const&) const (std_function.h:622)
> ==5249== 14,528 (160 direct, 14,368 indirect) bytes in 1 blocks are 
> definitely lost in loss record 1,989 of 3,883
> ==5249==at 0x4849013: operator new(unsigned long) (in 
> /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5249==by 0x10096CB7: arrow::FutureImpl::Make() (future.cc:187)
> ==5249==by 0xFCB6F9A: arrow::Future::Make() 
> (future.h:420)
> ==5249==by 0x101AE927: ExecPlanImpl (exec_plan.cc:50)
> ==5249==by 0x101AE927: 
> arrow::compute::ExecPlan::Make(arrow::compute::ExecContext*, 
> std::shared_ptr) (exec_plan.cc:355)
> ==5249==by 0xFA77BA2: ExecPlan_create(bool) (compute-exec.cpp:45)
> ==5249==by 0xF9FAE9F: _arrow_ExecPlan_create (arrowExports.cpp:868)
> ==5249==by 0x4953B60: R_doDotCall (dotcode.c:601)
> ==5249==by 0x49C2C16: bcEval (eval.c:7682)
> ==5249==by 0x499DB95: Rf_eval (eval.c:748)
> ==5249==by 0x49A0904: R_execClosure (eval.c:1918)
> ==5249==by 0x49A05B7: Rf_applyClosure (eval.c:1844)
> ==5249==by 0x49B2122: bcEval (eval.c:7094)
> ==5249== 
> ==5249== 36,322 (416 direct, 35,906 indirect) bytes in 1 blocks are 
> definitely lost in loss record 2,929 of 3,883
> ==5249==at 0x4849013: operator new(unsigned long) (in 
> /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5249==by 0x10214F92: arrow::compute::TaskScheduler::Make() 
> (task_util.cc:421)
> ==5249==by 0x101AEA6C: ExecPlanImpl (exec_plan.cc:50)
> ==5249==by 0x101AEA6C: 
> arrow::compute::ExecPlan::Make(arrow::compute::ExecContext*, 
> std::shared_ptr) (exec_plan.cc:355)
> ==5249==by 0xFA77BA2: ExecPlan_create(bool) (compute-exec.cpp:45)
> ==5249==by 0xF9FAE9F: _arrow_ExecPlan_create (arrowExports.cpp:868)
> ==5249==by 0x4953B60: R_doDotCall (dotcode.c:601)
> ==5249==by 0x49C2C16: bcEval (eval.c:7682)
> ==5249==by 0x499DB95: Rf_eval (eval.c:748)
> ==5249==by 0x49A0904: R_execClosure (eval.c:1918)
> ==5249==by 0x49A05B7: Rf_applyClosure (eval.c:1844)
> ==5249==by 0x49B2122: bcEval (eval.c:7094)
> ==5249==by 0x499DB95: Rf_eval (eval.c:748)
> {noformat}
> We also occasionally get leaked Schemas, and in one case a leaked InputType 
> that seemed completely unrelated to the other leaks (ARROW-17225).
> I'm wondering if these have to do with references in lambdas that get passed 
> by reference? Or perhaps a cache issue? There were some instances in previous 
> leaks where the backtrace to the {{new}} allocator was different between 
> reported leaks.





[jira] [Commented] (ARROW-17252) [R] Intermittent valgrind failure

2022-07-29 Thread Dewey Dunnington (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572957#comment-17572957
 ] 

Dewey Dunnington commented on ARROW-17252:
--

Another run that had some other failures, including the {{InputType}} one:

https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=30290&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181&l=25107


{noformat}
==5248== 56 bytes in 1 blocks are possibly lost in loss record 171 of 3,993
==5248==at 0x4849013: operator new(unsigned long) (in 
/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==5248==by 0x10547EE7: allocate (new_allocator.h:121)
==5248==by 0x10547EE7: allocate (alloc_traits.h:460)
==5248==at 0x4849013: operator new(unsigned long) (in 
/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==5248==by 0x101AFFBA: allocate (new_allocator.h:121)
==5248==by 0x101AFFBA: allocate (alloc_traits.h:460)
==5248==by 0x101AFFBA: _M_allocate (stl_vector.h:346)
==5248==by 0x101AFFBA: void std::vector 
>::_M_realloc_insert(__gnu_cxx::__normal_iterator > >, arrow::compute::ExecNode*&&) 
(vector.tcc:440)
==5248==by 0x101AABBA: emplace_back 
(vector.tcc:121)
==5248==by 0x101AABBA: push_back (stl_vector.h:1204)
==5248==by 0x101AABBA: 
arrow::compute::ExecNode::ExecNode(arrow::compute::ExecPlan*, 
std::vector >, 
std::vector, 
std::allocator >, std::allocator, std::allocator > > >, 
std::shared_ptr, int) (exec_plan.cc:414)
==5248==by 0x101AAD22: 
arrow::compute::MapNode::MapNode(arrow::compute::ExecPlan*, 
std::vector >, std::shared_ptr, 
bool) (exec_plan.cc:476)
==5248==by 0x101EC290: ProjectNode (project_node.cc:46)
==5248==by 0x101EC290: EmplaceNode >, std::shared_ptr, 
std::vector >, bool const&> (exec_plan.h:60)
==5248==by 0x101EC290: arrow::compute::(anonymous 
namespace)::ProjectNode::Make(arrow::compute::ExecPlan*, 
std::vector >, arrow::compute::ExecNodeOptions 
const&) (project_node.cc:73)
==5248==by 0xFC20D83: 
std::_Function_handler 
(arrow::compute::ExecPlan*, std::vector >, arrow::compute::ExecNodeOptions 
const&), arrow::Result 
(*)(arrow::compute::ExecPlan*, std::vector >, arrow::compute::ExecNodeOptions 
const&)>::_M_invoke(std::_Any_data const&, arrow::compute::ExecPlan*&&, 
std::vector >&&, arrow::compute::ExecNodeOptions 
const&) (invoke.h:60)
==5248==by 0xFA838DC: 
std::function 
(arrow::compute::ExecPlan*, std::vector >, arrow::compute::ExecNodeOptions 
const&)>::operator()(arrow::compute::ExecPlan*, 
std::vector >, arrow::compute::ExecNodeOptions 
const&) const (std_function.h:622)
==5248==by 0xFA81047: 
arrow::compute::MakeExecNode(std::__cxx11::basic_string, std::allocator > const&, 
arrow::compute::ExecPlan*, std::vector >, arrow::compute::ExecNodeOptions 
const&, arrow::compute::ExecFactoryRegistry*) (exec_plan.h:438)
==5248==by 0xFA77BE8: MakeExecNodeOrStop(std::__cxx11::basic_string, std::allocator > const&, 
arrow::compute::ExecPlan*, std::vector >, arrow::compute::ExecNodeOptions 
const&) (compute-exec.cpp:53)
==5248==by 0xFA7ADF2: 
ExecNode_Project(std::shared_ptr const&, 
std::vector, 
std::allocator > > const&, 
std::vector, 
std::allocator >, std::allocator, std::allocator > > >) (compute-exec.cpp:307)
==5248==by 0xF9FC997: _arrow_ExecNode_Project (arrowExports.cpp:986)
==5248==by 0x4953BC4: R_doDotCall (dotcode.c:607)
{noformat}


> [R] Intermittent valgrind failure
> -
>
> Key: ARROW-17252
> URL: https://issues.apache.org/jira/browse/ARROW-17252
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Dewey Dunnington
>Priority: Major
>
> A number of recent nightly builds have intermittent failures with valgrind, 
> which fails because of possibly leaked memory around an exec plan. This seems 
> related to a change in XXX that separated {{ExecPlan_prepare()}} from 
> {{ExecPlan_run()}} and added an {{ExecPlan_read_table()}} that uses 
> {{RunWithCapturedR()}}. The reported leaks vary but include ExecPlans and 
> ExecNodes and fields of those objects.
> A failed run: 
> https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=30310&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181&l=24980
> Some example output:
> {noformat}
> ==5249== 14,112 (384 direct, 13,728 indirect) bytes in 1 blocks are 
> definitely lost in loss record 1,988 of 3,883
> ==5249==at 0x4849013: operator new(unsigned long) (in 
> /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5249==by 0x10B2902B: 
> std::_Function_handler 
> (arrow::compute::ExecPlan*, std::vector std::allocator >, arrow::compute::ExecNodeOptions 
> const&), 
> arrow::compute::internal::RegisterAggregateNode(arrow::compute::ExecFactoryReg

[jira] [Created] (ARROW-17252) [R] Intermittent valgrind failure

2022-07-29 Thread Dewey Dunnington (Jira)
Dewey Dunnington created ARROW-17252:


 Summary: [R] Intermittent valgrind failure
 Key: ARROW-17252
 URL: https://issues.apache.org/jira/browse/ARROW-17252
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Dewey Dunnington


A number of recent nightly builds have intermittent failures with valgrind, 
which fails because of possibly leaked memory around an exec plan. This seems 
related to a change in XXX that separated {{ExecPlan_prepare()}} from 
{{ExecPlan_run()}} and added an {{ExecPlan_read_table()}} that uses 
{{RunWithCapturedR()}}. The reported leaks vary but include ExecPlans and 
ExecNodes and fields of those objects.

A failed run: 
https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=30310&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181&l=24980

Some example output:

{noformat}
==5249== 14,112 (384 direct, 13,728 indirect) bytes in 1 blocks are definitely 
lost in loss record 1,988 of 3,883
==5249==at 0x4849013: operator new(unsigned long) (in 
/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==5249==by 0x10B2902B: 
std::_Function_handler 
(arrow::compute::ExecPlan*, std::vector >, arrow::compute::ExecNodeOptions 
const&), 
arrow::compute::internal::RegisterAggregateNode(arrow::compute::ExecFactoryRegistry*)::{lambda(arrow::compute::ExecPlan*,
 std::vector >, arrow::compute::ExecNodeOptions 
const&)#1}>::_M_invoke(std::_Any_data const&, arrow::compute::ExecPlan*&&, 
std::vector >&&, arrow::compute::ExecNodeOptions 
const&) (exec_plan.h:60)
==5249==by 0xFA83A0C: 
std::function 
(arrow::compute::ExecPlan*, std::vector >, arrow::compute::ExecNodeOptions 
const&)>::operator()(arrow::compute::ExecPlan*, 
std::vector >, arrow::compute::ExecNodeOptions 
const&) const (std_function.h:622)
==5249== 14,528 (160 direct, 14,368 indirect) bytes in 1 blocks are definitely 
lost in loss record 1,989 of 3,883
==5249==at 0x4849013: operator new(unsigned long) (in 
/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==5249==by 0x10096CB7: arrow::FutureImpl::Make() (future.cc:187)
==5249==by 0xFCB6F9A: arrow::Future::Make() 
(future.h:420)
==5249==by 0x101AE927: ExecPlanImpl (exec_plan.cc:50)
==5249==by 0x101AE927: 
arrow::compute::ExecPlan::Make(arrow::compute::ExecContext*, 
std::shared_ptr) (exec_plan.cc:355)
==5249==by 0xFA77BA2: ExecPlan_create(bool) (compute-exec.cpp:45)
==5249==by 0xF9FAE9F: _arrow_ExecPlan_create (arrowExports.cpp:868)
==5249==by 0x4953B60: R_doDotCall (dotcode.c:601)
==5249==by 0x49C2C16: bcEval (eval.c:7682)
==5249==by 0x499DB95: Rf_eval (eval.c:748)
==5249==by 0x49A0904: R_execClosure (eval.c:1918)
==5249==by 0x49A05B7: Rf_applyClosure (eval.c:1844)
==5249==by 0x49B2122: bcEval (eval.c:7094)
==5249== 
==5249== 36,322 (416 direct, 35,906 indirect) bytes in 1 blocks are definitely 
lost in loss record 2,929 of 3,883
==5249==at 0x4849013: operator new(unsigned long) (in 
/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==5249==by 0x10214F92: arrow::compute::TaskScheduler::Make() 
(task_util.cc:421)
==5249==by 0x101AEA6C: ExecPlanImpl (exec_plan.cc:50)
==5249==by 0x101AEA6C: 
arrow::compute::ExecPlan::Make(arrow::compute::ExecContext*, 
std::shared_ptr) (exec_plan.cc:355)
==5249==by 0xFA77BA2: ExecPlan_create(bool) (compute-exec.cpp:45)
==5249==by 0xF9FAE9F: _arrow_ExecPlan_create (arrowExports.cpp:868)
==5249==by 0x4953B60: R_doDotCall (dotcode.c:601)
==5249==by 0x49C2C16: bcEval (eval.c:7682)
==5249==by 0x499DB95: Rf_eval (eval.c:748)
==5249==by 0x49A0904: R_execClosure (eval.c:1918)
==5249==by 0x49A05B7: Rf_applyClosure (eval.c:1844)
==5249==by 0x49B2122: bcEval (eval.c:7094)
==5249==by 0x499DB95: Rf_eval (eval.c:748)
{noformat}

We also occasionally get leaked Schemas, and in one case a leaked InputType 
that seemed completely unrelated to the other leaks (ARROW-17225).

I'm wondering if these have to do with references in lambdas that get passed by 
reference? Or perhaps a cache issue? There were some instances in previous 
leaks where the backtrace to the {{new}} allocator was different between 
reported leaks.





[jira] [Comment Edited] (ARROW-12590) [C++][R] Update copies of Homebrew files to reflect recent updates

2022-07-29 Thread Jacob Wujciak-Jens (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572937#comment-17572937
 ] 

Jacob Wujciak-Jens edited comment on ARROW-12590 at 7/29/22 11:56 AM:
--

[~jonkeane] Just so I understand correctly: as far as I can see, we have added 
dependencies to our version of the formula that are missing from the upstream 
version. On the other hand, the upstream version has updated the bottle 
tag/sha, which is likely why we are having issues with that now. So these 
changes should clearly be synced in both directions.

There are some other changes that should obviously be excluded (download 
url/version + sha) but also some where I am unsure if we should also sync them 
down:
- a patch step to change the mimalloc version in versions.txt
- an addition to the test step (running the cpp tests?)


was (Author: JIRAUSER287549):
[~jonkeane] Just so I understand correctly: As far as I see we have added 
dependencies to our version of the formula that are missing from the upstream 
version on the other hand the upstream version has update the bottle tag/sha 
which is likely we are having issues with that now. So these changes should 
clearly be synced in both directions.

There are some other changes that should obviously be excluded (download 
url/version + sha) but also some where I am unsure if we should also sync them 
down:
- a patch step to change the mimalloc version in versions.txt
- an addition to the test step (running the cpp tests?)

> [C++][R] Update copies of Homebrew files to reflect recent updates
> --
>
> Key: ARROW-12590
> URL: https://issues.apache.org/jira/browse/ARROW-12590
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, R
>Reporter: Ian Cook
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Our copies of the Homebrew formulae at 
> [https://github.com/apache/arrow/tree/master/dev/tasks/homebrew-formulae] 
> have drifted out of sync with what's currently in 
> [https://github.com/Homebrew/homebrew-core/tree/master/Formula] and 
> [https://github.com/autobrew/homebrew-core/blob/master/Formula|https://github.com/autobrew/homebrew-core/blob/master/Formula/].
>  Get them back in sync and consider automating some method of checking that 
> they are in sync, e.g. by failing the {{homebrew-cpp}} and 
>  {{homebrew-r-autobrew}} nightly tests if our copies don't match what's in 
> the Homebrew and autobrew repos (but only if there were changes there that 
> weren't made in our repo, and not the inverse).
> Update the instructions at 
>  
> [https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-UpdatingHomebrewpackages]
>  as needed.
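
The sync check described in the issue could be sketched roughly as below. 
This is an illustration under stated assumptions, not the project's actual 
nightly test: {{formula_drift}} and {{EXPECTED_DIFF}} are hypothetical names, 
and the pattern assumes that the lines expected to differ (download url, 
version, source sha) are written as a keyword followed directly by a quoted 
string, while bottle sha lines carry a tag before the quoted hash and 
therefore remain part of the comparison.

```python
import difflib
import re

# Assumption: lines expected to differ between our copy and upstream look like
#   url "..."   version "..."   sha256 "..."
# Bottle lines such as `sha256 arm64_monterey: "..."` do not match and are kept.
EXPECTED_DIFF = re.compile(r'^\s*(url|version|sha256)\s+"')


def formula_drift(ours, upstream, ignore=EXPECTED_DIFF):
    """Return unified-diff lines between two formula texts.

    Lines matching `ignore` are dropped from both sides before diffing, so an
    empty result means the two copies agree on everything else.
    """
    def keep(text):
        return [line for line in text.splitlines() if not ignore.match(line)]

    return list(
        difflib.unified_diff(
            keep(ours), keep(upstream),
            fromfile="ours", tofile="upstream", lineterm="",
        )
    )
```

A plain text diff cannot tell which side changed, so the "only if there were 
changes there that weren't made in our repo" condition would still need 
history from both repositories.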





[jira] [Commented] (ARROW-12590) [C++][R] Update copies of Homebrew files to reflect recent updates

2022-07-29 Thread Jacob Wujciak-Jens (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572937#comment-17572937
 ] 

Jacob Wujciak-Jens commented on ARROW-12590:


[~jonkeane] Just so I understand correctly: as far as I can see, we have added 
dependencies to our version of the formula that are missing from the upstream 
version. On the other hand, the upstream version has updated the bottle 
tag/sha, which is likely why we are having issues with that now. So these 
changes should clearly be synced in both directions.

There are some other changes that should obviously be excluded (download 
url/version + sha) but also some where I am unsure if we should also sync them 
down:
- a patch step to change the mimalloc version in versions.txt
- an addition to the test step (running the cpp tests?)

> [C++][R] Update copies of Homebrew files to reflect recent updates
> --
>
> Key: ARROW-12590
> URL: https://issues.apache.org/jira/browse/ARROW-12590
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, R
>Reporter: Ian Cook
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Our copies of the Homebrew formulae at 
> [https://github.com/apache/arrow/tree/master/dev/tasks/homebrew-formulae] 
> have drifted out of sync with what's currently in 
> [https://github.com/Homebrew/homebrew-core/tree/master/Formula] and 
> [https://github.com/autobrew/homebrew-core/blob/master/Formula|https://github.com/autobrew/homebrew-core/blob/master/Formula/].
>  Get them back in sync and consider automating some method of checking that 
> they are in sync, e.g. by failing the {{homebrew-cpp}} and 
>  {{homebrew-r-autobrew}} nightly tests if our copies don't match what's in 
> the Homebrew and autobrew repos (but only if there were changes there that 
> weren't made in our repo, and not the inverse).
> Update the instructions at 
>  
> [https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-UpdatingHomebrewpackages]
>  as needed.





[jira] [Resolved] (ARROW-14847) [R] Implement bindings for lubridate date/time parsing functions

2022-07-29 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc resolved ARROW-14847.

Resolution: Resolved

> [R] Implement bindings for lubridate date/time parsing functions
> 
>
> Key: ARROW-14847
> URL: https://issues.apache.org/jira/browse/ARROW-14847
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Nicola Crane
>Priority: Major
>






[jira] [Created] (ARROW-17251) [CI][Conan] Enable Flight

2022-07-29 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-17251:


 Summary: [CI][Conan] Enable Flight
 Key: ARROW-17251
 URL: https://issues.apache.org/jira/browse/ARROW-17251
 Project: Apache Arrow
  Issue Type: Test
  Components: Continuous Integration, Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Updated] (ARROW-17250) [CI][Conan] Enable utf8proc automatically

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17250:
---
Labels: pull-request-available  (was: )

> [CI][Conan] Enable utf8proc automatically
> -
>
> Key: ARROW-17250
> URL: https://issues.apache.org/jira/browse/ARROW-17250
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Continuous Integration, Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-17250) [CI][Conan] Enable utf8proc automatically

2022-07-29 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-17250:


 Summary: [CI][Conan] Enable utf8proc automatically
 Key: ARROW-17250
 URL: https://issues.apache.org/jira/browse/ARROW-17250
 Project: Apache Arrow
  Issue Type: Test
  Components: Continuous Integration, Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Updated] (ARROW-17249) [CI][Conan] Enable bzip2

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17249:
---
Labels: pull-request-available  (was: )

> [CI][Conan] Enable bzip2
> 
>
> Key: ARROW-17249
> URL: https://issues.apache.org/jira/browse/ARROW-17249
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Continuous Integration, Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-17249) [CI][Conan] Enable bzip2

2022-07-29 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-17249:


 Summary: [CI][Conan] Enable bzip2
 Key: ARROW-17249
 URL: https://issues.apache.org/jira/browse/ARROW-17249
 Project: Apache Arrow
  Issue Type: Test
  Components: Continuous Integration, Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Updated] (ARROW-17248) [CI][Conan] Enable Zstandard

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17248:
---
Labels: pull-request-available  (was: )

> [CI][Conan] Enable Zstandard
> 
>
> Key: ARROW-17248
> URL: https://issues.apache.org/jira/browse/ARROW-17248
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Continuous Integration, Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-17248) [CI][Conan] Enable Zstandard

2022-07-29 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-17248:


 Summary: [CI][Conan] Enable Zstandard
 Key: ARROW-17248
 URL: https://issues.apache.org/jira/browse/ARROW-17248
 Project: Apache Arrow
  Issue Type: Test
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Updated] (ARROW-17248) [CI][Conan] Enable Zstandard

2022-07-29 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-17248:
-
Component/s: Continuous Integration
 Packaging

> [CI][Conan] Enable Zstandard
> 
>
> Key: ARROW-17248
> URL: https://issues.apache.org/jira/browse/ARROW-17248
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Continuous Integration, Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>






[jira] [Closed] (ARROW-16027) [C++][CI] The job labeled "AMD64 MacOS 10.15 C++" runs on MacOS 11.6.5

2022-07-29 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou closed ARROW-16027.

Resolution: Duplicate

> [C++][CI] The job labeled "AMD64 MacOS 10.15 C++" runs on MacOS 11.6.5
> --
>
> Key: ARROW-16027
> URL: https://issues.apache.org/jira/browse/ARROW-16027
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Weston Pace
>Priority: Major
>
> The workflow is configured with {{runs-on: macos-latest}}, which no longer 
> resolves to macOS 10.15.


