[jira] [Closed] (ARROW-17036) [C++][Gandiva] Add sign Function

2022-07-18 Thread Sahaj Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahaj Gupta closed ARROW-17036.
---
Resolution: Fixed

> [C++][Gandiva] Add sign Function
> 
>
> Key: ARROW-17036
> URL: https://issues.apache.org/jira/browse/ARROW-17036
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, C++ - Gandiva
>Reporter: Sahaj Gupta
>Assignee: Sahaj Gupta
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Implementing Sign Function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (ARROW-17035) [C++][Gandiva] Add Ceil Function

2022-07-18 Thread Sahaj Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahaj Gupta closed ARROW-17035.
---
Resolution: Fixed

> [C++][Gandiva] Add Ceil Function
> 
>
> Key: ARROW-17035
> URL: https://issues.apache.org/jira/browse/ARROW-17035
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, C++ - Gandiva
>Reporter: Sahaj Gupta
>Assignee: Sahaj Gupta
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Implementing Ceil Function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ARROW-17118) [Docs][Release] Use direct link for adding a new release to Apache report database

2022-07-18 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-17118.
--
Fix Version/s: 9.0.0
   Resolution: Fixed

Issue resolved by pull request 13645
[https://github.com/apache/arrow/pull/13645]

> [Docs][Release] Use direct link for adding a new release to Apache report 
> database
> --
>
> Key: ARROW-17118
> URL: https://issues.apache.org/jira/browse/ARROW-17118
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17120) copy_files() does not take paths to specific files

2022-07-18 Thread Carl Boettiger (Jira)
Carl Boettiger created ARROW-17120:
--

 Summary: copy_files() does not take paths to specific files
 Key: ARROW-17120
 URL: https://issues.apache.org/jira/browse/ARROW-17120
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Carl Boettiger


`copy_files()` is a pretty handy function for working between local and remote 
interfaces, particularly for any file type arrow doesn't handle (Avro, NetCDF, 
HDF5, etc.).

Unfortunately, it seems to work only from directory to directory, at least in 
the direction of copying S3 -> local file system, e.g. this reprex:


{code:java}
library(arrow)
local_dir <- tempfile()
fs::dir_delete(local_dir)
fs::dir_create(local_dir) # create dir if it doesn't exist
l3 <- SubTreeFileSystem$create(local_dir)
l3$ls() #empty

s3 <- s3_bucket("neon4cast-targets/aquatics", endpoint_override = 
"data.ecoforecast.org", anonymous=TRUE)
s3$ls() #not empty, good

copy_files(s3$path("aquatics-targets.csv.gz"), 
l3$path("aquatics-targets.csv.gz"))
l3$ls() # darn, nothing!

copy_files(s3$path("aquatics-targets.csv.gz"), l3)
l3$ls() # darn, nothing!

copy_files(s3, l3)
l3$ls()  # Finally! only this works
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (ARROW-17047) [Python][Docs] Document how to get field from StructType

2022-07-18 Thread Alenka Frim (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alenka Frim closed ARROW-17047.
---
Fix Version/s: 9.0.0
   Resolution: Resolved

Issue resolved by pull request 13642

https://github.com/apache/arrow/pull/13642

> [Python][Docs] Document how to get field from StructType
> 
>
> Key: ARROW-17047
> URL: https://issues.apache.org/jira/browse/ARROW-17047
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 8.0.0
>Reporter: Will Jones
>Assignee: Anja Boskovic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> It's not at all obvious how to get a particular field from a StructType from 
> its API page:
> https://arrow.apache.org/docs/python/generated/pyarrow.StructType.html#pyarrow.StructType
> We should add an example:
> {code:python}
> struct_type = pa.struct({"x": pa.int32(), "y": pa.string()})
> struct_type[0]
> # pyarrow.Field
> pa.schema(list(struct_type))
> # x: int32
> # y: string
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17047) [Python][Docs] Document how to get field from StructType

2022-07-18 Thread Alenka Frim (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alenka Frim updated ARROW-17047:

Parent: ARROW-17048
Issue Type: Sub-task  (was: Improvement)

> [Python][Docs] Document how to get field from StructType
> 
>
> Key: ARROW-17047
> URL: https://issues.apache.org/jira/browse/ARROW-17047
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 8.0.0
>Reporter: Will Jones
>Assignee: Anja Boskovic
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It's not at all obvious how to get a particular field from a StructType from 
> its API page:
> https://arrow.apache.org/docs/python/generated/pyarrow.StructType.html#pyarrow.StructType
> We should add an example:
> {code:python}
> struct_type = pa.struct({"x": pa.int32(), "y": pa.string()})
> struct_type[0]
> # pyarrow.Field
> pa.schema(list(struct_type))
> # x: int32
> # y: string
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17111) [CI][Packaging] Packaging almalinux 9 and centos 9 fail installing arrow due to missing libre2

2022-07-18 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568296#comment-17568296
 ] 

Kouhei Sutou commented on ARROW-17111:
--

It's caused by an AlmaLinux 9/CentOS 9 Stream update. I've removed the cached 
Docker images on DockerHub and restarted the jobs.

> [CI][Packaging] Packaging almalinux 9 and centos 9 fail installing arrow due 
> to missing libre2
> --
>
> Key: ARROW-17111
> URL: https://issues.apache.org/jira/browse/ARROW-17111
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Raúl Cumplido
>Assignee: Kouhei Sutou
>Priority: Critical
>  Labels: Nightly
> Fix For: 9.0.0
>
>
> The following nightly packaging jobs have been failing:
> [almalinux-9-amd64|https://github.com/ursacomputing/crossbow/runs/7385779728?check_suite_focus=true]
> [almalinux-9-arm64|https://app.travis-ci.com/github/ursacomputing/crossbow/builds/25327#L5812]
> [centos-9-stream-amd64|https://github.com/ursacomputing/crossbow/runs/7385764133?check_suite_focus=true]
> [centos-9-stream-arm64|https://app.travis-ci.com/github/ursacomputing/crossbow/builds/253299972#L6029]
> It errors when installing with dnf. It seems to be due to dnf not finding libre2:
> {code:java}
>  + dnf install -y --enablerepo=crb --enablerepo=epel 
> arrow-devel-9.0.0.dev405-1.el9
> Apache Arrow for AlmaLinux 9 - aarch64          2.6 MB/s |  25 kB     00:00   
>  
> Extra Packages for Enterprise Linux 9 - aarch64  15 MB/s | 8.3 MB     00:00   
>  
> Error: 
>  Problem: package arrow-devel-9.0.0.dev405-1.el9.aarch64 requires 
> libarrow.so.900()(64bit), but none of the providers can be installed
>   - package arrow-devel-9.0.0.dev405-1.el9.aarch64 requires arrow9-libs = 
> 9.0.0.dev405-1.el9, but none of the providers can be installed
>   - conflicting requests
>   - nothing provides libre2.so.0a()(64bit) needed by 
> arrow9-libs-9.0.0.dev405-1.el9.aarch64
> (try to add '--skip-broken' to skip uninstallable packages or '--nobest' to 
> use not only best candidate packages)
> rake aborted!
> Command failed with status (1): [docker run --log-driver none --rm 
> --securi...]{code}
> We have recently upgraded some vendored dependencies, such as RE2: 
> [https://github.com/apache/arrow/pull/13570]
> and rapidjson: [https://github.com/apache/arrow/pull/13608]
> The jobs started failing after the rapidjson upgrade was merged, in case it is 
> related.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-17111) [CI][Packaging] Packaging almalinux 9 and centos 9 fail installing arrow due to missing libre2

2022-07-18 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-17111:


Assignee: Kouhei Sutou

> [CI][Packaging] Packaging almalinux 9 and centos 9 fail installing arrow due 
> to missing libre2
> --
>
> Key: ARROW-17111
> URL: https://issues.apache.org/jira/browse/ARROW-17111
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Raúl Cumplido
>Assignee: Kouhei Sutou
>Priority: Critical
>  Labels: Nightly
> Fix For: 9.0.0
>
>
> The following nightly packaging jobs have been failing:
> [almalinux-9-amd64|https://github.com/ursacomputing/crossbow/runs/7385779728?check_suite_focus=true]
> [almalinux-9-arm64|https://app.travis-ci.com/github/ursacomputing/crossbow/builds/25327#L5812]
> [centos-9-stream-amd64|https://github.com/ursacomputing/crossbow/runs/7385764133?check_suite_focus=true]
> [centos-9-stream-arm64|https://app.travis-ci.com/github/ursacomputing/crossbow/builds/253299972#L6029]
> It errors when installing with dnf. It seems to be due to dnf not finding libre2:
> {code:java}
>  + dnf install -y --enablerepo=crb --enablerepo=epel 
> arrow-devel-9.0.0.dev405-1.el9
> Apache Arrow for AlmaLinux 9 - aarch64          2.6 MB/s |  25 kB     00:00   
>  
> Extra Packages for Enterprise Linux 9 - aarch64  15 MB/s | 8.3 MB     00:00   
>  
> Error: 
>  Problem: package arrow-devel-9.0.0.dev405-1.el9.aarch64 requires 
> libarrow.so.900()(64bit), but none of the providers can be installed
>   - package arrow-devel-9.0.0.dev405-1.el9.aarch64 requires arrow9-libs = 
> 9.0.0.dev405-1.el9, but none of the providers can be installed
>   - conflicting requests
>   - nothing provides libre2.so.0a()(64bit) needed by 
> arrow9-libs-9.0.0.dev405-1.el9.aarch64
> (try to add '--skip-broken' to skip uninstallable packages or '--nobest' to 
> use not only best candidate packages)
> rake aborted!
> Command failed with status (1): [docker run --log-driver none --rm 
> --securi...]{code}
> We have recently upgraded some vendored dependencies, such as RE2: 
> [https://github.com/apache/arrow/pull/13570]
> and rapidjson: [https://github.com/apache/arrow/pull/13608]
> The jobs started failing after the rapidjson upgrade was merged, in case it is 
> related.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (ARROW-17119) [C++] Invalid free when run gluten project google test

2022-07-18 Thread Jin Chengcheng (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568289#comment-17568289
 ] 

Jin Chengcheng edited comment on ARROW-17119 at 7/19/22 3:05 AM:
-

I extracted the relevant code
{code:java}
#define TYPE_FACTORY(NAME, KLASS)                                         \
  const std::shared_ptr<DataType>& NAME() {                               \
    static std::shared_ptr<DataType> result = std::make_shared<KLASS>();  \
    return result;                                                        \
  }

TYPE_FACTORY(null, NullType)
TYPE_FACTORY(boolean, BooleanType)
TYPE_FACTORY(int8, Int8Type)
TYPE_FACTORY(uint8, UInt8Type)
TYPE_FACTORY(int16, Int16Type)
TYPE_FACTORY(uint16, UInt16Type)
TYPE_FACTORY(int32, Int32Type)

ARROW_EXPORT const std::shared_ptr<DataType>& int16();

const std::set<std::shared_ptr<DataType>> AsofJoinNode::kSupportedOnTypes_ =
    {int64()};

 private:
  static const std::set<std::shared_ptr<DataType>> kSupportedOnTypes_;

void InitStaticData() {
  // Signed int types
  g_signed_int_types = {int8(), int16(), int32(), int64()};
{code}


was (Author: JIRAUSER292899):
I extracted the relevant code
{code:java}
#define TYPE_FACTORY(NAME, KLASS)                                         \
  const std::shared_ptr<DataType>& NAME() {                               \
    static std::shared_ptr<DataType> result = std::make_shared<KLASS>();  \
    return result;                                                        \
  }

TYPE_FACTORY(null, NullType)
TYPE_FACTORY(boolean, BooleanType)
TYPE_FACTORY(int8, Int8Type)
TYPE_FACTORY(uint8, UInt8Type)
TYPE_FACTORY(int16, Int16Type)
TYPE_FACTORY(uint16, UInt16Type)
TYPE_FACTORY(int32, Int32Type)

ARROW_EXPORT const std::shared_ptr<DataType>& int16();

const std::set<std::shared_ptr<DataType>> AsofJoinNode::kSupportedOnTypes_ =
    {int64()};

 private:
  static const std::set<std::shared_ptr<DataType>> kSupportedOnTypes_;

void InitStaticData() {
  // Signed int types
  g_signed_int_types = {int8(), int16(), int32(), int64()};
{code}

> [C++] Invalid free when run gluten project google test
> --
>
> Key: ARROW-17119
> URL: https://issues.apache.org/jira/browse/ARROW-17119
> Project: Apache Arrow
>  Issue Type: Bug
> Environment: ubuntu20
>Reporter: Jin Chengcheng
>Assignee: Li Jin
>Priority: Major
>
> When I run the [gluten|https://github.com/oap-project/gluten] project's Google 
> Test suite, it shows an error message after all the simple tests have passed.
> {code:java}
> gluten/cpp/build/src# ./exec_backend_test
> Running main() from 
> /build/googletest-j5yxiC/googletest-1.10.0/googletest/src/gtest_main.cc
> [==] Running 2 tests from 1 test suite.
> [--] Global test environment set-up.
> [--] 2 tests from TestExecBackend
> [ RUN      ] TestExecBackend.CreateBackend
> Set backend factory.
> [       OK ] TestExecBackend.CreateBackend (0 ms)
> [ RUN      ] TestExecBackend.GetResultIterator
> [       OK ] TestExecBackend.GetResultIterator (0 ms)
> [--] 2 tests from TestExecBackend (0 ms total)[--] Global 
> test environment tear-down
> [==] 2 tests from 1 test suite ran. (0 ms total)
> [  PASSED  ] 2 tests.
> corrupted size vs. prev_size in fastbins
> Aborted (core dumped)
>  {code}
> I used valgrind to investigate; here are the details:
> {code:java}
> // code placeholder
> ==32256== Invalid read of size 8
> ==32256==    at 0x5E493B7: std::set, 
> std::less >, 
> std::allocator > >::~set() (in 
> /mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
> ==32256==    by 0x77E0FDD: __cxa_finalize (cxa_finalize.c:83)
> ==32256==    by 0x5955816: ??? (in 
> /mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
> ==32256==    by 0x4011F6A: _dl_fini (dl-fini.c:138)
> ==32256==    by 0x77E08A6: __run_exit_handlers (exit.c:108)
> ==32256==    by 0x77E0A5F: exit (exit.c:139)
> ==32256==    by 0x77BE089: (below main) (libc-start.c:342)
> ==32256==  Address 0xd984680 is 16 bytes inside a block of size 48 free'd
> ==32256==    at 0x483CFBF: operator delete(void*) (in 
> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==32256==    by 0x5E493CD: std::set, 
> std::less >, 
> std::allocator > >::~set() (in 
> /mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
> ==32256==    by 0x77E0FDD: __cxa_finalize (cxa_finalize.c:83)
> ==32256==    by 0x7FF65B6: ??? (in 
> /mnt/jcc/code/gluten/cpp/build/releases/libarrow_dataset_jni.so.900.0.0)
> ==32256==    by 0x4011F6A: _dl_fini (dl-fini.c:138)
> ==32256==    by 0x77E08A6: __run_exit_handlers (exit.c:108)
> ==32256==    by 0x77E0A5F: exit (exit.c:139)
> ==32256==    by 0x77BE089: (below main) (libc-start.c:342)
> ==32256==  Block was alloc'd at
> ==32256==    at 0x483BE63: operator new(unsigned long) (in 
> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-li

[jira] [Comment Edited] (ARROW-17119) [C++] Invalid free when run gluten project google test

2022-07-18 Thread Jin Chengcheng (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568289#comment-17568289
 ] 

Jin Chengcheng edited comment on ARROW-17119 at 7/19/22 3:05 AM:
-

I extracted the relevant code
{code:java}
#define TYPE_FACTORY(NAME, KLASS)                                         \
  const std::shared_ptr<DataType>& NAME() {                               \
    static std::shared_ptr<DataType> result = std::make_shared<KLASS>();  \
    return result;                                                        \
  }

TYPE_FACTORY(null, NullType)
TYPE_FACTORY(boolean, BooleanType)
TYPE_FACTORY(int8, Int8Type)
TYPE_FACTORY(uint8, UInt8Type)
TYPE_FACTORY(int16, Int16Type)
TYPE_FACTORY(uint16, UInt16Type)
TYPE_FACTORY(int32, Int32Type)

ARROW_EXPORT const std::shared_ptr<DataType>& int16();

const std::set<std::shared_ptr<DataType>> AsofJoinNode::kSupportedOnTypes_ =
    {int64()};

 private:
  static const std::set<std::shared_ptr<DataType>> kSupportedOnTypes_;

void InitStaticData() {
  // Signed int types
  g_signed_int_types = {int8(), int16(), int32(), int64()};
{code}


was (Author: JIRAUSER292899):
I extracted the relevant code
{code:java}
#define TYPE_FACTORY(NAME, KLASS)                                         \
  const std::shared_ptr<DataType>& NAME() {                               \
    static std::shared_ptr<DataType> result = std::make_shared<KLASS>();  \
    return result;                                                        \
  }

TYPE_FACTORY(null, NullType)
TYPE_FACTORY(boolean, BooleanType)
TYPE_FACTORY(int8, Int8Type)
TYPE_FACTORY(uint8, UInt8Type)
TYPE_FACTORY(int16, Int16Type)
TYPE_FACTORY(uint16, UInt16Type)
TYPE_FACTORY(int32, Int32Type)

ARROW_EXPORT const std::shared_ptr<DataType>& int16();

const std::set<std::shared_ptr<DataType>> AsofJoinNode::kSupportedOnTypes_ =
    {int64()};

 private:
  static const std::set<std::shared_ptr<DataType>> kSupportedOnTypes_;
{code}

> [C++] Invalid free when run gluten project google test
> --
>
> Key: ARROW-17119
> URL: https://issues.apache.org/jira/browse/ARROW-17119
> Project: Apache Arrow
>  Issue Type: Bug
> Environment: ubuntu20
>Reporter: Jin Chengcheng
>Assignee: Li Jin
>Priority: Major
>
> When I run the [gluten|https://github.com/oap-project/gluten] project's Google 
> Test suite, it shows an error message after all the simple tests have passed.
> {code:java}
> gluten/cpp/build/src# ./exec_backend_test
> Running main() from 
> /build/googletest-j5yxiC/googletest-1.10.0/googletest/src/gtest_main.cc
> [==] Running 2 tests from 1 test suite.
> [--] Global test environment set-up.
> [--] 2 tests from TestExecBackend
> [ RUN      ] TestExecBackend.CreateBackend
> Set backend factory.
> [       OK ] TestExecBackend.CreateBackend (0 ms)
> [ RUN      ] TestExecBackend.GetResultIterator
> [       OK ] TestExecBackend.GetResultIterator (0 ms)
> [--] 2 tests from TestExecBackend (0 ms total)[--] Global 
> test environment tear-down
> [==] 2 tests from 1 test suite ran. (0 ms total)
> [  PASSED  ] 2 tests.
> corrupted size vs. prev_size in fastbins
> Aborted (core dumped)
>  {code}
> I used valgrind to investigate; here are the details:
> {code:java}
> // code placeholder
> ==32256== Invalid read of size 8
> ==32256==    at 0x5E493B7: std::set, 
> std::less >, 
> std::allocator > >::~set() (in 
> /mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
> ==32256==    by 0x77E0FDD: __cxa_finalize (cxa_finalize.c:83)
> ==32256==    by 0x5955816: ??? (in 
> /mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
> ==32256==    by 0x4011F6A: _dl_fini (dl-fini.c:138)
> ==32256==    by 0x77E08A6: __run_exit_handlers (exit.c:108)
> ==32256==    by 0x77E0A5F: exit (exit.c:139)
> ==32256==    by 0x77BE089: (below main) (libc-start.c:342)
> ==32256==  Address 0xd984680 is 16 bytes inside a block of size 48 free'd
> ==32256==    at 0x483CFBF: operator delete(void*) (in 
> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==32256==    by 0x5E493CD: std::set, 
> std::less >, 
> std::allocator > >::~set() (in 
> /mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
> ==32256==    by 0x77E0FDD: __cxa_finalize (cxa_finalize.c:83)
> ==32256==    by 0x7FF65B6: ??? (in 
> /mnt/jcc/code/gluten/cpp/build/releases/libarrow_dataset_jni.so.900.0.0)
> ==32256==    by 0x4011F6A: _dl_fini (dl-fini.c:138)
> ==32256==    by 0x77E08A6: __run_exit_handlers (exit.c:108)
> ==32256==    by 0x77E0A5F: exit (exit.c:139)
> ==32256==    by 0x77BE089: (below main) (libc-start.c:342)
> ==32256==  Block was alloc'd at
> ==32256==    at 0x483BE63: operator new(unsigned long) (in 
> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==32256==    by 0x5E4E5E9: std::set, 
> std::less >, 
> std::allocator > 
> >::set(std::initial

[jira] [Commented] (ARROW-17119) [C++] Invalid free when run gluten project google test

2022-07-18 Thread Jin Chengcheng (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568289#comment-17568289
 ] 

Jin Chengcheng commented on ARROW-17119:


I extracted the relevant code
{code:java}
#define TYPE_FACTORY(NAME, KLASS)                                         \
  const std::shared_ptr<DataType>& NAME() {                               \
    static std::shared_ptr<DataType> result = std::make_shared<KLASS>();  \
    return result;                                                        \
  }

TYPE_FACTORY(null, NullType)
TYPE_FACTORY(boolean, BooleanType)
TYPE_FACTORY(int8, Int8Type)
TYPE_FACTORY(uint8, UInt8Type)
TYPE_FACTORY(int16, Int16Type)
TYPE_FACTORY(uint16, UInt16Type)
TYPE_FACTORY(int32, Int32Type)

ARROW_EXPORT const std::shared_ptr<DataType>& int16();

const std::set<std::shared_ptr<DataType>> AsofJoinNode::kSupportedOnTypes_ =
    {int64()};

 private:
  static const std::set<std::shared_ptr<DataType>> kSupportedOnTypes_;

 {code}
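
Not part of the Jira thread, but as background on what the extracted code and the 
valgrind trace point at: a namespace-scope static like {{kSupportedOnTypes_}} gets 
its destructor registered for {{__cxa_finalize}}, and when the same object is torn 
down from more than one shared library (both libarrow.so and 
libarrow_dataset_jni.so appear in the trace), the set's storage can be freed twice 
at exit. Below is a minimal, generic sketch of the construct-on-first-use idiom, 
which never runs a destructor for such a static at all. It is only an illustration 
of the failure mode and a common mitigation, not necessarily how ARROW-17119 was 
actually fixed, and {{SupportedOnTypes()}} is a hypothetical name.
{code:cpp}
// Generic illustration, not Arrow source: a function-local, heap-allocated
// static that is intentionally never deleted. Because no destructor is
// registered, nothing is freed during shared-library teardown, so the
// double-destruction seen in the valgrind trace cannot occur for this object.
#include <memory>
#include <set>

#include <arrow/type.h>  // arrow::DataType and the arrow::int64() factory

const std::set<std::shared_ptr<arrow::DataType>>& SupportedOnTypes() {
  // Construct on first use; deliberately leaked (reclaimed by the OS at exit).
  static const auto* types =
      new std::set<std::shared_ptr<arrow::DataType>>{arrow::int64()};
  return *types;
}
{code}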

> [C++] Invalid free when run gluten project google test
> --
>
> Key: ARROW-17119
> URL: https://issues.apache.org/jira/browse/ARROW-17119
> Project: Apache Arrow
>  Issue Type: Bug
> Environment: ubuntu20
>Reporter: Jin Chengcheng
>Assignee: Li Jin
>Priority: Major
>
> When I run the [gluten|https://github.com/oap-project/gluten] project's Google 
> Test suite, it shows an error message after all the simple tests have passed.
> {code:java}
> gluten/cpp/build/src# ./exec_backend_test
> Running main() from 
> /build/googletest-j5yxiC/googletest-1.10.0/googletest/src/gtest_main.cc
> [==] Running 2 tests from 1 test suite.
> [--] Global test environment set-up.
> [--] 2 tests from TestExecBackend
> [ RUN      ] TestExecBackend.CreateBackend
> Set backend factory.
> [       OK ] TestExecBackend.CreateBackend (0 ms)
> [ RUN      ] TestExecBackend.GetResultIterator
> [       OK ] TestExecBackend.GetResultIterator (0 ms)
> [--] 2 tests from TestExecBackend (0 ms total)[--] Global 
> test environment tear-down
> [==] 2 tests from 1 test suite ran. (0 ms total)
> [  PASSED  ] 2 tests.
> corrupted size vs. prev_size in fastbins
> Aborted (core dumped)
>  {code}
> I used valgrind to investigate; here are the details:
> {code:java}
> // code placeholder
> ==32256== Invalid read of size 8
> ==32256==    at 0x5E493B7: std::set, 
> std::less >, 
> std::allocator > >::~set() (in 
> /mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
> ==32256==    by 0x77E0FDD: __cxa_finalize (cxa_finalize.c:83)
> ==32256==    by 0x5955816: ??? (in 
> /mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
> ==32256==    by 0x4011F6A: _dl_fini (dl-fini.c:138)
> ==32256==    by 0x77E08A6: __run_exit_handlers (exit.c:108)
> ==32256==    by 0x77E0A5F: exit (exit.c:139)
> ==32256==    by 0x77BE089: (below main) (libc-start.c:342)
> ==32256==  Address 0xd984680 is 16 bytes inside a block of size 48 free'd
> ==32256==    at 0x483CFBF: operator delete(void*) (in 
> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==32256==    by 0x5E493CD: std::set, 
> std::less >, 
> std::allocator > >::~set() (in 
> /mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
> ==32256==    by 0x77E0FDD: __cxa_finalize (cxa_finalize.c:83)
> ==32256==    by 0x7FF65B6: ??? (in 
> /mnt/jcc/code/gluten/cpp/build/releases/libarrow_dataset_jni.so.900.0.0)
> ==32256==    by 0x4011F6A: _dl_fini (dl-fini.c:138)
> ==32256==    by 0x77E08A6: __run_exit_handlers (exit.c:108)
> ==32256==    by 0x77E0A5F: exit (exit.c:139)
> ==32256==    by 0x77BE089: (below main) (libc-start.c:342)
> ==32256==  Block was alloc'd at
> ==32256==    at 0x483BE63: operator new(unsigned long) (in 
> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==32256==    by 0x5E4E5E9: std::set, 
> std::less >, 
> std::allocator > 
> >::set(std::initializer_list >, 
> std::less > const&, 
> std::allocator > const&) (in 
> /mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
> ==32256==    by 0x7FF4CC4: _GLOBAL__sub_I_asof_join_node.cc (in 
> /mnt/jcc/code/gluten/cpp/build/releases/libarrow_dataset_jni.so.900.0.0)
> ==32256==    by 0x4011B99: call_init.part.0 (dl-init.c:72)
> ==32256==    by 0x4011CA0: call_init (dl-init.c:30)
> ==32256==    by 0x4011CA0: _dl_init (dl-init.c:119)
> ==32256==    by 0x4001139: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so)
> ==32256==
> ==32256== Invalid free() / delete / delete[] / realloc()
> ==32256==    at 0x483CFBF: operator delete(void*) (in 
> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==32256==    by 0x5E493CD: std::set, 
> std::less >, 
> std::allocator > >::~set() (in 
> /mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
> ==32256==    by 0x77E0FDD: __cxa_finalize (cxa_final

[jira] [Updated] (ARROW-17096) [C++] Mode kernel incorrect for boolean inputs

2022-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17096:
---
Labels: pull-request-available  (was: )

> [C++] Mode kernel incorrect for boolean inputs
> --
>
> Key: ARROW-17096
> URL: https://issues.apache.org/jira/browse/ARROW-17096
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Affects Versions: 8.0.0
>Reporter: Matthew Roeschke
>Assignee: Yibo Cai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
> In [1]: import pyarrow.compute as pc
> In [2]: import pyarrow as pa
> In [3]: pa.__version__
> Out[3]: '8.0.0'
> In [4]: pc.mode(pa.array([True, True]))
> # Correct
> Out[4]:
> 
> -- is_valid: all not null
> -- child 0 type: bool
>   [
>     true
>   ]
> -- child 1 type: int64
>   [
>     2
>   ]
> # Incorrect
> In [5]: pc.mode(pa.array([True, False]), 2)
> Out[5]:
> 
> -- is_valid: all not null
> -- child 0 type: bool
>   [
>     false, # should be true
>     false
>   ]
> -- child 1 type: int64
>   [
>     1,
>     1
>   ] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17119) [C++] Invalid free when run gluten project google test

2022-07-18 Thread Jin Chengcheng (Jira)
Jin Chengcheng created ARROW-17119:
--

 Summary: [C++] Invalid free when run gluten project google test
 Key: ARROW-17119
 URL: https://issues.apache.org/jira/browse/ARROW-17119
 Project: Apache Arrow
  Issue Type: Bug
 Environment: ubuntu20
Reporter: Jin Chengcheng
Assignee: Li Jin


When I run the [gluten|https://github.com/oap-project/gluten] project's Google 
Test suite, it shows an error message after all the simple tests have passed.
{code:java}

gluten/cpp/build/src# ./exec_backend_test
Running main() from 
/build/googletest-j5yxiC/googletest-1.10.0/googletest/src/gtest_main.cc
[==] Running 2 tests from 1 test suite.
[--] Global test environment set-up.
[--] 2 tests from TestExecBackend
[ RUN      ] TestExecBackend.CreateBackend
Set backend factory.
[       OK ] TestExecBackend.CreateBackend (0 ms)
[ RUN      ] TestExecBackend.GetResultIterator
[       OK ] TestExecBackend.GetResultIterator (0 ms)
[--] 2 tests from TestExecBackend (0 ms total)[--] Global test 
environment tear-down
[==] 2 tests from 1 test suite ran. (0 ms total)
[  PASSED  ] 2 tests.
corrupted size vs. prev_size in fastbins
Aborted (core dumped)
 {code}
I used valgrind to investigate; here are the details:
{code:java}
// code placeholder
==32256== Invalid read of size 8
==32256==    at 0x5E493B7: std::set, 
std::less >, 
std::allocator > >::~set() (in 
/mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
==32256==    by 0x77E0FDD: __cxa_finalize (cxa_finalize.c:83)
==32256==    by 0x5955816: ??? (in 
/mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
==32256==    by 0x4011F6A: _dl_fini (dl-fini.c:138)
==32256==    by 0x77E08A6: __run_exit_handlers (exit.c:108)
==32256==    by 0x77E0A5F: exit (exit.c:139)
==32256==    by 0x77BE089: (below main) (libc-start.c:342)
==32256==  Address 0xd984680 is 16 bytes inside a block of size 48 free'd
==32256==    at 0x483CFBF: operator delete(void*) (in 
/usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==32256==    by 0x5E493CD: std::set, 
std::less >, 
std::allocator > >::~set() (in 
/mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
==32256==    by 0x77E0FDD: __cxa_finalize (cxa_finalize.c:83)
==32256==    by 0x7FF65B6: ??? (in 
/mnt/jcc/code/gluten/cpp/build/releases/libarrow_dataset_jni.so.900.0.0)
==32256==    by 0x4011F6A: _dl_fini (dl-fini.c:138)
==32256==    by 0x77E08A6: __run_exit_handlers (exit.c:108)
==32256==    by 0x77E0A5F: exit (exit.c:139)
==32256==    by 0x77BE089: (below main) (libc-start.c:342)
==32256==  Block was alloc'd at
==32256==    at 0x483BE63: operator new(unsigned long) (in 
/usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==32256==    by 0x5E4E5E9: std::set, 
std::less >, 
std::allocator > 
>::set(std::initializer_list >, 
std::less > const&, 
std::allocator > const&) (in 
/mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
==32256==    by 0x7FF4CC4: _GLOBAL__sub_I_asof_join_node.cc (in 
/mnt/jcc/code/gluten/cpp/build/releases/libarrow_dataset_jni.so.900.0.0)
==32256==    by 0x4011B99: call_init.part.0 (dl-init.c:72)
==32256==    by 0x4011CA0: call_init (dl-init.c:30)
==32256==    by 0x4011CA0: _dl_init (dl-init.c:119)
==32256==    by 0x4001139: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so)
==32256==
==32256== Invalid free() / delete / delete[] / realloc()
==32256==    at 0x483CFBF: operator delete(void*) (in 
/usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==32256==    by 0x5E493CD: std::set, 
std::less >, 
std::allocator > >::~set() (in 
/mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
==32256==    by 0x77E0FDD: __cxa_finalize (cxa_finalize.c:83)
==32256==    by 0x5955816: ??? (in 
/mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
==32256==    by 0x4011F6A: _dl_fini (dl-fini.c:138)
==32256==    by 0x77E08A6: __run_exit_handlers (exit.c:108)
==32256==    by 0x77E0A5F: exit (exit.c:139)
==32256==    by 0x77BE089: (below main) (libc-start.c:342)
==32256==  Address 0xd984670 is 0 bytes inside a block of size 48 free'd
==32256==    at 0x483CFBF: operator delete(void*) (in 
/usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==32256==    by 0x5E493CD: std::set, 
std::less >, 
std::allocator > >::~set() (in 
/mnt/jcc/code/gluten/cpp/build/releases/libarrow.so.900.0.0)
==32256==    by 0x77E0FDD: __cxa_finalize (cxa_finalize.c:83)
==32256==    by 0x7FF65B6: ??? (in 
/mnt/jcc/code/gluten/cpp/build/releases/libarrow_dataset_jni.so.900.0.0)
==32256==    by 0x4011F6A: _dl_fini (dl-fini.c:138)
==32256==    by 0x77E08A6: __run_exit_handlers (exit.c:108)
==32256==    by 0x77E0A5F: exit (exit.c:139)
==32256==    by 0x77BE089: (below main) (libc-start.c:342)
==32256==  Block was alloc'd at
==32256==    at 0x483BE63: operator new(unsigned long) (in 
/usr

[jira] [Resolved] (ARROW-17101) [Java] Prepare new protoc-gen-grpc-java for s390x

2022-07-18 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-17101.
--
Fix Version/s: 9.0.0
   Resolution: Fixed

Issue resolved by pull request 13632
[https://github.com/apache/arrow/pull/13632]

> [Java] Prepare new protoc-gen-grpc-java for s390x
> -
>
> Key: ARROW-17101
> URL: https://issues.apache.org/jira/browse/ARROW-17101
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 9.0.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Recent s390x builds for Java cause 
> [failures|https://app.travis-ci.com/github/apache/arrow/jobs/576822591#L2933] 
> because the required version of protoc-gen-grpc-java is missing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-17101) [Java] Prepare new protoc-gen-grpc-java for s390x

2022-07-18 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-17101:


Assignee: Kazuaki Ishizaki

> [Java] Prepare new protoc-gen-grpc-java for s390x
> -
>
> Key: ARROW-17101
> URL: https://issues.apache.org/jira/browse/ARROW-17101
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 9.0.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Recent s390x builds for Java cause 
> [failures|https://app.travis-ci.com/github/apache/arrow/jobs/576822591#L2933] 
> because the required version of protoc-gen-grpc-java is missing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ARROW-17112) [Java] TestArrowReaderWriter.testFileFooterSizeOverflow causes a failure on s390x

2022-07-18 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-17112.
--
Fix Version/s: 9.0.0
   Resolution: Fixed

Issue resolved by pull request 13638
[https://github.com/apache/arrow/pull/13638]

> [Java] TestArrowReaderWriter.testFileFooterSizeOverflow causes a failure on 
> s390x
> -
>
> Key: ARROW-17112
> URL: https://issues.apache.org/jira/browse/ARROW-17112
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 9.0.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> On big-endian platforms such as s390x, 
> {{TestArrowReaderWriter.testFileFooterSizeOverflow}} causes a failure.
> {code}
> [INFO] Results:
> [INFO] 
> [ERROR] Failures: 
> [ERROR]   TestArrowReaderWriter.testFileFooterSizeOverflow:913 
> expected:<...alid footer length: [2147483647]> but was:<...alid footer 
> length: [-129]>
> [INFO] 
> [ERROR] Tests run: 610, Failures: 1, Errors: 0, Skipped: 4
> [INFO] 
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Arrow Java Root POM 9.0.0-SNAPSHOT .. SUCCESS [  2.182 
> s]
> [INFO] Arrow Format ... SUCCESS [  0.995 
> s]
> [INFO] Arrow Memory ... SUCCESS [  0.761 
> s]
> [INFO] Arrow Memory - Core  SUCCESS [  1.582 
> s]
> [INFO] Arrow Memory - Unsafe .. SUCCESS [  1.600 
> s]
> [INFO] Arrow Memory - Netty ... SUCCESS [  1.966 
> s]
> [INFO] Arrow Vectors .. FAILURE [ 16.779 
> s]
> [INFO] Arrow Compression .. SKIPPED
> [INFO] Arrow Tools  SKIPPED
> [INFO] Arrow JDBC Adapter . SKIPPED
> [INFO] Arrow Plasma Client  SUCCESS [  1.171 
> s]
> [INFO] Arrow Flight ... SUCCESS [  0.741 
> s]
> [INFO] Arrow Flight Core .. SKIPPED
> [INFO] Arrow Flight GRPC .. SKIPPED
> [INFO] Arrow Flight SQL ... SKIPPED
> [INFO] Arrow Flight Integration Tests . SKIPPED
> [INFO] Arrow AVRO Adapter . SKIPPED
> [INFO] Arrow Algorithms ... SKIPPED
> [INFO] Arrow Performance Benchmarks 9.0.0-SNAPSHOT  SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 23.584 s (Wall Clock)
> [INFO] Finished at: 2022-07-18T12:32:21Z
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M3:test (default-test) 
> on project arrow-vector: There are test failures.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-17112) [Java] TestArrowReaderWriter.testFileFooterSizeOverflow causes a failure on s390x

2022-07-18 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-17112:


Assignee: Kazuaki Ishizaki

> [Java] TestArrowReaderWriter.testFileFooterSizeOverflow causes a failure on 
> s390x
> -
>
> Key: ARROW-17112
> URL: https://issues.apache.org/jira/browse/ARROW-17112
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 9.0.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> On big-endian platforms such as s390x, 
> {{TestArrowReaderWriter.testFileFooterSizeOverflow}} causes a failure.
> {code}
> [INFO] Results:
> [INFO] 
> [ERROR] Failures: 
> [ERROR]   TestArrowReaderWriter.testFileFooterSizeOverflow:913 
> expected:<...alid footer length: [2147483647]> but was:<...alid footer 
> length: [-129]>
> [INFO] 
> [ERROR] Tests run: 610, Failures: 1, Errors: 0, Skipped: 4
> [INFO] 
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Arrow Java Root POM 9.0.0-SNAPSHOT .. SUCCESS [  2.182 
> s]
> [INFO] Arrow Format ... SUCCESS [  0.995 
> s]
> [INFO] Arrow Memory ... SUCCESS [  0.761 
> s]
> [INFO] Arrow Memory - Core  SUCCESS [  1.582 
> s]
> [INFO] Arrow Memory - Unsafe .. SUCCESS [  1.600 
> s]
> [INFO] Arrow Memory - Netty ... SUCCESS [  1.966 
> s]
> [INFO] Arrow Vectors .. FAILURE [ 16.779 
> s]
> [INFO] Arrow Compression .. SKIPPED
> [INFO] Arrow Tools  SKIPPED
> [INFO] Arrow JDBC Adapter . SKIPPED
> [INFO] Arrow Plasma Client  SUCCESS [  1.171 
> s]
> [INFO] Arrow Flight ... SUCCESS [  0.741 
> s]
> [INFO] Arrow Flight Core .. SKIPPED
> [INFO] Arrow Flight GRPC .. SKIPPED
> [INFO] Arrow Flight SQL ... SKIPPED
> [INFO] Arrow Flight Integration Tests . SKIPPED
> [INFO] Arrow AVRO Adapter . SKIPPED
> [INFO] Arrow Algorithms ... SKIPPED
> [INFO] Arrow Performance Benchmarks 9.0.0-SNAPSHOT  SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 23.584 s (Wall Clock)
> [INFO] Finished at: 2022-07-18T12:32:21Z
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M3:test (default-test) 
> on project arrow-vector: There are test failures.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17118) [Docs][Release] Use direct link for adding a new release to Apache report database

2022-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17118:
---
Labels: pull-request-available  (was: )

> [Docs][Release] Use direct link for adding a new release to Apache report 
> database
> --
>
> Key: ARROW-17118
> URL: https://issues.apache.org/jira/browse/ARROW-17118
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17118) [Docs][Release] Use direct link for adding a new release to Apache report database

2022-07-18 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-17118:


 Summary: [Docs][Release] Use direct link for adding a new release 
to Apache report database
 Key: ARROW-17118
 URL: https://issues.apache.org/jira/browse/ARROW-17118
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17117) [C++] Add timestamp column datatype support for AsOfJoin

2022-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17117:
---
Labels: pull-request-available  (was: )

> [C++] Add timestamp column datatype support for AsOfJoin
> 
>
> Key: ARROW-17117
> URL: https://issues.apache.org/jira/browse/ARROW-17117
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ivan Chau
>Assignee: Ivan Chau
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17117) [C++] Add timestamp column datatype support for AsOfJoin

2022-07-18 Thread Ivan Chau (Jira)
Ivan Chau created ARROW-17117:
-

 Summary: [C++] Add timestamp column datatype support for AsOfJoin
 Key: ARROW-17117
 URL: https://issues.apache.org/jira/browse/ARROW-17117
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Ivan Chau
Assignee: Ivan Chau






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17116) Adding RepeatStr

2022-07-18 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568253#comment-17568253
 ] 

David Li commented on ARROW-17116:
--

Is this for Gandiva, presumably?

> Adding RepeatStr
> 
>
> Key: ARROW-17116
> URL: https://issues.apache.org/jira/browse/ARROW-17116
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Sahaj Gupta
>Priority: Minor
>
> Adding RepeatStr Function



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568251#comment-17568251
 ] 

Kouhei Sutou commented on ARROW-17110:
--

Can we avoid depending on Abseil ABI by removing Abseil use in 
{{cpp/src/arrow/filesystem/gcsfs_internal.cc}} and 
{{cpp/src/arrow/filesystem/gcsfs_test.cc}}?

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seem to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating on (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17116) Adding RepeatStr

2022-07-18 Thread Sahaj Gupta (Jira)
Sahaj Gupta created ARROW-17116:
---

 Summary: Adding RepeatStr
 Key: ARROW-17116
 URL: https://issues.apache.org/jira/browse/ARROW-17116
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Sahaj Gupta


Adding RepeatStr Function



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17113) [Java] All static initializers should catch and report exceptions

2022-07-18 Thread David Li (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Li updated ARROW-17113:
-
Labels: good-first-issue good-second-issue  (was: )

> [Java] All static initializers should catch and report exceptions
> -
>
> Key: ARROW-17113
> URL: https://issues.apache.org/jira/browse/ARROW-17113
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: David Li
>Priority: Major
>  Labels: good-first-issue, good-second-issue
>
> As reported on the mailing list: 
> https://lists.apache.org/thread/gysn25gsm4v1fvvx9l0sjyr627xy7q65
> All static initializers should catch and report exceptions, or else they will 
> get swallowed by the JVM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17115) [C++] HashJoin fails if it encounters a batch with more than 32Ki rows

2022-07-18 Thread Weston Pace (Jira)
Weston Pace created ARROW-17115:
---

 Summary: [C++] HashJoin fails if it encounters a batch with more 
than 32Ki rows
 Key: ARROW-17115
 URL: https://issues.apache.org/jira/browse/ARROW-17115
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Weston Pace
Assignee: Weston Pace


The new swiss join assumes that batches are being broken according to the 
morsel/batch model and it assumes those batches have, at most, 32Ki rows 
(signed 16-bit indices are used in various places).

However, we are not currently slicing all of our inputs to batches this small.  
This is causing conbench to fail and would likely be a problem with any large 
inputs.

We should fix this by slicing batches in the engine to the appropriate maximum 
size.
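
As a rough illustration of that fix (not the actual engine code): the sketch below 
splits an oversized {{arrow::RecordBatch}} into zero-copy slices of at most 32Ki 
rows using {{RecordBatch::Slice}}. The {{kMaxBatchRows}} constant and the 
{{SliceIntoSmallBatches}} helper are illustrative names, not existing Arrow APIs.
{code:cpp}
// Minimal sketch: break a large batch into slices no bigger than the row
// limit the swiss join assumes. RecordBatch::Slice() is zero-copy.
#include <algorithm>
#include <memory>
#include <vector>

#include <arrow/record_batch.h>

constexpr int64_t kMaxBatchRows = 32 * 1024;  // 32Ki rows, the limit described above

std::vector<std::shared_ptr<arrow::RecordBatch>> SliceIntoSmallBatches(
    const std::shared_ptr<arrow::RecordBatch>& batch) {
  std::vector<std::shared_ptr<arrow::RecordBatch>> out;
  for (int64_t offset = 0; offset < batch->num_rows(); offset += kMaxBatchRows) {
    const int64_t length = std::min(kMaxBatchRows, batch->num_rows() - offset);
    out.push_back(batch->Slice(offset, length));  // shares the parent's buffers
  }
  return out;
}
{code}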



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17047) [Python][Docs] Document how to get field from StructType

2022-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17047:
---
Labels: pull-request-available  (was: )

> [Python][Docs] Document how to get field from StructType
> 
>
> Key: ARROW-17047
> URL: https://issues.apache.org/jira/browse/ARROW-17047
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 8.0.0
>Reporter: Will Jones
>Assignee: Anja Boskovic
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It's not at all obvious how to get a particular field from a StructType from 
> its API page:
> https://arrow.apache.org/docs/python/generated/pyarrow.StructType.html#pyarrow.StructType
> We should add an example:
> {code:python}
> struct_type = pa.struct({"x": pa.int32(), "y": pa.string()})
> struct_type[0]
> # pyarrow.Field
> pa.schema(list(struct_type))
> # x: int32
> # y: string
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-12693) [R] add unique() methods for ArrowTabular, datasets

2022-07-18 Thread Sam Albers (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568225#comment-17568225
 ] 

Sam Albers commented on ARROW-12693:


> Well, the original issue reporter seemed to expect that it would work ;)

So foolish he was. 

> feature request to support $ on query objects

I wasn't able to find a specific ticket for that. I can create one though, as I 
think it might be something worth exploring.

> [R] add unique() methods for ArrowTabular, datasets
> ---
>
> Key: ARROW-12693
> URL: https://issues.apache.org/jira/browse/ARROW-12693
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Sam Albers
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I am trying to see if I can leverage `unique` on a Dataset object. Imagining 
> a much bigger dataset, I am trying to get away from this expensive pattern:
> {code:java}
> Dataset %>%
>   pull(col) %>%
>   unique(){code}
> However, when I try the option below it is not working quite how I'd expect. 
> I'm actually not able to get any of these working (e.g. `arrow_mean`), so maybe 
> I am misunderstanding how they are meant to work. 
> {code:java}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> dir.create("iris")
> iris %>%
>  group_by(Species) %>%
>  write_dataset("iris")
> ds <- open_dataset("iris")
> ds %>%
>  mutate(unique = arrow_unique(Species)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression unique("setosa")
> ds %>%
>  mutate(unique = arrow_unique(Petal.Width)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression {Sepal.Length=Sepal.Length, Sepal.Width=Sepal.Width, 
> Petal.Length=Petal.Length, Petal.Width=Petal.Width, Species="setosa", 
> unique=unique(Petal.Width)}
> call_function("unique", ds, "Species")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("unique", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("mean", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> sessioninfo::session_info()
> #> - Session info 
> ---
> #> setting value 
> #> version R version 4.0.5 (2021-03-31)
> #> os Windows 10 x64 
> #> system x86_64, mingw32 
> #> ui RTerm 
> #> language (EN) 
> #> collate English_Canada.1252 
> #> ctype English_Canada.1252 
> #> tz America/Los_Angeles 
> #> date 2021-05-07 
> #> 
> #> - Packages 
> ---
> #> package * version date lib source 
> #> arrow * 4.0.0 2021-04-27 [1] CRAN (R 4.0.5)
> #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
> #> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.3)
> #> bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.2)
> #> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.2)
> #> cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.5)
> #> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.3)
> #> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.3)
> #> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
> #> dplyr * 1.0.5 2021-03-05 [1] CRAN (R 4.0.5)
> #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.5)
> #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
> #> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.3)
> #> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
> #> generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.3)
> #> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
> #> highr 0.9 2021-04-16 [1] CRAN (R 4.0.4)
> #> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
> #> knitr 1.33 2021-04-24 [1] CRAN (R 4.0.5)
> #> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.4)
> #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.3)
> #> pillar 1.6.0 2021-04-13 [1] CRAN (R 4.0.5)
> #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
> #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
> #> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.0.5)
> #> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3)
> #> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.5)
> #> rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.3)
> #> rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.4)
> #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
> #> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
> #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
> #> styler 1.4.1 2021-03-30 [1] CRAN (R 4.0.4)
> #> tibble 3.1.1 2021-04-18 [1] CRAN (R 4.1.0)
> #> tidyselect 1.1.1 2021

[jira] [Updated] (ARROW-12693) [R] add unique() methods for ArrowTabular, datasets

2022-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-12693:
---
Labels: pull-request-available  (was: )

> [R] add unique() methods for ArrowTabular, datasets
> ---
>
> Key: ARROW-12693
> URL: https://issues.apache.org/jira/browse/ARROW-12693
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Sam Albers
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I am trying to see if I can leverage `unique` on a Dataset object. Imagining 
> a much bigger dataset, I am trying to get away from this expensive pattern:
> {code:java}
> Dataset %>%
>   pull(col) %>%
>   unique(){code}
> However, when I try the option below it is not working quite how I'd expect. 
> I'm actually not able to get any of these working (e.g. `arrow_mean`), so maybe 
> I am misunderstanding how they are meant to work. 
> {code:java}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> dir.create("iris")
> iris %>%
>  group_by(Species) %>%
>  write_dataset("iris")
> ds <- open_dataset("iris")
> ds %>%
>  mutate(unique = arrow_unique(Species)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression unique("setosa")
> ds %>%
>  mutate(unique = arrow_unique(Petal.Width)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression {Sepal.Length=Sepal.Length, Sepal.Width=Sepal.Width, 
> Petal.Length=Petal.Length, Petal.Width=Petal.Width, Species="setosa", 
> unique=unique(Petal.Width)}
> call_function("unique", ds, "Species")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("unique", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("mean", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> sessioninfo::session_info()
> #> - Session info 
> ---
> #> setting value 
> #> version R version 4.0.5 (2021-03-31)
> #> os Windows 10 x64 
> #> system x86_64, mingw32 
> #> ui RTerm 
> #> language (EN) 
> #> collate English_Canada.1252 
> #> ctype English_Canada.1252 
> #> tz America/Los_Angeles 
> #> date 2021-05-07 
> #> 
> #> - Packages 
> ---
> #> package * version date lib source 
> #> arrow * 4.0.0 2021-04-27 [1] CRAN (R 4.0.5)
> #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
> #> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.3)
> #> bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.2)
> #> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.2)
> #> cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.5)
> #> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.3)
> #> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.3)
> #> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
> #> dplyr * 1.0.5 2021-03-05 [1] CRAN (R 4.0.5)
> #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.5)
> #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
> #> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.3)
> #> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
> #> generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.3)
> #> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
> #> highr 0.9 2021-04-16 [1] CRAN (R 4.0.4)
> #> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
> #> knitr 1.33 2021-04-24 [1] CRAN (R 4.0.5)
> #> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.4)
> #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.3)
> #> pillar 1.6.0 2021-04-13 [1] CRAN (R 4.0.5)
> #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
> #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
> #> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.0.5)
> #> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3)
> #> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.5)
> #> rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.3)
> #> rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.4)
> #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
> #> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
> #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
> #> styler 1.4.1 2021-03-30 [1] CRAN (R 4.0.4)
> #> tibble 3.1.1 2021-04-18 [1] CRAN (R 4.1.0)
> #> tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.0.5)
> #> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.5)
> #> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.5)
> #> withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.4)
> #> xfun 0.22 2021-03-11 [1] CRAN (R 4.0.4)
> #> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
> #> 
> #> [1] C:/

[jira] [Updated] (ARROW-12693) [R] add unique() methods for ArrowTabular, datasets

2022-07-18 Thread Sam Albers (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Albers updated ARROW-12693:
---
Summary: [R] add unique() methods for ArrowTabular, datasets  (was: [R] add 
unique() methods for ArrowTabluar, datasets)

> [R] add unique() methods for ArrowTabular, datasets
> ---
>
> Key: ARROW-12693
> URL: https://issues.apache.org/jira/browse/ARROW-12693
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Sam Albers
>Priority: Minor
>
> I am trying to see if I can leverage `unique` on a Dataset object. Imagining 
> a much bigger dataset, I am trying to get away from this expensive pattern:
> {code:java}
> Dataset %>%
>   pull(col) %>%
>   unique(){code}
> However when I try the option below it is not working quite how I'd expect. 
> I'm actually not able to get any working (e.g. `arrow_mean`) so maybe I am 
> misunderstanding how these are meant to work. 
> {code:java}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> dir.create("iris")
> iris %>%
>  group_by(Species) %>%
>  write_dataset("iris")
> ds <- open_dataset("iris")
> ds %>%
>  mutate(unique = arrow_unique(Species)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression unique("setosa")
> ds %>%
>  mutate(unique = arrow_unique(Petal.Width)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression {Sepal.Length=Sepal.Length, Sepal.Width=Sepal.Width, 
> Petal.Length=Petal.Length, Petal.Width=Petal.Width, Species="setosa", 
> unique=unique(Petal.Width)}
> call_function("unique", ds, "Species")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("unique", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("mean", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> sessioninfo::session_info()
> #> - Session info 
> ---
> #> setting value 
> #> version R version 4.0.5 (2021-03-31)
> #> os Windows 10 x64 
> #> system x86_64, mingw32 
> #> ui RTerm 
> #> language (EN) 
> #> collate English_Canada.1252 
> #> ctype English_Canada.1252 
> #> tz America/Los_Angeles 
> #> date 2021-05-07 
> #> 
> #> - Packages 
> ---
> #> package * version date lib source 
> #> arrow * 4.0.0 2021-04-27 [1] CRAN (R 4.0.5)
> #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
> #> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.3)
> #> bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.2)
> #> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.2)
> #> cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.5)
> #> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.3)
> #> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.3)
> #> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
> #> dplyr * 1.0.5 2021-03-05 [1] CRAN (R 4.0.5)
> #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.5)
> #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
> #> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.3)
> #> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
> #> generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.3)
> #> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
> #> highr 0.9 2021-04-16 [1] CRAN (R 4.0.4)
> #> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
> #> knitr 1.33 2021-04-24 [1] CRAN (R 4.0.5)
> #> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.4)
> #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.3)
> #> pillar 1.6.0 2021-04-13 [1] CRAN (R 4.0.5)
> #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
> #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
> #> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.0.5)
> #> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3)
> #> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.5)
> #> rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.3)
> #> rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.4)
> #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
> #> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
> #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
> #> styler 1.4.1 2021-03-30 [1] CRAN (R 4.0.4)
> #> tibble 3.1.1 2021-04-18 [1] CRAN (R 4.1.0)
> #> tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.0.5)
> #> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.5)
> #> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.5)
> #> withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.4)
> #> xfun 0.22 2021-03-11 [1] CRAN (R 4.0.4)
> #> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
> #> 
> #> [1] C:/Users/salbers/R/win-libra

[jira] [Assigned] (ARROW-17114) [Python][C++] O_DIRECT write support

2022-07-18 Thread Ziheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziheng Wang reassigned ARROW-17114:
---

Assignee: Ziheng Wang

> [Python][C++] O_DIRECT write support 
> -
>
> Key: ARROW-17114
> URL: https://issues.apache.org/jira/browse/ARROW-17114
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Ziheng Wang
>Assignee: Ziheng Wang
>Priority: Minor
>
> Add support for O_DIRECT on Ubuntu in OutputStream writes to save page cache 
> RAM for large writes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17114) [Python][C++] O_DIRECT write support

2022-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17114:
---
Labels: pull-request-available  (was: )

> [Python][C++] O_DIRECT write support 
> -
>
> Key: ARROW-17114
> URL: https://issues.apache.org/jira/browse/ARROW-17114
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Ziheng Wang
>Assignee: Ziheng Wang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add support for O_DIRECT on Ubuntu in OutputStream writes to save page cache 
> RAM for large writes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17114) [Python][C++] O_DIRECT write support

2022-07-18 Thread Ziheng Wang (Jira)
Ziheng Wang created ARROW-17114:
---

 Summary: [Python][C++] O_DIRECT write support 
 Key: ARROW-17114
 URL: https://issues.apache.org/jira/browse/ARROW-17114
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Ziheng Wang


Add support for O_DIRECT on Ubuntu in OutputStream writes to save page cache 
RAM for large writes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17113) [Java] All static initializers should catch and report exceptions

2022-07-18 Thread David Li (Jira)
David Li created ARROW-17113:


 Summary: [Java] All static initializers should catch and report 
exceptions
 Key: ARROW-17113
 URL: https://issues.apache.org/jira/browse/ARROW-17113
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li


As reported on the mailing list: 
https://lists.apache.org/thread/gysn25gsm4v1fvvx9l0sjyr627xy7q65

All static initializers should catch and report exceptions, or else they will 
get swallowed by the JVM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16802) [Docs] Improve Acero Documentation

2022-07-18 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568213#comment-17568213
 ] 

Weston Pace commented on ARROW-16802:
-

[~kexin] in addition to the mailing list there is a Zulip instance at 
https://ursalabs.zulipchat.com .  It has ursalabs in the name (I think we are 
trying to fix this) but it is open to all.

> [Docs] Improve Acero Documentation
> --
>
> Key: ARROW-16802
> URL: https://issues.apache.org/jira/browse/ARROW-16802
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Will Jones
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> From [~amol-] :
> {quote}If we want to start promoting Acero to the world, I think we should 
> work on improving the documentation a bit first. Having a blog post that then 
> redirects people to docs that they find hard to read/apply might actually 
> be counterproductive, as it might create a reputation for being badly documented.
> At the moment the only mention of it is 
> [https://arrow.apache.org/docs/cpp/streaming_execution.html] and it's not 
> very easy to follow (not many explanations, just blocks of code). In 
> comparison if you look at the compute chapter in Python ( 
> [https://arrow.apache.org/docs/dev/python/compute.html] ) it's much more 
> talkative and explains things as it goes.
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-16964) [C++] TSAN error in asof-join-node tests

2022-07-18 Thread Weston Pace (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weston Pace reassigned ARROW-16964:
---

Assignee: Weston Pace

> [C++] TSAN error in asof-join-node tests
> 
>
> Key: ARROW-16964
> URL: https://issues.apache.org/jira/browse/ARROW-16964
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Weston Pace
>Assignee: Weston Pace
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://github.com/ursacomputing/crossbow/runs/7141923270?check_suite_focus=true



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-16964) [C++] TSAN error in asof-join-node tests

2022-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-16964:
---
Labels: pull-request-available  (was: )

> [C++] TSAN error in asof-join-node tests
> 
>
> Key: ARROW-16964
> URL: https://issues.apache.org/jira/browse/ARROW-16964
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Weston Pace
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://github.com/ursacomputing/crossbow/runs/7141923270?check_suite_focus=true



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ARROW-8324) [R] Add read/write_ipc_file separate from _feather

2022-07-18 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8324.

Fix Version/s: 9.0.0
   Resolution: Fixed

Issue resolved by pull request 13626
[https://github.com/apache/arrow/pull/13626]

> [R] Add read/write_ipc_file separate from _feather
> --
>
> Key: ARROW-8324
> URL: https://issues.apache.org/jira/browse/ARROW-8324
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: SHIMA Tatsuya
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> See [https://github.com/apache/arrow/pull/6771#issuecomment-608133760]
> {quote}Let's add read/write_ipc_file also? I'm wary of the "version" option 
> in "write_feather" and the Feather version inference capability in 
> "read_feather". It's potentially confusing and we may choose to add options 
> to write_ipc_file/read_ipc_file that are more developer centric, having to do 
> with particulars in the IPC format, that are not relevant or appropriate for 
> the Feather APIs.
> IMHO it's best for "Feather format" to remain an abstracted higher-level 
> concept with its use of the "IPC file format" as an implementation detail, 
> and segregated from the other things.
> {quote}
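
For reference, a usage sketch of the pair of functions added here. The exact signatures are an assumption beyond what the issue title states; the intent is that they mirror read_feather()/write_feather() while always using the IPC file format:

{code}
# Sketch: the IPC-file wrappers alongside the Feather ones (arrow >= 9.0.0).
library(arrow, warn.conflicts = FALSE)

write_ipc_file(mtcars, "mtcars.arrow")
df <- read_ipc_file("mtcars.arrow")
{code}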



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568183#comment-17568183
 ] 

David Li commented on ARROW-17110:
--

The R Windows builds use a distribution of MinGW, not Centos: 
https://cran.r-project.org/bin/windows/Rtools/history.html

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seem to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread H. Vetinari (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568182#comment-17568182
 ] 

H. Vetinari commented on ARROW-17110:
-

> The devtoolset backport won't do anything for the gcc 4.9 requirement on R 
> Windows builds, I'm afraid.

What shape does this requirement take? The defining feature of the devtoolset 
backports is that they're fully ABI-compatible with the default compiler (i.e. 
4.8), and I doubt R hard-depends on the presence of specific bugs in GCC 4.x 
that were fixed in later versions.

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seem to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-8163) [C++][Dataset] Allow FileSystemDataset's file list to be lazy

2022-07-18 Thread Pavel Solodovnikov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568181#comment-17568181
 ] 

Pavel Solodovnikov commented on ARROW-8163:
---

[~westonpace] Thanks!

> [C++][Dataset] Allow FileSystemDataset's file list to be lazy
> -
>
> Key: ARROW-8163
> URL: https://issues.apache.org/jira/browse/ARROW-8163
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Pavel Solodovnikov
>Priority: Major
>  Labels: dataset
>
> A FileSystemDataset currently requires a full listing of files it contains on 
> construction, so a scan cannot start until all files in the dataset are 
> discovered. Instead it would be ideal if a large dataset could be constructed 
> with a lazy file listing so that scans can start immediately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ARROW-17102) [R] Test fails on R minimal nightly builds due to Parquet writing

2022-07-18 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-17102.
-
Fix Version/s: 9.0.0
   Resolution: Fixed

Issue resolved by pull request 13631
[https://github.com/apache/arrow/pull/13631]

> [R] Test fails on R minimal nightly builds due to Parquet writing
> -
>
> Key: ARROW-17102
> URL: https://issues.apache.org/jira/browse/ARROW-17102
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Nicola Crane
>Assignee: Nicola Crane
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> May be due to missing option to skip if parquet not available
> https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=29590&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181&l=17703



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-12693) [R] add unique() methods for ArrowTabluar, datasets

2022-07-18 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568169#comment-17568169
 ] 

Neal Richardson commented on ARROW-12693:
-

> Because this isn't a dplyr function, do you think this would automatically 
> pull the vector into memory?

Good call, probably so.

> I also wonder if it's even worth implementing this.

Well, the original issue reporter seemed to expect that it would work ;)

> I think there would be some expectation to support something like 
> {{unique(some_arrow_table$variable)}}

IIRC there is a feature request to support $ on query objects

> one could still call {{unique()}} on a whole table but that would duplicate 
> what {{distinct()}} does.

If it's not worth implementing, we could add {{unique.arrow_dplyr_query}} et 
al. that just raises an error telling you to call distinct() instead. But at 
that point, we might as well just wire it up to do distinct %>% collect, right?
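
For illustration, a minimal sketch of that last idea. It is not part of the arrow package; the method names ({{unique.arrow_dplyr_query}} and friends) and the {{incomparables}} handling are assumptions, and the methods simply delegate to distinct() and pull the result into memory:

{code}
# Hypothetical sketch only: route unique() through distinct() %>% collect()
# for query objects, Tables and Datasets. Not a released arrow API.
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

unique.arrow_dplyr_query <- function(x, incomparables = FALSE, ...) {
  if (!isFALSE(incomparables)) {
    stop("'incomparables' is not supported for Arrow objects")
  }
  x %>% distinct() %>% collect()
}
unique.ArrowTabular <- unique.arrow_dplyr_query
unique.Dataset <- unique.arrow_dplyr_query
{code}

With methods like these, unique(ds) would behave the same as ds %>% distinct() %>% collect(): the deduplication runs in Arrow, but the result comes back as an R data frame.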

> [R] add unique() methods for ArrowTabluar, datasets
> ---
>
> Key: ARROW-12693
> URL: https://issues.apache.org/jira/browse/ARROW-12693
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Sam Albers
>Priority: Minor
>
> I am trying to see if I can leverage `unique` on a Dataset object. Imagining 
> a much bigger dataset, I am trying to get away from this expensive pattern:
> {code:java}
> Dataset %>%
>   pull(col) %>%
>   unique(){code}
> However when I try the option below it is not working quite how I'd expect. 
> I'm actually not able to get any working (e.g. `arrow_mean`) so maybe I am 
> misunderstanding how these are meant to work. 
> {code:java}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> dir.create("iris")
> iris %>%
>  group_by(Species) %>%
>  write_dataset("iris")
> ds <- open_dataset("iris")
> ds %>%
>  mutate(unique = arrow_unique(Species)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression unique("setosa")
> ds %>%
>  mutate(unique = arrow_unique(Petal.Width)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression {Sepal.Length=Sepal.Length, Sepal.Width=Sepal.Width, 
> Petal.Length=Petal.Length, Petal.Width=Petal.Width, Species="setosa", 
> unique=unique(Petal.Width)}
> call_function("unique", ds, "Species")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("unique", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("mean", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> sessioninfo::session_info()
> #> - Session info 
> ---
> #> setting value 
> #> version R version 4.0.5 (2021-03-31)
> #> os Windows 10 x64 
> #> system x86_64, mingw32 
> #> ui RTerm 
> #> language (EN) 
> #> collate English_Canada.1252 
> #> ctype English_Canada.1252 
> #> tz America/Los_Angeles 
> #> date 2021-05-07 
> #> 
> #> - Packages 
> ---
> #> package * version date lib source 
> #> arrow * 4.0.0 2021-04-27 [1] CRAN (R 4.0.5)
> #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
> #> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.3)
> #> bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.2)
> #> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.2)
> #> cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.5)
> #> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.3)
> #> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.3)
> #> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
> #> dplyr * 1.0.5 2021-03-05 [1] CRAN (R 4.0.5)
> #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.5)
> #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
> #> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.3)
> #> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
> #> generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.3)
> #> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
> #> highr 0.9 2021-04-16 [1] CRAN (R 4.0.4)
> #> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
> #> knitr 1.33 2021-04-24 [1] CRAN (R 4.0.5)
> #> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.4)
> #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.3)
> #> pillar 1.6.0 2021-04-13 [1] CRAN (R 4.0.5)
> #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
> #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
> #> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.0.5)
> #> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.

[jira] [Commented] (ARROW-12693) [R] add unique() methods for ArrowTabluar, datasets

2022-07-18 Thread Sam Albers (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568162#comment-17568162
 ] 

Sam Albers commented on ARROW-12693:


I also wonder if it's even worth implementing this. None (afaik) of the dbplyr 
backends implement anything for {{{}unique(){}}}. {{distinct()}} provides the 
same functionality. {{unique()}} falls a bit outside of the dplyr paradigm and 
I think there would be some expectation to support something like 
{{unique(some_arrow_table$variable)}} type syntax. If that syntax wasn't 
supported then one could still call {{unique()}} on a whole table but that 
would duplicate what {{distinct()}} does.
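
For the column-level syntax, the compute kernel is already reachable today by extracting the column first; the error messages above show that call_function() accepts an Array or ChunkedArray. A small sketch (the table is just a stand-in built from iris, and this works on in-memory Tables, not on a Dataset):

{code}
# Sketch: unique values of one column of an in-memory Table via the
# "unique" compute kernel (takes an Array/ChunkedArray, returns an Array).
library(arrow, warn.conflicts = FALSE)

some_arrow_table <- Table$create(iris)
uniq <- call_function("unique", some_arrow_table$Petal.Width)
as.vector(uniq)
{code}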

> [R] add unique() methods for ArrowTabluar, datasets
> ---
>
> Key: ARROW-12693
> URL: https://issues.apache.org/jira/browse/ARROW-12693
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Sam Albers
>Priority: Minor
>
> I am trying to see if I can leverage `unique` on a Dataset object. Imagining 
> a much bigger dataset, I am trying to get away from this expensive pattern:
> {code:java}
> Dataset %>%
>   pull(col) %>%
>   unique(){code}
> However when I try the option below it is not working quite how I'd expect. 
> I'm actually not able to get any working (e.g. `arrow_mean`) so maybe I am 
> misunderstanding how these are meant to work. 
> {code:java}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> dir.create("iris")
> iris %>%
>  group_by(Species) %>%
>  write_dataset("iris")
> ds <- open_dataset("iris")
> ds %>%
>  mutate(unique = arrow_unique(Species)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression unique("setosa")
> ds %>%
>  mutate(unique = arrow_unique(Petal.Width)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression {Sepal.Length=Sepal.Length, Sepal.Width=Sepal.Width, 
> Petal.Length=Petal.Length, Petal.Width=Petal.Width, Species="setosa", 
> unique=unique(Petal.Width)}
> call_function("unique", ds, "Species")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("unique", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("mean", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> sessioninfo::session_info()
> #> - Session info 
> ---
> #> setting value 
> #> version R version 4.0.5 (2021-03-31)
> #> os Windows 10 x64 
> #> system x86_64, mingw32 
> #> ui RTerm 
> #> language (EN) 
> #> collate English_Canada.1252 
> #> ctype English_Canada.1252 
> #> tz America/Los_Angeles 
> #> date 2021-05-07 
> #> 
> #> - Packages 
> ---
> #> package * version date lib source 
> #> arrow * 4.0.0 2021-04-27 [1] CRAN (R 4.0.5)
> #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
> #> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.3)
> #> bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.2)
> #> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.2)
> #> cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.5)
> #> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.3)
> #> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.3)
> #> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
> #> dplyr * 1.0.5 2021-03-05 [1] CRAN (R 4.0.5)
> #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.5)
> #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
> #> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.3)
> #> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
> #> generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.3)
> #> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
> #> highr 0.9 2021-04-16 [1] CRAN (R 4.0.4)
> #> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
> #> knitr 1.33 2021-04-24 [1] CRAN (R 4.0.5)
> #> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.4)
> #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.3)
> #> pillar 1.6.0 2021-04-13 [1] CRAN (R 4.0.5)
> #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
> #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
> #> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.0.5)
> #> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3)
> #> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.5)
> #> rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.3)
> #> rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.4)
> #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
> #> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
> #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4

[jira] [Resolved] (ARROW-17071) [C++][CI] arrow-compute-plan-test and arrow-compute-hash-join-node-test fails on test-conda-cpp-valgrind

2022-07-18 Thread Weston Pace (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weston Pace resolved ARROW-17071.
-
Resolution: Fixed

Issue resolved by pull request 13616
[https://github.com/apache/arrow/pull/13616]

> [C++][CI] arrow-compute-plan-test and arrow-compute-hash-join-node-test fails 
> on test-conda-cpp-valgrind
> 
>
> Key: ARROW-17071
> URL: https://issues.apache.org/jira/browse/ARROW-17071
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Raúl Cumplido
>Assignee: Michal Nowakiewicz
>Priority: Critical
>  Labels: Nightly, pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> There has been an issue on the last test-conda-cpp-valgrind nightly tests for 
> both:
> {code:java}
> The following tests FAILED:
>34 - arrow-compute-plan-test (Failed)
>35 - arrow-compute-hash-join-node-test (Failed) {code}
> There seems to be a couple of issues. The job run can be seen here:
> [https://github.com/ursacomputing/crossbow/runs/7332161636]
> Taking a look at the last introduced commits, I think the commit that 
> introduced the issue is the following one: 
> [https://github.com/apache/arrow/commit/96a3af437bfc498b75b832b161df378ad96cae1c]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16802) [Docs] Improve Acero Documentation

2022-07-18 Thread Ian Cook (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568157#comment-17568157
 ] 

Ian Cook commented on ARROW-16802:
--

One easy way to help make Acero docs more visible and accessible is by adding 
an *Acero* link in the *Subprojects* dropdown menu on the Arrow website.

> [Docs] Improve Acero Documentation
> --
>
> Key: ARROW-16802
> URL: https://issues.apache.org/jira/browse/ARROW-16802
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Will Jones
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> From [~amol-] :
> {quote}If we want to start promoting Acero to the world, I think we should 
> work on improving the documentation a bit first. Having a blog post that then 
> redirects people to docs that they find hard to read/apply might actually 
> be counterproductive, as it might create a reputation for being badly documented.
> At the moment the only mention of it is 
> [https://arrow.apache.org/docs/cpp/streaming_execution.html] and it's not 
> very easy to follow (not many explanations, just blocks of code). In 
> comparison if you look at the compute chapter in Python ( 
> [https://arrow.apache.org/docs/dev/python/compute.html] ) it's much more 
> talkative and explains things as it goes.
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16802) [Docs] Improve Acero Documentation

2022-07-18 Thread Ian Cook (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568156#comment-17568156
 ] 

Ian Cook commented on ARROW-16802:
--

One important consideration: In the future, we intend for Substrait to be the 
primary "API language" for Acero. We will discourage direct use of the ExecPlan 
API and encourage developers to use Substrait plans to tell Acero what 
operations to execute. So we should probably not invest too much energy in 
documenting the ExecPlan API.

> [Docs] Improve Acero Documentation
> --
>
> Key: ARROW-16802
> URL: https://issues.apache.org/jira/browse/ARROW-16802
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Will Jones
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> From [~amol-] :
> {quote}If we want to start promoting Acero to the world, I think we should 
> work on improving the documentation a bit first. Having a blog post that then 
> redirects people to docs that they find hard to read/apply might actually 
> be counterproductive, as it might create a reputation for being badly documented.
> At the moment the only mention of it is 
> [https://arrow.apache.org/docs/cpp/streaming_execution.html] and it's not 
> very easy to follow (not many explanations, just blocks of code). In 
> comparison if you look at the compute chapter in Python ( 
> [https://arrow.apache.org/docs/dev/python/compute.html] ) it's much more 
> talkative and explains things as it goes.
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16000) [C++][Dataset] Support Latin-1 encoding

2022-07-18 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568154#comment-17568154
 ] 

David Li commented on ARROW-16000:
--

The comparison to compression is a good one. I agree it seems reasonable to put 
it there too.

> [C++][Dataset] Support Latin-1 encoding
> ---
>
> Key: ARROW-16000
> URL: https://issues.apache.org/jira/browse/ARROW-16000
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Nicola Crane
>Priority: Major
>
> In ARROW-15992 a user is reporting issues with trying to read in files with 
> Latin-1 encoding.  I had a look through the docs for the Dataset API and I 
> don't think this is currently supported.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-15838) [C++] Key column behavior in joins

2022-07-18 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568150#comment-17568150
 ] 

Weston Pace commented on ARROW-15838:
-

It appears that pyarrow is actually doing the coalesce (which is the correct 
thing) while R isn't (hence ARROW-16897).  I agree that it would be nice to 
integrate this into the join itself.  [~zagto] had a PR to do this but it was 
built on top of the newer swiss join which, at the time, was not merged, so it 
was put on hold.  It's probably too late to try and pull that in for 9.0.0 but 
I think this is something we can tackle as part of 10.0.0
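
Until that lands, a user-side workaround is to carry the right-hand key under a second name and coalesce after collecting. This is only a sketch against the example tables from the issue below; key_rhs is just an illustrative column name, not anything arrow produces itself:

{code}
# Workaround sketch: emulate dplyr's coalesced key column for a full join.
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

left_tab  <- Table$create(tibble::tibble(key = c(1, 2), A = c(0, 1)))
right_tab <- Table$create(tibble::tibble(key = c(2, 3), B = c(0, 1)))

left_tab %>%
  full_join(right_tab %>% mutate(key_rhs = key), by = "key") %>%
  collect() %>%
  mutate(key = coalesce(key, key_rhs)) %>%  # fill in keys for right-only rows
  select(-key_rhs)
{code}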

> [C++] Key column behavior in joins
> --
>
> Key: ARROW-15838
> URL: https://issues.apache.org/jira/browse/ARROW-15838
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Jonathan Keane
>Priority: Major
> Fix For: 10.0.0
>
>
> By default, dplyr (and possibly pandas too?) coalesces the key column 
> for full joins to the (non-null) values from both key columns:
> {code}
> > left <- tibble::tibble(
>   key = c(1, 2),
>   A = c(0, 1),  
> )  
> left_tab <- Table$create(left)
> > right <- tibble::tibble(
>   key = c(2, 3),
>   B = c(0, 1),
> )  
> right_tab <- Table$create(right)
> > left %>% full_join(right)
> Joining, by = "key"
> # A tibble: 3 × 3
>     key     A     B
>   <dbl> <dbl> <dbl>
> 1     1     0    NA
> 2     2     1     0
> 3     3    NA     1
> > left_tab %>% full_join(right_tab) %>% collect()
> # A tibble: 3 × 3
>     key     A     B
>   <dbl> <dbl> <dbl>
> 1     2     1     0
> 2     1     0    NA
> 3    NA    NA     1
> {code}
> And for a right join, we would expect the key from the right table to be in the 
> result, but we get the key from the left instead:
> {code}
> > left <- tibble::tibble(
>   key = c(1, 2),
>   A = c(0, 1),  
> )  
> left_tab <- Table$create(left)
> > right <- tibble::tibble(
>   key = c(2, 3),
>   B = c(0, 1),
> )  
> right_tab <- Table$create(right)
> > left %>% right_join(right)
> Joining, by = "key"
> # A tibble: 2 × 3
>     key     A     B
>   <dbl> <dbl> <dbl>
> 1     2     1     0
> 2     3    NA     1
> > left_tab %>% right_join(right_tab) %>% collect()
> # A tibble: 2 × 3
>     key     A     B
>   <dbl> <dbl> <dbl>
> 1     2     1     0
> 2    NA    NA     1
> {code}
> Additionally, we should be able to keep both key columns with an option (cf 
> https://github.com/apache/arrow/blob/9719eae66dcf38c966ae769215d27020a6dd5550/r/R/dplyr-join.R#L32)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-12693) [R] add unique() methods for ArrowTabluar, datasets

2022-07-18 Thread Sam Albers (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568149#comment-17568149
 ] 

Sam Albers commented on ARROW-12693:


Because this isn't a dplyr function, do you think this would automatically pull 
the vector into memory?

> [R] add unique() methods for ArrowTabluar, datasets
> ---
>
> Key: ARROW-12693
> URL: https://issues.apache.org/jira/browse/ARROW-12693
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Sam Albers
>Priority: Minor
>
> I am trying to see if I can leverage `unique` on a Dataset object. Imagining 
> a much bigger dataset, I am trying to get away from this expensive pattern:
> {code:java}
> Dataset %>%
>   pull(col) %>%
>   unique(){code}
> However when I try the option below it is not working quite how I'd expect. 
> I'm actually not able to get any working (e.g. `arrow_mean`) so maybe I am 
> misunderstanding how these are meant to work. 
> {code:java}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> dir.create("iris")
> iris %>%
>  group_by(Species) %>%
>  write_dataset("iris")
> ds <- open_dataset("iris")
> ds %>%
>  mutate(unique = arrow_unique(Species)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression unique("setosa")
> ds %>%
>  mutate(unique = arrow_unique(Petal.Width)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression {Sepal.Length=Sepal.Length, Sepal.Width=Sepal.Width, 
> Petal.Length=Petal.Length, Petal.Width=Petal.Width, Species="setosa", 
> unique=unique(Petal.Width)}
> call_function("unique", ds, "Species")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("unique", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("mean", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> sessioninfo::session_info()
> #> - Session info 
> ---
> #> setting value 
> #> version R version 4.0.5 (2021-03-31)
> #> os Windows 10 x64 
> #> system x86_64, mingw32 
> #> ui RTerm 
> #> language (EN) 
> #> collate English_Canada.1252 
> #> ctype English_Canada.1252 
> #> tz America/Los_Angeles 
> #> date 2021-05-07 
> #> 
> #> - Packages 
> ---
> #> package * version date lib source 
> #> arrow * 4.0.0 2021-04-27 [1] CRAN (R 4.0.5)
> #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
> #> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.3)
> #> bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.2)
> #> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.2)
> #> cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.5)
> #> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.3)
> #> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.3)
> #> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
> #> dplyr * 1.0.5 2021-03-05 [1] CRAN (R 4.0.5)
> #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.5)
> #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
> #> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.3)
> #> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
> #> generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.3)
> #> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
> #> highr 0.9 2021-04-16 [1] CRAN (R 4.0.4)
> #> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
> #> knitr 1.33 2021-04-24 [1] CRAN (R 4.0.5)
> #> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.4)
> #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.3)
> #> pillar 1.6.0 2021-04-13 [1] CRAN (R 4.0.5)
> #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
> #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
> #> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.0.5)
> #> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3)
> #> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.5)
> #> rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.3)
> #> rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.4)
> #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
> #> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
> #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
> #> styler 1.4.1 2021-03-30 [1] CRAN (R 4.0.4)
> #> tibble 3.1.1 2021-04-18 [1] CRAN (R 4.1.0)
> #> tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.0.5)
> #> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.5)
> #> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.5)
> #> withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.4)
> #> xfun 0.22 2021-03-11 [1] CRAN (R 4.0.4)
> #> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
> #> 

[jira] [Commented] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568148#comment-17568148
 ] 

Antoine Pitrou commented on ARROW-17110:


The devtoolset backport won't do anything for the gcc 4.9 requirement on R 
Windows builds, I'm afraid.

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seem to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568147#comment-17568147
 ] 

Weston Pace commented on ARROW-17110:
-

It probably seems good for this ticket to focus on conda-forge but are the 
devtoolset backports a workable solution?  If so, it would be nice to update 
the minimum version as well.

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seem to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17112) [Java] TestArrowReaderWriter.testFileFooterSizeOverflow causes a failure on s390x

2022-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17112:
---
Labels: pull-request-available  (was: )

> [Java] TestArrowReaderWriter.testFileFooterSizeOverflow causes a failure on 
> s390x
> -
>
> Key: ARROW-17112
> URL: https://issues.apache.org/jira/browse/ARROW-17112
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 9.0.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> On big-endian platforms such as s390x, 
> {{TestArrowReaderWriter.testFileFooterSizeOverflow}} causes a failure.
> {code}
> [INFO] Results:
> [INFO] 
> [ERROR] Failures: 
> [ERROR]   TestArrowReaderWriter.testFileFooterSizeOverflow:913 
> expected:<...alid footer length: [2147483647]> but was:<...alid footer 
> length: [-129]>
> [INFO] 
> [ERROR] Tests run: 610, Failures: 1, Errors: 0, Skipped: 4
> [INFO] 
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Arrow Java Root POM 9.0.0-SNAPSHOT .. SUCCESS [  2.182 
> s]
> [INFO] Arrow Format ... SUCCESS [  0.995 
> s]
> [INFO] Arrow Memory ... SUCCESS [  0.761 
> s]
> [INFO] Arrow Memory - Core  SUCCESS [  1.582 
> s]
> [INFO] Arrow Memory - Unsafe .. SUCCESS [  1.600 
> s]
> [INFO] Arrow Memory - Netty ... SUCCESS [  1.966 
> s]
> [INFO] Arrow Vectors .. FAILURE [ 16.779 
> s]
> [INFO] Arrow Compression .. SKIPPED
> [INFO] Arrow Tools  SKIPPED
> [INFO] Arrow JDBC Adapter . SKIPPED
> [INFO] Arrow Plasma Client  SUCCESS [  1.171 
> s]
> [INFO] Arrow Flight ... SUCCESS [  0.741 
> s]
> [INFO] Arrow Flight Core .. SKIPPED
> [INFO] Arrow Flight GRPC .. SKIPPED
> [INFO] Arrow Flight SQL ... SKIPPED
> [INFO] Arrow Flight Integration Tests . SKIPPED
> [INFO] Arrow AVRO Adapter . SKIPPED
> [INFO] Arrow Algorithms ... SKIPPED
> [INFO] Arrow Performance Benchmarks 9.0.0-SNAPSHOT  SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 23.584 s (Wall Clock)
> [INFO] Finished at: 2022-07-18T12:32:21Z
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M3:test (default-test) 
> on project arrow-vector: There are test failures.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17112) [Java] TestArrowReaderWriter.testFileFooterSizeOverflow causes a failure on s390x

2022-07-18 Thread Kazuaki Ishizaki (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated ARROW-17112:
-
Summary: [Java] TestArrowReaderWriter.testFileFooterSizeOverflow causes a 
failure on s390x  (was: [Jafva] 
TestArrowReaderWriter.testFileFooterSizeOverflow causes a failure on s390x)

> [Java] TestArrowReaderWriter.testFileFooterSizeOverflow causes a failure on 
> s390x
> -
>
> Key: ARROW-17112
> URL: https://issues.apache.org/jira/browse/ARROW-17112
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 9.0.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> On big-endian platforms such as s390x, 
> {{TestArrowReaderWriter.testFileFooterSizeOverflow}} causes a failure.
> {code}
> [INFO] Results:
> [INFO] 
> [ERROR] Failures: 
> [ERROR]   TestArrowReaderWriter.testFileFooterSizeOverflow:913 
> expected:<...alid footer length: [2147483647]> but was:<...alid footer 
> length: [-129]>
> [INFO] 
> [ERROR] Tests run: 610, Failures: 1, Errors: 0, Skipped: 4
> [INFO] 
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Arrow Java Root POM 9.0.0-SNAPSHOT .. SUCCESS [  2.182 
> s]
> [INFO] Arrow Format ... SUCCESS [  0.995 
> s]
> [INFO] Arrow Memory ... SUCCESS [  0.761 
> s]
> [INFO] Arrow Memory - Core  SUCCESS [  1.582 
> s]
> [INFO] Arrow Memory - Unsafe .. SUCCESS [  1.600 
> s]
> [INFO] Arrow Memory - Netty ... SUCCESS [  1.966 
> s]
> [INFO] Arrow Vectors .. FAILURE [ 16.779 
> s]
> [INFO] Arrow Compression .. SKIPPED
> [INFO] Arrow Tools  SKIPPED
> [INFO] Arrow JDBC Adapter . SKIPPED
> [INFO] Arrow Plasma Client  SUCCESS [  1.171 
> s]
> [INFO] Arrow Flight ... SUCCESS [  0.741 
> s]
> [INFO] Arrow Flight Core .. SKIPPED
> [INFO] Arrow Flight GRPC .. SKIPPED
> [INFO] Arrow Flight SQL ... SKIPPED
> [INFO] Arrow Flight Integration Tests . SKIPPED
> [INFO] Arrow AVRO Adapter . SKIPPED
> [INFO] Arrow Algorithms ... SKIPPED
> [INFO] Arrow Performance Benchmarks 9.0.0-SNAPSHOT  SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 23.584 s (Wall Clock)
> [INFO] Finished at: 2022-07-18T12:32:21Z
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M3:test (default-test) 
> on project arrow-vector: There are test failures.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-8163) [C++][Dataset] Allow FileSystemDataset's file list to be lazy

2022-07-18 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568144#comment-17568144
 ] 

Weston Pace commented on ARROW-8163:


[~psolodovnikov] I have assigned the issue to you.  You should also now have 
the permission to assign issues to yourself.

> [C++][Dataset] Allow FileSystemDataset's file list to be lazy
> -
>
> Key: ARROW-8163
> URL: https://issues.apache.org/jira/browse/ARROW-8163
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Pavel Solodovnikov
>Priority: Major
>  Labels: dataset
>
> A FileSystemDataset currently requires a full listing of files it contains on 
> construction, so a scan cannot start until all files in the dataset are 
> discovered. Instead it would be ideal if a large dataset could be constructed 
> with a lazy file listing so that scans can start immediately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-8163) [C++][Dataset] Allow FileSystemDataset's file list to be lazy

2022-07-18 Thread Weston Pace (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weston Pace reassigned ARROW-8163:
--

Assignee: Pavel Solodovnikov

> [C++][Dataset] Allow FileSystemDataset's file list to be lazy
> -
>
> Key: ARROW-8163
> URL: https://issues.apache.org/jira/browse/ARROW-8163
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Pavel Solodovnikov
>Priority: Major
>  Labels: dataset
>
> A FileSystemDataset currently requires a full listing of files it contains on 
> construction, so a scan cannot start until all files in the dataset are 
> discovered. Instead it would be ideal if a large dataset could be constructed 
> with a lazy file listing so that scans can start immediately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-8163) [C++][Dataset] Allow FileSystemDataset's file list to be lazy

2022-07-18 Thread Weston Pace (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weston Pace reassigned ARROW-8163:
--

Assignee: (was: Weston Pace)

> [C++][Dataset] Allow FileSystemDataset's file list to be lazy
> -
>
> Key: ARROW-8163
> URL: https://issues.apache.org/jira/browse/ARROW-8163
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Priority: Major
>  Labels: dataset
>
> A FileSystemDataset currently requires a full listing of files it contains on 
> construction, so a scan cannot start until all files in the dataset are 
> discovered. Instead it would be ideal if a large dataset could be constructed 
> with a lazy file listing so that scans can start immediately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-8163) [C++][Dataset] Allow FileSystemDataset's file list to be lazy

2022-07-18 Thread Weston Pace (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weston Pace reassigned ARROW-8163:
--

Assignee: Weston Pace

> [C++][Dataset] Allow FileSystemDataset's file list to be lazy
> -
>
> Key: ARROW-8163
> URL: https://issues.apache.org/jira/browse/ARROW-8163
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Weston Pace
>Priority: Major
>  Labels: dataset
>
> A FileSystemDataset currently requires a full listing of files it contains on 
> construction, so a scan cannot start until all files in the dataset are 
> discovered. Instead it would be ideal if a large dataset could be constructed 
> with a lazy file listing so that scans can start immediately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17112) [Jafva] TestArrowReaderWriter.testFileFooterSizeOverflow causes a failure on s390x

2022-07-18 Thread Kazuaki Ishizaki (Jira)
Kazuaki Ishizaki created ARROW-17112:


 Summary: [Jafva] TestArrowReaderWriter.testFileFooterSizeOverflow 
causes a failure on s390x
 Key: ARROW-17112
 URL: https://issues.apache.org/jira/browse/ARROW-17112
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Affects Versions: 9.0.0
Reporter: Kazuaki Ishizaki


On big-endian platforms such as s390x, 
{{TestArrowReaderWriter.testFileFooterSizeOverflow}} causes a failure.

{code}
[INFO] Results:

[INFO] 

[ERROR] Failures: 

[ERROR]   TestArrowReaderWriter.testFileFooterSizeOverflow:913 
expected:<...alid footer length: [2147483647]> but was:<...alid footer length: 
[-129]>

[INFO] 

[ERROR] Tests run: 610, Failures: 1, Errors: 0, Skipped: 4

[INFO] 

[INFO] 

[INFO] Reactor Summary:

[INFO] 

[INFO] Apache Arrow Java Root POM 9.0.0-SNAPSHOT .. SUCCESS [  2.182 s]

[INFO] Arrow Format ... SUCCESS [  0.995 s]

[INFO] Arrow Memory ... SUCCESS [  0.761 s]

[INFO] Arrow Memory - Core  SUCCESS [  1.582 s]

[INFO] Arrow Memory - Unsafe .. SUCCESS [  1.600 s]

[INFO] Arrow Memory - Netty ... SUCCESS [  1.966 s]

[INFO] Arrow Vectors .. FAILURE [ 16.779 s]

[INFO] Arrow Compression .. SKIPPED

[INFO] Arrow Tools  SKIPPED

[INFO] Arrow JDBC Adapter . SKIPPED

[INFO] Arrow Plasma Client  SUCCESS [  1.171 s]

[INFO] Arrow Flight ... SUCCESS [  0.741 s]

[INFO] Arrow Flight Core .. SKIPPED

[INFO] Arrow Flight GRPC .. SKIPPED

[INFO] Arrow Flight SQL ... SKIPPED

[INFO] Arrow Flight Integration Tests . SKIPPED

[INFO] Arrow AVRO Adapter . SKIPPED

[INFO] Arrow Algorithms ... SKIPPED

[INFO] Arrow Performance Benchmarks 9.0.0-SNAPSHOT  SKIPPED

[INFO] 

[INFO] BUILD FAILURE

[INFO] 

[INFO] Total time: 23.584 s (Wall Clock)

[INFO] Finished at: 2022-07-18T12:32:21Z

[INFO] 

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M3:test (default-test) on 
project arrow-vector: There are test failures.
{code}
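
The expected/actual values above are consistent with a byte-order mix-up when the 4-byte footer length is interpreted: 0x7FFFFFFF written little-endian and read back big-endian is exactly -129. A small R sketch of that arithmetic (illustration only, not the Java test itself):

{code}
# 2147483647 is 0x7FFFFFFF; its little-endian bytes are FF FF FF 7F.
# Read back as a big-endian signed 32-bit integer they give 0xFFFFFF7F,
# i.e. -129 -- the value reported on s390x.
buf <- writeBin(2147483647L, raw(), size = 4L, endian = "little")
readBin(buf, what = "integer", n = 1L, size = 4L, endian = "big")
#> [1] -129
{code}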



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568140#comment-17568140
 ] 

David Li commented on ARROW-17110:
--

Thanks. So let's make this ticket, "Make all conda-forge based CI pipelines 
specify C++17"?

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seems to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-12693) [R] add unique() methods for ArrowTabular, datasets

2022-07-18 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568134#comment-17568134
 ] 

Neal Richardson commented on ARROW-12693:
-

I think we can use the same function as we use for {{distinct()}} now
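
For illustration, the dplyr verb is already wired up (per the comment above), so a unique() method could presumably delegate to the same machinery. A sketch, not the final API:

{code}
library(arrow)
library(dplyr)

ds <- open_dataset("iris")   # the partitioned dataset from the issue reprex

# What works today through the dplyr binding:
ds %>% distinct(Species) %>% collect()

# A unique() method on Dataset/ArrowTabular objects could wrap the same thing,
# e.g. (hypothetical method, not in the package):
# unique.Dataset <- function(x, incomparables = FALSE, ...) {
#   dplyr::distinct(x) %>% dplyr::collect()
# }
{code}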

> [R] add unique() methods for ArrowTabular, datasets
> ---
>
> Key: ARROW-12693
> URL: https://issues.apache.org/jira/browse/ARROW-12693
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Sam Albers
>Priority: Minor
>
> I am trying to see if I can leverage `unique` on a Dataset object. Imagining 
> a much bigger dataset, I am trying to get away from this expensive pattern:
> {code:java}
> Dataset %>%
>   pull(col) %>%
>   unique(){code}
> However, when I try the option below it is not working quite how I'd expect. 
> I'm actually not able to get any of them working (e.g. `arrow_mean`), so maybe 
> I am misunderstanding how these are meant to work. 
> {code:java}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> dir.create("iris")
> iris %>%
>  group_by(Species) %>%
>  write_dataset("iris")
> ds <- open_dataset("iris")
> ds %>%
>  mutate(unique = arrow_unique(Species)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression unique("setosa")
> ds %>%
>  mutate(unique = arrow_unique(Petal.Width)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression {Sepal.Length=Sepal.Length, Sepal.Width=Sepal.Width, 
> Petal.Length=Petal.Length, Petal.Width=Petal.Width, Species="setosa", 
> unique=unique(Petal.Width)}
> call_function("unique", ds, "Species")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("unique", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("mean", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> sessioninfo::session_info()
> #> - Session info 
> ---
> #> setting value 
> #> version R version 4.0.5 (2021-03-31)
> #> os Windows 10 x64 
> #> system x86_64, mingw32 
> #> ui RTerm 
> #> language (EN) 
> #> collate English_Canada.1252 
> #> ctype English_Canada.1252 
> #> tz America/Los_Angeles 
> #> date 2021-05-07 
> #> 
> #> - Packages 
> ---
> #> package * version date lib source 
> #> arrow * 4.0.0 2021-04-27 [1] CRAN (R 4.0.5)
> #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
> #> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.3)
> #> bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.2)
> #> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.2)
> #> cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.5)
> #> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.3)
> #> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.3)
> #> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
> #> dplyr * 1.0.5 2021-03-05 [1] CRAN (R 4.0.5)
> #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.5)
> #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
> #> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.3)
> #> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
> #> generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.3)
> #> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
> #> highr 0.9 2021-04-16 [1] CRAN (R 4.0.4)
> #> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
> #> knitr 1.33 2021-04-24 [1] CRAN (R 4.0.5)
> #> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.4)
> #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.3)
> #> pillar 1.6.0 2021-04-13 [1] CRAN (R 4.0.5)
> #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
> #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
> #> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.0.5)
> #> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3)
> #> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.5)
> #> rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.3)
> #> rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.4)
> #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
> #> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
> #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
> #> styler 1.4.1 2021-03-30 [1] CRAN (R 4.0.4)
> #> tibble 3.1.1 2021-04-18 [1] CRAN (R 4.1.0)
> #> tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.0.5)
> #> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.5)
> #> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.5)
> #> withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.4)
> #> xfun 0.22 2021-03-11 [1] CRAN (R 4.0.4)
> #> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
> #> 
> #> [1] C:/Users/salbers

[jira] [Updated] (ARROW-12693) [R] add unique() methods for ArrowTabular, datasets

2022-07-18 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-12693:

Summary: [R] add unique() methods for ArrowTabular, datasets  (was: [R] 
Usage of compute functions - Use case of unique function)

> [R] add unique() methods for ArrowTabular, datasets
> ---
>
> Key: ARROW-12693
> URL: https://issues.apache.org/jira/browse/ARROW-12693
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Sam Albers
>Priority: Minor
>
> I am trying to see if I can leverage `unique` on a Dataset object. Imagining 
> a much bigger dataset, I am trying to get away from this expensive pattern:
> {code:java}
> Dataset %>%
>   pull(col) %>%
>   unique(){code}
> However, when I try the option below it is not working quite how I'd expect. 
> I'm actually not able to get any of them working (e.g. `arrow_mean`), so maybe 
> I am misunderstanding how these are meant to work. 
> {code:java}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> dir.create("iris")
> iris %>%
>  group_by(Species) %>%
>  write_dataset("iris")
> ds <- open_dataset("iris")
> ds %>%
>  mutate(unique = arrow_unique(Species)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression unique("setosa")
> ds %>%
>  mutate(unique = arrow_unique(Petal.Width)) %>%
>  collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar 
> expression {Sepal.Length=Sepal.Length, Sepal.Width=Sepal.Width, 
> Petal.Length=Petal.Length, Petal.Width=Petal.Width, Species="setosa", 
> unique=unique(Petal.Width)}
> call_function("unique", ds, "Species")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("unique", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("mean", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of 
> "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> sessioninfo::session_info()
> #> - Session info 
> ---
> #> setting value 
> #> version R version 4.0.5 (2021-03-31)
> #> os Windows 10 x64 
> #> system x86_64, mingw32 
> #> ui RTerm 
> #> language (EN) 
> #> collate English_Canada.1252 
> #> ctype English_Canada.1252 
> #> tz America/Los_Angeles 
> #> date 2021-05-07 
> #> 
> #> - Packages 
> ---
> #> package * version date lib source 
> #> arrow * 4.0.0 2021-04-27 [1] CRAN (R 4.0.5)
> #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
> #> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.3)
> #> bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.2)
> #> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.2)
> #> cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.5)
> #> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.3)
> #> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.3)
> #> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
> #> dplyr * 1.0.5 2021-03-05 [1] CRAN (R 4.0.5)
> #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.5)
> #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
> #> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.3)
> #> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
> #> generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.3)
> #> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
> #> highr 0.9 2021-04-16 [1] CRAN (R 4.0.4)
> #> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
> #> knitr 1.33 2021-04-24 [1] CRAN (R 4.0.5)
> #> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.4)
> #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.3)
> #> pillar 1.6.0 2021-04-13 [1] CRAN (R 4.0.5)
> #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
> #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
> #> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.0.5)
> #> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.0.2)
> #> R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.0.2)
> #> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3)
> #> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.5)
> #> rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.3)
> #> rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.4)
> #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
> #> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
> #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
> #> styler 1.4.1 2021-03-30 [1] CRAN (R 4.0.4)
> #> tibble 3.1.1 2021-04-18 [1] CRAN (R 4.1.0)
> #> tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.0.5)
> #> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.5)
> #> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.5)
> #> withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.4)
> #> xfun 0.22 2021-03-11 [1] CRAN (R 4.0.4)
> #> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
> #> 
> #> [1] C:/Users/

[jira] [Comment Edited] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread H. Vetinari (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568126#comment-17568126
 ] 

H. Vetinari edited comment on ARROW-17110 at 7/18/22 5:19 PM:
--

> Slight correction: GCC 4.8 is not an R requirement. It comes from CentOS 7, 
> where it is the default compiler. We do happen to get a number of bug reports 
> from R users on CentOS 7, though.

Centos 7 has the devtoolset backports until GCC 11 (except aarch where it's GCC 
10) though... These are obviously available & in use for the manylinux images, 
and I think they're a very much acceptable requirement for users on such old 
platforms.

> I agree that for now conda-forge can simply build using C\+\+17. Just before 
> [because?] the minimum version for Arrow is C\+\+11 doesn't mean you are 
> forbidden to use a newer one :-D

Well, I would like to avoid breaking your CI if possible. :)
And as I tried to explain, if conda-forge switches to C\+\+17 (especially on 
windows) while you still try to compile with C\+\+11, breakage is 
all-but-guaranteed.

PS. I hate the JIRA text -parser- mangler with a burning passion :O


was (Author: h-vetinari):
> Slight correction: GCC 4.8 is not an R requirement. It comes from CentOS 7, 
> where it is the default compiler. We do happen to get a number of bug reports 
> from R users on CentOS 7, though.

Centos 7 has the devtoolset backports until GCC 11 (except aarch where it's GCC 
10) though... These are obviously available & in use for the manylinux images, 
and I think they're a very much acceptable requirement for users on such old 
platforms.

> I agree that for now conda-forge can simply build using C++17. Just before 
> [because?] the minimum version for Arrow is C++11 doesn't mean you are 
> forbidden to use a newer one :-D

Well, I would like to avoid breaking your CI if possible. :)
And as I tried to explain, if conda-forge switches to C++17 (especially on 
windows) while you still try to compile with C++11, breakage is 
all-but-guaranteed.

PS. I hate the JIRA text -parser- mangler with a burning passion :O

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seems to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread H. Vetinari (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568126#comment-17568126
 ] 

H. Vetinari edited comment on ARROW-17110 at 7/18/22 5:17 PM:
--

> Slight correction: GCC 4.8 is not an R requirement. It comes from CentOS 7, 
> where it is the default compiler. We do happen to get a number of bug reports 
> from R users on CentOS 7, though.

Centos 7 has the devtoolset backports until GCC 11 (except aarch where it's GCC 
10) though... These are obviously available & in use for the manylinux images, 
and I think they're a very much acceptable requirement for users on such old 
platforms.

> I agree that for now conda-forge can simply build using C++17. Just before 
> [because?] the minimum version for Arrow is C++11 doesn't mean you are 
> forbidden to use a newer one :-D

Well, I would like to avoid breaking your CI if possible. :)
And as I tried to explain, if conda-forge switches to C++17 (especially on 
windows) while you still try to compile with C++11, breakage is 
all-but-guaranteed.

PS. I hate the JIRA text -parser- mangler with a burning passion :O


was (Author: h-vetinari):
> Slight correction: GCC 4.8 is not an R requirement. It comes from CentOS 7, 
> where it is the default compiler. We do happen to get a number of bug reports 
> from R users on CentOS 7, though.

Centos 7 has the devtoolset backports until GCC 11 (except aarch where it's GCC 
10) though... These are obviously available & in use for the manylinux images, 
and I think they're a very much acceptable requirement for users on such old 
platforms.

> I agree that for now conda-forge can simply build using C\+\+17. Just before 
> [because?] the minimum version for Arrow is C\+\+11 doesn't mean you are 
> forbidden to use a newer one :-D

Well, I would like to avoid breaking your CI if possible. :)
And as I tried to explain, if conda-forge switches to C\+\+17 (especially on 
windows) while you still try to compile with C\+\+11, breakage is 
all-but-guaranteed.

PS. I hate the JIRA parser with a burning passion :O

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seems to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread H. Vetinari (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568126#comment-17568126
 ] 

H. Vetinari edited comment on ARROW-17110 at 7/18/22 5:16 PM:
--

> Slight correction: GCC 4.8 is not an R requirement. It comes from CentOS 7, 
> where it is the default compiler. We do happen to get a number of bug reports 
> from R users on CentOS 7, though.

Centos 7 has the devtoolset backports until GCC 11 (except aarch where it's GCC 
10) though... These are obviously available & in use for the manylinux images, 
and I think they're a very much acceptable requirement for users on such old 
platforms.

> I agree that for now conda-forge can simply build using C\+\+17. Just before 
> [because?] the minimum version for Arrow is C\+\+11 doesn't mean you are 
> forbidden to use a newer one :-D

Well, I would like to avoid breaking your CI if possible. :)
And as I tried to explain, if conda-forge switches to C\+\+17 (especially on 
windows) while you still try to compile with C\+\+11, breakage is 
all-but-guaranteed.

PS. I hate the JIRA parser with a burning passion :O


was (Author: h-vetinari):
> Slight correction: GCC 4.8 is not an R requirement. It comes from CentOS 7, 
> where it is the default compiler. We do happen to get a number of bug reports 
> from R users on CentOS 7, though.

Centos 7 has the devtoolset backports until GCC 11 (except aarch where it's GCC 
10) though... These are obviously available & in use for the manylinux images, 
and I think they're a very much acceptable requirement for users on such old 
platforms.

> I agree that for now conda-forge can simply build using C++17. Just before 
> [because?] the minimum version for Arrow is C++11 doesn't mean you are 
> forbidden to use a newer one :-D

Well, I would like to avoid breaking your CI if possible. :)
And as I tried to explain, if conda-forge switches to C++17 (especially on 
windows) while you still try to compile with C++11, breakage is 
all-but-guaranteed.

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seems to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16000) [C++][Dataset] Support Latin-1 encoding

2022-07-18 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568127#comment-17568127
 ] 

Weston Pace commented on ARROW-16000:
-

I agree with Antoine's suggestion of {{CsvFragmentReadOptions}}.  This is 
essentially the same problem we have with compression too right?  We can 
auto-detect compression based on the file extension but if the compression 
doesn't match the file extension (or the file extension doesn't indicate 
compression) we have no way of wrapping the stream with a decompression 
transform.  It sounds like this solution might solve both problems.
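
For anyone hitting this today, a minimal R workaround sketch (hypothetical file names; a pre-processing step, not the proposed {{CsvFragmentReadOptions}} API):

{code}
# Re-encode a Latin-1 CSV to UTF-8 before handing it to open_dataset().
library(arrow)

raw_lines <- readLines("data_latin1.csv", encoding = "latin1")  # hypothetical input file
writeLines(enc2utf8(raw_lines), "data_utf8.csv", useBytes = TRUE)

ds <- open_dataset("data_utf8.csv", format = "csv")
{code}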

> [C++][Dataset] Support Latin-1 encoding
> ---
>
> Key: ARROW-16000
> URL: https://issues.apache.org/jira/browse/ARROW-16000
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Nicola Crane
>Priority: Major
>
> In ARROW-15992 a user is reporting issues with trying to read in files with 
> Latin-1 encoding.  I had a look through the docs for the Dataset API and I 
> don't think this is currently supported.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread H. Vetinari (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568126#comment-17568126
 ] 

H. Vetinari commented on ARROW-17110:
-

> Slight correction: GCC 4.8 is not an R requirement. It comes from CentOS 7, 
> where it is the default compiler. We do happen to get a number of bug reports 
> from R users on CentOS 7, though.

Centos 7 has the devtoolset backports until GCC 11 (except aarch where it's GCC 
10) though... These are obviously available & in use for the manylinux images, 
and I think they're a very much acceptable requirement for users on such old 
platforms.

> I agree that for now conda-forge can simply build using C++17. Just before 
> [because?] the minimum version for Arrow is C++11 doesn't mean you are 
> forbidden to use a newer one :-D

Well, I would like to avoid breaking your CI if possible. :)
And as I tried to explain, if conda-forge switches to C++17 (especially on 
windows) while you still try to compile with C++11, breakage is 
all-but-guaranteed.

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seems to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568100#comment-17568100
 ] 

Neal Richardson edited comment on ARROW-17110 at 7/18/22 4:41 PM:
--

Slight correction: GCC 4.8 is not an R requirement. It comes from CentOS 7, 
where it is the default compiler. We do happen to get a number of bug reports 
from R users on CentOS 7, though.

R 3.6 on Windows used an odd gcc 4.9 mingw compiler, and that's the main source 
of "R requires an old compiler". But we already disable many features on R < 
4.0 on Windows, and conditionally disabling more is not a problem. (GCS 
filesystem support, which uses abseil, is one of those already.) We could drop 
support for R 3.6 now, but since we can just disable features on the build, we 
haven't been forced to do so yet.

CRAN checks are now all running gcc 8 or newer: 
https://cran.r-project.org/web/checks/check_flavors.html

We have CI that builds arrow on C\+\+17 (and maybe also 14?). I think Homebrew 
also bumped up building arrow with C\+\+17 to match abseil (or maybe that's 
still in the copy of the formula we test in apache/arrow). We also have an open 
PR to add Azure Blob Storage, which will require C\+\+14: 
https://github.com/apache/arrow/pull/12914/files#r899724290. So maybe the 
solution for the abseil issue is to require the newer C\+\+ standard if using 
abseil built with it?




was (Author: npr):
Slight correction: GCC 4.8 is not an R requirement. It comes from CentOS 7, 
where it is the default compiler. We do happen to get a number of bug reports 
from R users on CentOS 7, though.

R 3.6 on Windows used an odd gcc 4.9 mingw compiler, and that's the main source 
of "R requires an old compiler". But we already disable many features on R < 
4.0 on Windows, and conditionally disabling more is not a problem. (GCS 
filesystem support, which uses abseil, is one of those already.) We could drop 
support for R 3.6 now, but since we can just disable features on the build, we 
haven't been forced to do so yet.

CRAN checks are now all running gcc 8 or newer: 
https://cran.r-project.org/web/checks/check_flavors.html

We have CI that builds arrow on C++17 (and maybe also 14?). I think Homebrew 
also bumped up building arrow with C++17 to match abseil (or maybe that's still 
in the copy of the formula we test in apache/arrow). We also have an open PR to 
add Azure Blob Storage, which will require C++14: 
https://github.com/apache/arrow/pull/12914/files#r899724290. So maybe the 
solution for the abseil issue is to require the newer C++ standard if using 
abseil built with it?



> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seems to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568111#comment-17568111
 ] 

Antoine Pitrou edited comment on ARROW-17110 at 7/18/22 4:39 PM:
-

See also ARROW-12816.

I agree that for now conda-forge can simply build using C\+\+17. Just before 
the minimum version for Arrow is C\+\+11 doesn't mean you are forbidden to use 
a newer one :-D


was (Author: pitrou):
See also ARROW-12816.

I agree that for now conda-forge can simply build using C++17. Just before the 
minimum version for Arrow is C++11 doesn't mean you are forbidden to use a 
newer one :-D

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seems to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568111#comment-17568111
 ] 

Antoine Pitrou commented on ARROW-17110:


See also ARROW-12816.

I agree that for now conda-forge can simply build using C++17. Just before the 
minimum version for Arrow is C++11 doesn't mean you are forbidden to use a 
newer one :-D

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seems to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-15538) [C++] Create mapping from Substrait "standard functions" to Arrow equivalents

2022-07-18 Thread Weston Pace (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weston Pace reassigned ARROW-15538:
---

Assignee: Weston Pace

> [C++] Create mapping from Substrait "standard functions" to Arrow equivalents
> -
>
> Key: ARROW-15538
> URL: https://issues.apache.org/jira/browse/ARROW-15538
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Weston Pace
>Assignee: Weston Pace
>Priority: Major
>  Labels: substrait
>
> Substrait has a number of "stock" functions defined here: 
> https://github.com/substrait-io/substrait/tree/main/extensions
> This is basically a set of standard extensions.
> We should map these functions to the equivalent Arrow functions.
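
For illustration only (not the C++ registry this ticket is about), the shape of such a mapping is just a name-to-name table; the entries below are examples, not an exhaustive or authoritative list:

{code}
# Toy sketch in R: Substrait function name -> Arrow compute function name.
substrait_to_arrow <- c(
  "add"      = "add_checked",
  "subtract" = "subtract_checked",
  "multiply" = "multiply_checked"
)
substrait_to_arrow[["add"]]
#> [1] "add_checked"
{code}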



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568107#comment-17568107
 ] 

David Li commented on ARROW-17110:
--

Thanks Neal - yeah I realize I read that backwards now. If we just need to 
build with 17 on conda-forge/when using newer Abseil that shouldn't be a 
problem (we'd have to update various pipelines/scripts), we just can't raise 
our minimum supported version.

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seems to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568100#comment-17568100
 ] 

Neal Richardson commented on ARROW-17110:
-

Slight correction: GCC 4.8 is not an R requirement. It comes from CentOS 7, 
where it is the default compiler. We do happen to get a number of bug reports 
from R users on CentOS 7, though.

R 3.6 on Windows used an odd gcc 4.9 mingw compiler, and that's the main source 
of "R requires an old compiler". But we already disable many features on R < 
4.0 on Windows, and conditionally disabling more is not a problem. (GCS 
filesystem support, which uses abseil, is one of those already.) We could drop 
support for R 3.6 now, but since we can just disable features on the build, we 
haven't been forced to do so yet.

CRAN checks are now all running gcc 8 or newer: 
https://cran.r-project.org/web/checks/check_flavors.html

We have CI that builds arrow on C++17 (and maybe also 14?). I think Homebrew 
also bumped up building arrow with C++17 to match abseil (or maybe that's still 
in the copy of the formula we test in apache/arrow). We also have an open PR to 
add Azure Blob Storage, which will require C++14: 
https://github.com/apache/arrow/pull/12914/files#r899724290. So maybe the 
solution for the abseil issue is to require the newer C++ standard if using 
abseil built with it?



> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seems to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16802) [Docs] Improve Acero Documentation

2022-07-18 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568089#comment-17568089
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-16802:
--

[~willjones127] thanks for raising this issue. I will work on improving the 
documentation. 

> [Docs] Improve Acero Documentation
> --
>
> Key: ARROW-16802
> URL: https://issues.apache.org/jira/browse/ARROW-16802
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Will Jones
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> From [~amol-] :
> {quote}If we want to start promoting Acero to the world, I think we should 
> work on improving the documentation a bit first. Having a blog post that then 
> redirects people to docs that they find hard to read/apply might actually 
> be counterproductive, as it might create a reputation for being badly documented.
> At the moment the only mention of it is 
> [https://arrow.apache.org/docs/cpp/streaming_execution.html] and it's not 
> very easy to follow (not many explanations, just blocks of code). In 
> comparison, if you look at the compute chapter in Python ( 
> [https://arrow.apache.org/docs/dev/python/compute.html] ) it's much more 
> talkative and explains things as it goes.
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-16802) [Docs] Improve Acero Documentation

2022-07-18 Thread Vibhatha Lakmal Abeykoon (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vibhatha Lakmal Abeykoon reassigned ARROW-16802:


Assignee: Vibhatha Lakmal Abeykoon

> [Docs] Improve Acero Documentation
> --
>
> Key: ARROW-16802
> URL: https://issues.apache.org/jira/browse/ARROW-16802
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Will Jones
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> From [~amol-] :
> {quote}If we want to start promoting Acero to the world, I think we should 
> work on improving the documentation a bit first. Having a blog post that then 
> redirects people to docs that they find hard to read/apply might actually 
> be counterproductive, as it might create a reputation for being badly documented.
> At the moment the only mention of it is 
> [https://arrow.apache.org/docs/cpp/streaming_execution.html] and it's not 
> very easy to follow (not many explanations, just blocks of code). In 
> comparison, if you look at the compute chapter in Python ( 
> [https://arrow.apache.org/docs/dev/python/compute.html] ) it's much more 
> talkative and explains things as it goes.
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread H. Vetinari (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568066#comment-17568066
 ] 

H. Vetinari edited comment on ARROW-17110 at 7/18/22 3:38 PM:
--

> We still have to support down to GCC 4.8 (for some older R versions at least) 

Where is the lower bound of R support defined (and how/why)? I tried looking 
but couldn't find anything. I think it'd be a good idea to define some sort of 
support policy (note, we did this in scipy over the last year or so, allowing 
us to move from 4.8 to 6.x and [now|https://github.com/scipy/scipy/pull/16589] 
to 8.x).

> And even then I think C++14 will be the highest attainable version

Barring some progress on the lower bounds for compilers, you'll then be limited 
from upgrading abseil beyond 20220623.0 (and that's already working more or 
less by accident since 20211102 on unix is compiled with C\+\+17 in c-f. Unless 
we introduce multiple builds per CXX version in conda-forge, this problem will 
only get worse, because once vc142 becomes the minimum toolchain in the not too 
distant future, c-f can move to C\+\+17 also on windows globally).


was (Author: h-vetinari):
> We still have to support down to GCC 4.8 (for some older R versions at least) 

Where is the lower bound of R support defined (and how/why)? I tried looking 
but couldn't find anything. I think it'd be a good idea to define some sort of 
support policy (note, we did this in scipy over the last year or so, allowing 
us to move from 4.8 to 6.x and [now|https://github.com/scipy/scipy/pull/16589] 
to 8.x).

> And even then I think C++14 will be the highest attainable version

Barring some progress on the lower bounds for compilers, you'll then be limited 
from upgrading abseil beyond 20220623.0 (and that's already working more or 
less by accident since 20211102 on unix is compiled with C++17 in c-f. Unless 
we introduce multiple builds per CXX version in conda-forge, this problem will 
only get worse, because once vc142 becomes the minimum toolchain in the not too 
distant future, c-f can move to C++17 also on windows globally).

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seems to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16802) [Docs] Improve Acero Documentation

2022-07-18 Thread Will Jones (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568067#comment-17568067
 ] 

Will Jones commented on ARROW-16802:


Hi Kexin,

Most communication about Acero usage happens on the general Arrow user mailing 
list. You can sign up at: [https://arrow.apache.org/community/] We don't have 
any Acero-specific channels, but maybe that will one day change.

> [Docs] Improve Acero Documentation
> --
>
> Key: ARROW-16802
> URL: https://issues.apache.org/jira/browse/ARROW-16802
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Will Jones
>Priority: Major
>
> From [~amol-] :
> {quote}If we want to start promoting Acero to the world, I think we should 
> work on improving the documentation a bit first. Having a blog post that then 
> redirects people to docs that they find hard to read/apply might actually 
> be counterproductive, as it might create a reputation for being badly documented.
> At the moment the only mention of it is 
> [https://arrow.apache.org/docs/cpp/streaming_execution.html] and it's not 
> very easy to follow (not many explanations, just blocks of code). In 
> comparison, if you look at the compute chapter in Python ( 
> [https://arrow.apache.org/docs/dev/python/compute.html] ) it's much more 
> talkative and explains things as it goes.
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread H. Vetinari (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568066#comment-17568066
 ] 

H. Vetinari commented on ARROW-17110:
-

> We still have to support down to GCC 4.8 (for some older R versions at least) 

Where is the lower bound of R support defined (and how/why)? I tried looking 
but couldn't find anything. I think it'd be a good idea to define some sort of 
support policy (note, we did this in scipy over the last year or so, allowing 
us to move from 4.8 to 6.x and [now|https://github.com/scipy/scipy/pull/16589] 
to 8.x).

> And even then I think C++14 will be the highest attainable version

Barring some progress on the lower bounds for compilers, you'll then be limited 
from upgrading abseil beyond 20220623.0 (and that's already working more or 
less by accident since 20211102 on unix is compiled with C++17 in c-f. Unless 
we introduce multiple builds per CXX version in conda-forge, this problem will 
only get worse, because once vc142 becomes the minimum toolchain in the not too 
distant future, c-f can move to C++17 also on windows globally).

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seems to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17111) [CI][Packaging] Packaging almalinux 9 and centos 9 fail installing arrow due to missing libre2

2022-07-18 Thread Jira


[ 
https://issues.apache.org/jira/browse/ARROW-17111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568064#comment-17568064
 ] 

Raúl Cumplido commented on ARROW-17111:
---

cc [~kou]

> [CI][Packaging] Packaging almalinux 9 and centos 9 fail installing arrow due 
> to missing libre2
> --
>
> Key: ARROW-17111
> URL: https://issues.apache.org/jira/browse/ARROW-17111
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Raúl Cumplido
>Priority: Critical
>  Labels: Nightly
> Fix For: 9.0.0
>
>
> The following nightly packaging jobs have been failing:
> [almalinux-9-amd64|https://github.com/ursacomputing/crossbow/runs/7385779728?check_suite_focus=true]
> [almalinux-9-arm64|https://app.travis-ci.com/github/ursacomputing/crossbow/builds/25327#L5812]
> [centos-9-stream-amd64|https://github.com/ursacomputing/crossbow/runs/7385764133?check_suite_focus=true]
> [centos-9-stream-arm64|https://app.travis-ci.com/github/ursacomputing/crossbow/builds/253299972#L6029]
> It errors when installing with dnf. It seems due to not finding libre2:
> {code:java}
>  + dnf install -y --enablerepo=crb --enablerepo=epel 
> arrow-devel-9.0.0.dev405-1.el9
> Apache Arrow for AlmaLinux 9 - aarch64          2.6 MB/s |  25 kB     00:00   
>  
> Extra Packages for Enterprise Linux 9 - aarch64  15 MB/s | 8.3 MB     00:00   
>  
> Error: 
>  Problem: package arrow-devel-9.0.0.dev405-1.el9.aarch64 requires 
> libarrow.so.900()(64bit), but none of the providers can be installed
>   - package arrow-devel-9.0.0.dev405-1.el9.aarch64 requires arrow9-libs = 
> 9.0.0.dev405-1.el9, but none of the providers can be installed
>   - conflicting requests
>   - nothing provides libre2.so.0a()(64bit) needed by 
> arrow9-libs-9.0.0.dev405-1.el9.aarch64
> (try to add '--skip-broken' to skip uninstallable packages or '--nobest' to 
> use not only best candidate packages)
> rake aborted!
> Command failed with status (1): [docker run --log-driver none --rm 
> --securi...]{code}
> We have lately upgraded some vendored versions like RE2: 
> [https://github.com/apache/arrow/pull/13570]
> or rapidjson: [https://github.com/apache/arrow/pull/13608]
> The jobs started failing after the rapidjson one was merged, in case it is 
> related:



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-11841) [R][C++] Allow cancelling long-running commands

2022-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11841:
---
Labels: pull-request-available  (was: )

> [R][C++] Allow cancelling long-running commands
> ---
>
> Key: ARROW-11841
> URL: https://issues.apache.org/jira/browse/ARROW-11841
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, R
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When calling a long-running task (for example reading a CSV file) from the R 
> prompt, users may want to interrupt with Ctrl-C.
> Allowing this will require integrating R's user interruption facility with 
> the cancellation API that's going to be exposed in C++ (see  ARROW-8732).
> Below some information I've gathered on the topic:
> There is some hairy discussion of how to interrupt C++ code from R at 
> https://stackoverflow.com/questions/40563522/r-how-to-write-interruptible-c-function-and-recover-partial-results
>  and https://stat.ethz.ch/pipermail/r-devel/2011-April/060714.html .
> It seems it may involve polling cpp11::check_user_interrupt() and catching 
> any cpp11::unwind_exception that may signal an interruption. A complication 
> is that apparently R APIs should only be called from the main thread. There's 
> also a small library which claims to make writing all this easier: 
> https://github.com/tnagler/RcppThread/blob/master/inst/include/RcppThread/RMonitor.hpp
> But since user interruptions will only be noticed by the R main thread, the 
> solution may be to launch heavy computations (e.g. CSV reading) in a separate 
> thread and have the main R thread periodically poll for interrupts while 
> waiting for the separate thread. This is what this dedicated thread class 
> does in its join method: 
> https://github.com/tnagler/RcppThread/blob/master/inst/include/RcppThread/Thread.hpp#L79
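
A minimal sketch of the "heavy work on a separate thread, main thread polls for interrupts" pattern described above, assuming a placeholder check_interrupt() in place of cpp11::check_user_interrupt() and with the ARROW-8732 cancellation hook only indicated in comments; this illustrates the threading pattern, not the actual Arrow/R binding.

{code:cpp}
// Sketch only: check_interrupt() stands in for cpp11::check_user_interrupt(),
// which in the real R binding may only be called from the main thread.
#include <atomic>
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

std::atomic<bool> interrupted{false};  // nothing sets this here; in R, Ctrl-C would

bool check_interrupt() { return interrupted.load(); }

int heavy_computation() {
  // Stand-in for e.g. CSV reading; a real task would also watch a
  // cancellation token (ARROW-8732) so it can stop early.
  std::this_thread::sleep_for(std::chrono::seconds(2));
  return 42;
}

int main() {
  // Run the heavy work off the main thread.
  std::future<int> result = std::async(std::launch::async, heavy_computation);

  // The main thread waits, periodically polling for user interruption.
  while (result.wait_for(std::chrono::milliseconds(100)) !=
         std::future_status::ready) {
    if (check_interrupt()) {
      std::cout << "interrupted: would request cancellation here\n";
      // The real integration would call something like StopSource::RequestStop()
      // at this point so that the worker notices and exits early.
      break;
    }
  }
  if (result.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
    std::cout << "result: " << result.get() << "\n";
  }
  // Note: if we broke out early, the future's destructor still waits for the
  // worker to finish, which is why the worker needs a real cancellation token.
  return 0;
}
{code}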



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17111) [CI][Packaging] Packaging almalinux 9 and centos 9 fail installing arrow due to missing libre2

2022-07-18 Thread Jira
Raúl Cumplido created ARROW-17111:
-

 Summary: [CI][Packaging] Packaging almalinux 9 and centos 9 fail 
installing arrow due to missing libre2
 Key: ARROW-17111
 URL: https://issues.apache.org/jira/browse/ARROW-17111
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging
Reporter: Raúl Cumplido
 Fix For: 9.0.0


The following nightly packaging jobs have been failing:

[almalinux-9-amd64|https://github.com/ursacomputing/crossbow/runs/7385779728?check_suite_focus=true]
[almalinux-9-arm64|https://app.travis-ci.com/github/ursacomputing/crossbow/builds/25327#L5812]
[centos-9-stream-amd64|https://github.com/ursacomputing/crossbow/runs/7385764133?check_suite_focus=true]
[centos-9-stream-arm64|https://app.travis-ci.com/github/ursacomputing/crossbow/builds/253299972#L6029]

It errors when installing with dnf. This seems to be due to libre2 not being found:
{code:java}
 + dnf install -y --enablerepo=crb --enablerepo=epel 
arrow-devel-9.0.0.dev405-1.el9
Apache Arrow for AlmaLinux 9 - aarch64          2.6 MB/s |  25 kB     00:00    
Extra Packages for Enterprise Linux 9 - aarch64  15 MB/s | 8.3 MB     00:00    
Error: 
 Problem: package arrow-devel-9.0.0.dev405-1.el9.aarch64 requires 
libarrow.so.900()(64bit), but none of the providers can be installed
  - package arrow-devel-9.0.0.dev405-1.el9.aarch64 requires arrow9-libs = 
9.0.0.dev405-1.el9, but none of the providers can be installed
  - conflicting requests
  - nothing provides libre2.so.0a()(64bit) needed by 
arrow9-libs-9.0.0.dev405-1.el9.aarch64
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use 
not only best candidate packages)
rake aborted!
Command failed with status (1): [docker run --log-driver none --rm 
--securi...]{code}
We have recently upgraded some vendored dependency versions, such as RE2: 
[https://github.com/apache/arrow/pull/13570]
and rapidjson: [https://github.com/apache/arrow/pull/13608]

The jobs started failing after the rapidjson one was merged, in case it is related:



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ARROW-17078) [C++] Cleaning up C++ Examples

2022-07-18 Thread David Li (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Li resolved ARROW-17078.
--
Fix Version/s: 9.0.0
   Resolution: Fixed

Issue resolved by pull request 13598
[https://github.com/apache/arrow/pull/13598]

> [C++] Cleaning up C++ Examples
> --
>
> Key: ARROW-17078
> URL: https://issues.apache.org/jira/browse/ARROW-17078
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Vibhatha Lakmal Abeykoon
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Cleaning up the usage of a custom macro usage in examples.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17107) [Java] JSONFileWriter throws IOOBE writing an empty list

2022-07-18 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568057#comment-17568057
 ] 

David Li commented on ARROW-17107:
--

Yeah, (dense) union gets to be special as usual.

That said, your fix is basically there and we should just add tests and make 
sure large list/binary/utf8 get covered.

> [Java] JSONFileWriter throws IOOBE writing an empty list
> 
>
> Key: ARROW-17107
> URL: https://issues.apache.org/jira/browse/ARROW-17107
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 8.0.0
>Reporter: James Henderson
>Priority: Minor
>
> Hey folks,
> I'm trying to write an empty ListVector out through the `JsonFileWriter`, and 
> am getting an IOOBE. Stack trace is as follows:
>  
> ```
> java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 
> 0))
>  at org.apache.arrow.memory.ArrowBuf.checkIndexD (ArrowBuf.java:318)
>     org.apache.arrow.memory.ArrowBuf.chk (ArrowBuf.java:305)
>     org.apache.arrow.memory.ArrowBuf.getInt (ArrowBuf.java:424)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeValueToGenerator 
> (JsonFileWriter.java:270)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:237)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeBatch 
> (JsonFileWriter.java:200)
>     org.apache.arrow.vector.ipc.JsonFileWriter.write (JsonFileWriter.java:190)
> ```
> It's trying to write the offset buffer of the list, which is empty. L224 of 
> JFW.java sets `bufferValueCount` to 1 (because we're not a DUV), so we enter 
> the `for` loop. We don't hit the `valueCount=0` condition in L230 (because 
> we're not a varbinary or a varchar vector). So we fall into the `else`, which 
> tries to write the 0th element in the offset vector, and IOOBE.
> Could we include 'list' in either the L224 or the L230 checks?
> Admittedly, I'm not aware of the history of this section, but it seems that, 
> by the time we hit L230 (i.e. excluding DUV), any empty vector should yield a 
> single 0?
> Let me know if there's any more info I can provide!
> Cheers,
> James



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17107) [Java] JSONFileWriter throws IOOBE writing an empty list

2022-07-18 Thread James Henderson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568056#comment-17568056
 ] 

James Henderson commented on ARROW-17107:
-

> All vectors that use offsets must have at least one offset (or more 
> specifically: the number of offsets is always the number of values + 1, see 
> [the 
> spec|https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout])

mm, although this isn't the case for dense unions - I guess their 'offsets' are 
conceptually different from the offsets in the variable-width vectors?

 

> possibly empty vectors may not have allocated any memory as a 
> micro-optimization?

that was my assumption too, yeah :)

> [Java] JSONFileWriter throws IOOBE writing an empty list
> 
>
> Key: ARROW-17107
> URL: https://issues.apache.org/jira/browse/ARROW-17107
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 8.0.0
>Reporter: James Henderson
>Priority: Minor
>
> Hey folks,
> I'm trying to write an empty ListVector out through the `JsonFileWriter`, and 
> am getting an IOOBE. Stack trace is as follows:
>  
> ```
> java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 
> 0))
>  at org.apache.arrow.memory.ArrowBuf.checkIndexD (ArrowBuf.java:318)
>     org.apache.arrow.memory.ArrowBuf.chk (ArrowBuf.java:305)
>     org.apache.arrow.memory.ArrowBuf.getInt (ArrowBuf.java:424)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeValueToGenerator 
> (JsonFileWriter.java:270)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:237)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeBatch 
> (JsonFileWriter.java:200)
>     org.apache.arrow.vector.ipc.JsonFileWriter.write (JsonFileWriter.java:190)
> ```
> It's trying to write the offset buffer of the list, which is empty. L224 of 
> JFW.java sets `bufferValueCount` to 1 (because we're not a DUV), so we enter 
> the `for` loop. We don't hit the `valueCount=0` condition in L230 (because 
> we're not a varbinary or a varchar vector). So we fall into the `else`, which 
> tries to write the 0th element in the offset vector, and IOOBE.
> Could we include 'list' in either the L224 or the L230 checks?
> Admittedly, I'm not aware of the history of this section, but it seems that, 
> by the time we hit L230 (i.e. excluding DUV), any empty vector should yield a 
> single 0?
> Let me know if there's any more info I can provide!
> Cheers,
> James



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17110) [C++] Move away from C++11

2022-07-18 Thread David Li (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Li updated ARROW-17110:
-
Summary: [C++] Move away from C++11  (was: Move away from C++11)

> [C++] Move away from C++11
> --
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seem to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17110) Move away from C++11

2022-07-18 Thread David Li (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Li updated ARROW-17110:
-
Component/s: C++

> Move away from C++11
> 
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seem to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17110) Move away from C++11

2022-07-18 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568055#comment-17568055
 ] 

David Li commented on ARROW-17110:
--

We still have to support down to GCC 4.8 (for some older R versions at least) 
so this will be a while coming. And even then I think C++14 will be the highest 
attainable version

> Move away from C++11
> 
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seem to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (ARROW-14895) [C++] Vcpkg install error for abseil on windows when building Arrow C++

2022-07-18 Thread H. Vetinari (Jira)


[ https://issues.apache.org/jira/browse/ARROW-14895 ]


H. Vetinari deleted comment on ARROW-14895:
-

was (Author: h-vetinari):
This is probably related to using different C++ standard versions while 
compiling abseil & arrow. Abseil is a library whose ABI depends on the 
standard, and google only supports the case where everything is compiled 
against the same C++ version

> [C++] Vcpkg install error for abseil on windows when building Arrow C++
> ---
>
> Key: ARROW-14895
> URL: https://issues.apache.org/jira/browse/ARROW-14895
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: Windows 10
>Reporter: Akhil J Nair
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 0h
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I attempted to build arrow c++ by following the 
> [docs|https://arrow.apache.org/docs/developers/cpp/building.html] . However 
> in the vcpkg install command
> {code:java}
> vcpkg install \
>   --x-manifest-root cpp \
>   --feature-flags=versions \
>   --clean-after-build{code}
> I get this error - 
> {code:java}
> Error: abseil:x86-windows@20210324.2 is only supported on '(x64 | arm64) & 
> (linux | osx | windows)'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17110) Move away from C++11

2022-07-18 Thread H. Vetinari (Jira)
H. Vetinari created ARROW-17110:
---

 Summary: Move away from C++11
 Key: ARROW-17110
 URL: https://issues.apache.org/jira/browse/ARROW-17110
 Project: Apache Arrow
  Issue Type: Task
Reporter: H. Vetinari


The upcoming abseil release has dropped support for C++11, so {_}eventually{_}, 
arrow will have to follow. More details 
[here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].

Relatedly, when I 
[tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
abseil to a newer C++ version on windows, things apparently broke in arrow CI. 
This is because the ABI of abseil is sensitive to the C++ standard that's used 
to compile, and google only supports a homogeneous version to compile all 
artefacts in a stack. This creates some friction with conda-forge (where the 
compilers are generally much newer than what arrow might be willing to impose). 
For now, things seem to have worked out with arrow 
[specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
 C++11 while conda-forge moved to C++17 - at least on unix, but windows was not 
so lucky.

Perhaps people would therefore also be interested in collaborating (or at least 
commenting on) this 
[issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
should permit more flexibility by being able to opt into given standard 
versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17110) Move away from C++11

2022-07-18 Thread H. Vetinari (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

H. Vetinari updated ARROW-17110:

Description: 
The upcoming abseil release has dropped support for C++11, so {_}eventually{_}, 
arrow will have to follow. More details 
[here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].

Relatedly, when I 
[tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
abseil to a newer C++ version on windows, things apparently broke in arrow CI. 
This is because the ABI of abseil is sensitive to the C++ standard that's used 
to compile, and google only supports a homogeneous version to compile all 
artefacts in a stack. This creates some friction with conda-forge (where the 
compilers are generally much newer than what arrow might be willing to impose). 
For now, things seem to have worked out with arrow 
[specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
 C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows was 
not so lucky.

Perhaps people would therefore also be interested in collaborating (or at least 
commenting on) this 
[issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
should permit more flexibility by being able to opt into given standard 
versions also from conda-forge.

  was:
The upcoming abseil release has dropped support for C++11, so {_}eventually{_}, 
arrow will have to follow. More details 
[here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].

Relatedly, when I 
[tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
abseil to a newer C++ version on windows, things apparently broke in arrow CI. 
This is because the ABI of abseil is sensitive to the C++ standard that's used 
to compile, and google only supports a homogeneous version to compile all 
artefacts in a stack. This creates some friction with conda-forge (where the 
compilers are generally much newer than what arrow might be willing to impose). 
For now, things seem to have worked out with arrow 
[specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
 C++11 while conda-forge moved to C++17 - at least on unix, but windows was not 
so lucky.

Perhaps people would therefore also be interested in collaborating (or at least 
commenting on) this 
[issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
should permit more flexibility by being able to opt into given standard 
versions also from conda-forge.


> Move away from C++11
> 
>
> Key: ARROW-17110
> URL: https://issues.apache.org/jira/browse/ARROW-17110
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: H. Vetinari
>Priority: Major
>
> The upcoming abseil release has dropped support for C++11, so 
> {_}eventually{_}, arrow will have to follow. More details 
> [here|https://github.com/conda-forge/abseil-cpp-feedstock/issues/37].
> Relatedly, when I 
> [tried|https://github.com/conda-forge/abseil-cpp-feedstock/pull/25] to switch 
> abseil to a newer C++ version on windows, things apparently broke in arrow 
> CI. This is because the ABI of abseil is sensitive to the C++ standard that's 
> used to compile, and google only supports a homogeneous version to compile 
> all artefacts in a stack. This creates some friction with conda-forge (where 
> the compilers are generally much newer than what arrow might be willing to 
> impose). For now, things seem to have worked out with arrow 
> [specifying|https://github.com/apache/arrow/blob/897a4c0ce73c3fe07872beee2c1d2128e44f6dd4/cpp/cmake_modules/SetupCxxFlags.cmake#L121-L124]
>  C\+\+11 while conda-forge moved to C\+\+17 - at least on unix, but windows 
> was not so lucky.
> Perhaps people would therefore also be interested in collaborating (or at 
> least commenting on) this 
> [issue|https://github.com/conda-forge/abseil-cpp-feedstock/issues/29], which 
> should permit more flexibility by being able to opt into given standard 
> versions also from conda-forge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17109) [C++] Clean up includes in exec_plan.h

2022-07-18 Thread David Li (Jira)
David Li created ARROW-17109:


 Summary: [C++] Clean up includes in exec_plan.h
 Key: ARROW-17109
 URL: https://issues.apache.org/jira/browse/ARROW-17109
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


Notably, it includes logging.h transitively via exec/util.h, which we should 
avoid. We should perhaps add to, or create, an arrow/compute/exec/type_fwd.h
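
A hypothetical sketch of what such a forward-declaration header could contain, assuming the usual exec type names (ExecContext, ExecPlan, ExecNode, ExecNodeOptions, ExecBatch); the exact list, and whether each is declared as a class or a struct, would need to be checked against the real exec_plan.h and exec.h.

{code:cpp}
// Hypothetical arrow/compute/exec/type_fwd.h: lets headers such as
// exec_plan.h name these types without pulling in heavy transitive
// includes (e.g. logging.h via exec/util.h). The type list is an assumption.
#pragma once

namespace arrow {
namespace compute {

class ExecContext;
class ExecNode;
class ExecNodeOptions;
class ExecPlan;
struct ExecBatch;

}  // namespace compute
}  // namespace arrow
{code}

Code that only passes these types around by pointer or reference could then include the lightweight type_fwd.h instead of exec_plan.h.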



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17108) [Python] Stop skipping dask tests on integration tests

2022-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17108:
---
Labels: pull-request-available  (was: )

> [Python] Stop skipping dask tests on integration tests
> --
>
> Key: ARROW-17108
> URL: https://issues.apache.org/jira/browse/ARROW-17108
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Raúl Cumplido
>Assignee: Raúl Cumplido
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There were some tests on our nightly dask integration job being skipped due 
> to old issues on dask. The original issues:
> https://issues.apache.org/jira/browse/ARROW-15720
> https://issues.apache.org/jira/browse/ARROW-9353
> These issues have been solved for some time:
> [https://github.com/dask/dask/issues/6243]
> [https://github.com/dask/dask/issues/6374]
> We should stop skipping the tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17108) [Python] Stop skipping dask tests on integration tests

2022-07-18 Thread Jira
Raúl Cumplido created ARROW-17108:
-

 Summary: [Python] Stop skipping dask tests on integration tests
 Key: ARROW-17108
 URL: https://issues.apache.org/jira/browse/ARROW-17108
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Raúl Cumplido
Assignee: Raúl Cumplido


There were some tests on our nightly dask integration job being skipped due to 
old issues on dask. The original issues:

https://issues.apache.org/jira/browse/ARROW-15720

https://issues.apache.org/jira/browse/ARROW-9353

These issues have been solved for some time:

[https://github.com/dask/dask/issues/6243]

[https://github.com/dask/dask/issues/6374]

We should stop skipping the tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17107) [Java] JSONFileWriter throws IOOBE writing an empty list

2022-07-18 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568051#comment-17568051
 ] 

David Li commented on ARROW-17107:
--

All vectors that use offsets must have at least one offset (or more 
specifically: the number of offsets is always the number of values + 1, see 
[the 
spec|https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout]),
 so it should account for large/regular binary, utf8, and list vectors. It 
looks like originally it only accounted for regular binary/utf8 and as you 
found, we need to cover lists, and then we should cover large binary/utf8/list 
as well. (That said I wonder why the check is even needed, given that the 
vectors should already follow the spec; possibly empty vectors may not have 
allocated any memory as a micro-optimization?)
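
To make that invariant concrete, here is a minimal, library-free C++ sketch (deliberately not the Java writer code under discussion) of the buffers a list-like array carries: with N slots there are always N + 1 offsets, so even an empty array still holds the single 0.

{code:cpp}
// Library-free illustration of the offsets invariant quoted above: a
// list-like array with N slots stores N + 1 offsets, so an empty array
// still has one offset entry equal to 0. std::vector stands in for buffers.
#include <cassert>
#include <cstdint>
#include <vector>

struct ListLayout {
  std::vector<int32_t> offsets;  // always slot_count() + 1 entries
  std::vector<int32_t> child;    // flattened child elements
  std::size_t slot_count() const { return offsets.size() - 1; }
};

ListLayout MakeList(const std::vector<std::vector<int32_t>>& lists) {
  ListLayout out;
  out.offsets.push_back(0);  // present even when there are no slots
  for (const auto& l : lists) {
    out.child.insert(out.child.end(), l.begin(), l.end());
    out.offsets.push_back(static_cast<int32_t>(out.child.size()));
  }
  return out;
}

int main() {
  ListLayout empty = MakeList({});
  assert(empty.slot_count() == 0);
  assert(empty.offsets.size() == 1);  // the "single 0" from the report

  ListLayout two = MakeList({{1, 2}, {3}});
  assert(two.offsets.size() == two.slot_count() + 1);  // 3 == 2 + 1
  // Slot i spans child[offsets[i], offsets[i+1]); offsets here are {0, 2, 3}.
  return 0;
}
{code}

A writer that special-cases empty offset buffers would therefore need that special case for every offset-bearing type (list, binary, utf8, and their large variants with 64-bit offsets), which matches the fix being discussed here.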

> [Java] JSONFileWriter throws IOOBE writing an empty list
> 
>
> Key: ARROW-17107
> URL: https://issues.apache.org/jira/browse/ARROW-17107
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 8.0.0
>Reporter: James Henderson
>Priority: Minor
>
> Hey folks,
> I'm trying to write an empty ListVector out through the `JsonFileWriter`, and 
> am getting an IOOBE. Stack trace is as follows:
>  
> ```
> java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 
> 0))
>  at org.apache.arrow.memory.ArrowBuf.checkIndexD (ArrowBuf.java:318)
>     org.apache.arrow.memory.ArrowBuf.chk (ArrowBuf.java:305)
>     org.apache.arrow.memory.ArrowBuf.getInt (ArrowBuf.java:424)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeValueToGenerator 
> (JsonFileWriter.java:270)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:237)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeBatch 
> (JsonFileWriter.java:200)
>     org.apache.arrow.vector.ipc.JsonFileWriter.write (JsonFileWriter.java:190)
> ```
> It's trying to write the offset buffer of the list, which is empty. L224 of 
> JFW.java sets `bufferValueCount` to 1 (because we're not a DUV), so we enter 
> the `for` loop. We don't hit the `valueCount=0` condition in L230 (because 
> we're not a varbinary or a varchar vector). So we fall into the `else`, which 
> tries to write the 0th element in the offset vector, and IOOBE.
> Could we include 'list' in either the L224 or the L230 checks?
> Admittedly, I'm not aware of the history of this section, but it seems that, 
> by the time we hit L230 (i.e. excluding DUV), any empty vector should yield a 
> single 0?
> Let me know if there's any more info I can provide!
> Cheers,
> James



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (ARROW-17107) [Java] JSONFileWriter throws IOOBE writing an empty list

2022-07-18 Thread James Henderson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568050#comment-17568050
 ] 

James Henderson edited comment on ARROW-17107 at 7/18/22 2:56 PM:
--

In the writer, it seems (to my untrained eye, at least) like that buffer only 
exists to write a zero out, so maybe doesn't matter?


was (Author: jarohen):
In the writer, at least, it seems to my untrained eye like that buffer only 
exists to write a zero out, so maybe doesn't matter?

> [Java] JSONFileWriter throws IOOBE writing an empty list
> 
>
> Key: ARROW-17107
> URL: https://issues.apache.org/jira/browse/ARROW-17107
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 8.0.0
>Reporter: James Henderson
>Priority: Minor
>
> Hey folks,
> I'm trying to write an empty ListVector out through the `JsonFileWriter`, and 
> am getting an IOOBE. Stack trace is as follows:
>  
> ```
> java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 
> 0))
>  at org.apache.arrow.memory.ArrowBuf.checkIndexD (ArrowBuf.java:318)
>     org.apache.arrow.memory.ArrowBuf.chk (ArrowBuf.java:305)
>     org.apache.arrow.memory.ArrowBuf.getInt (ArrowBuf.java:424)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeValueToGenerator 
> (JsonFileWriter.java:270)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:237)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeBatch 
> (JsonFileWriter.java:200)
>     org.apache.arrow.vector.ipc.JsonFileWriter.write (JsonFileWriter.java:190)
> ```
> It's trying to write the offset buffer of the list, which is empty. L224 of 
> JFW.java sets `bufferValueCount` to 1 (because we're not a DUV), so we enter 
> the `for` loop. We don't hit the `valueCount=0` condition in L230 (because 
> we're not a varbinary or a varchar vector). So we fall into the `else`, which 
> tries to write the 0th element in the offset vector, and IOOBE.
> Could we include 'list' in either the L224 or the L230 checks?
> Admittedly, I'm not aware of the history of this section, but it seems that, 
> by the time we hit L230 (i.e. excluding DUV), any empty vector should yield a 
> single 0?
> Let me know if there's any more info I can provide!
> Cheers,
> James



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (ARROW-17107) [Java] JSONFileWriter throws IOOBE writing an empty list

2022-07-18 Thread James Henderson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568050#comment-17568050
 ] 

James Henderson edited comment on ARROW-17107 at 7/18/22 2:56 PM:
--

In the writer, it seems (to my untrained eye, at least) like that buffer only 
exists to write a JSON zero out, so maybe doesn't matter?


was (Author: jarohen):
In the writer, it seems (to my untrained eye, at least) like that buffer only 
exists to write a zero out, so maybe doesn't matter?

> [Java] JSONFileWriter throws IOOBE writing an empty list
> 
>
> Key: ARROW-17107
> URL: https://issues.apache.org/jira/browse/ARROW-17107
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 8.0.0
>Reporter: James Henderson
>Priority: Minor
>
> Hey folks,
> I'm trying to write an empty ListVector out through the `JsonFileWriter`, and 
> am getting an IOOBE. Stack trace is as follows:
>  
> ```
> java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 
> 0))
>  at org.apache.arrow.memory.ArrowBuf.checkIndexD (ArrowBuf.java:318)
>     org.apache.arrow.memory.ArrowBuf.chk (ArrowBuf.java:305)
>     org.apache.arrow.memory.ArrowBuf.getInt (ArrowBuf.java:424)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeValueToGenerator 
> (JsonFileWriter.java:270)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:237)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeBatch 
> (JsonFileWriter.java:200)
>     org.apache.arrow.vector.ipc.JsonFileWriter.write (JsonFileWriter.java:190)
> ```
> It's trying to write the offset buffer of the list, which is empty. L224 of 
> JFW.java sets `bufferValueCount` to 1 (because we're not a DUV), so we enter 
> the `for` loop. We don't hit the `valueCount=0` condition in L230 (because 
> we're not a varbinary or a varchar vector). So we fall into the `else`, which 
> tries to write the 0th element in the offset vector, and IOOBE.
> Could we include 'list' in either the L224 or the L230 checks?
> Admittedly, I'm not aware of the history of this section, but it seems that, 
> by the time we hit L230 (i.e. excluding DUV), any empty vector should yield a 
> single 0?
> Let me know if there's any more info I can provide!
> Cheers,
> James



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17107) [Java] JSONFileWriter throws IOOBE writing an empty list

2022-07-18 Thread James Henderson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568050#comment-17568050
 ] 

James Henderson commented on ARROW-17107:
-

In the writer, at least, it seems to my untrained eye like that buffer only 
exists to write a zero out, so maybe doesn't matter?

> [Java] JSONFileWriter throws IOOBE writing an empty list
> 
>
> Key: ARROW-17107
> URL: https://issues.apache.org/jira/browse/ARROW-17107
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 8.0.0
>Reporter: James Henderson
>Priority: Minor
>
> Hey folks,
> I'm trying to write an empty ListVector out through the `JsonFileWriter`, and 
> am getting an IOOBE. Stack trace is as follows:
>  
> ```
> java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 
> 0))
>  at org.apache.arrow.memory.ArrowBuf.checkIndexD (ArrowBuf.java:318)
>     org.apache.arrow.memory.ArrowBuf.chk (ArrowBuf.java:305)
>     org.apache.arrow.memory.ArrowBuf.getInt (ArrowBuf.java:424)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeValueToGenerator 
> (JsonFileWriter.java:270)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:237)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeBatch 
> (JsonFileWriter.java:200)
>     org.apache.arrow.vector.ipc.JsonFileWriter.write (JsonFileWriter.java:190)
> ```
> It's trying to write the offset buffer of the list, which is empty. L224 of 
> JFW.java sets `bufferValueCount` to 1 (because we're not a DUV), so we enter 
> the `for` loop. We don't hit the `valueCount=0` condition in L230 (because 
> we're not a varbinary or a varchar vector). So we fall into the `else`, which 
> tries to write the 0th element in the offset vector, and IOOBE.
> Could we include 'list' in either the L224 or the L230 checks?
> Admittedly, I'm not aware of the history of this section, but it seems that, 
> by the time we hit L230 (i.e. excluding DUV), any empty vector should yield a 
> single 0?
> Let me know if there's any more info I can provide!
> Cheers,
> James



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-14895) [C++] Vcpkg install error for abseil on windows when building Arrow C++

2022-07-18 Thread H. Vetinari (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568048#comment-17568048
 ] 

H. Vetinari commented on ARROW-14895:
-

This is probably related to using different C++ standard versions while 
compiling abseil & arrow. Abseil is a library whose ABI depends on the 
standard, and google only supports the case where everything is compiled 
against the same C++ version

> [C++] Vcpkg install error for abseil on windows when building Arrow C++
> ---
>
> Key: ARROW-14895
> URL: https://issues.apache.org/jira/browse/ARROW-14895
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: Windows 10
>Reporter: Akhil J Nair
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 0h
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I attempted to build arrow c++ by following the 
> [docs|https://arrow.apache.org/docs/developers/cpp/building.html] . However 
> in the vcpkg install command
> {code:java}
> vcpkg install \
>   --x-manifest-root cpp \
>   --feature-flags=versions \
>   --clean-after-build{code}
> I get this error - 
> {code:java}
> Error: abseil:x86-windows@20210324.2 is only supported on '(x64 | arm64) & 
> (linux | osx | windows)'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17107) [Java] JSONFileWriter throws IOOBE writing an empty list

2022-07-18 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568047#comment-17568047
 ] 

David Li commented on ARROW-17107:
--

Ah ok - just want to make sure that's clear!

That fix doesn't look like it would work if we have a "large" variant of the 
vector - since the offsets are 8 bytes not 4 bytes as that branch assumes. But 
indeed, it seems the check should include all types with offsets, not just 
binary/utf8. (And actually, that means it probably doesn't handle large 
variants correctly at all…)

> [Java] JSONFileWriter throws IOOBE writing an empty list
> 
>
> Key: ARROW-17107
> URL: https://issues.apache.org/jira/browse/ARROW-17107
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 8.0.0
>Reporter: James Henderson
>Priority: Minor
>
> Hey folks,
> I'm trying to write an empty ListVector out through the `JsonFileWriter`, and 
> am getting an IOOBE. Stack trace is as follows:
>  
> ```
> java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 
> 0))
>  at org.apache.arrow.memory.ArrowBuf.checkIndexD (ArrowBuf.java:318)
>     org.apache.arrow.memory.ArrowBuf.chk (ArrowBuf.java:305)
>     org.apache.arrow.memory.ArrowBuf.getInt (ArrowBuf.java:424)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeValueToGenerator 
> (JsonFileWriter.java:270)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:237)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeBatch 
> (JsonFileWriter.java:200)
>     org.apache.arrow.vector.ipc.JsonFileWriter.write (JsonFileWriter.java:190)
> ```
> It's trying to write the offset buffer of the list, which is empty. L224 of 
> JFW.java sets `bufferValueCount` to 1 (because we're not a DUV), so we enter 
> the `for` loop. We don't hit the `valueCount=0` condition in L230 (because 
> we're not a varbinary or a varchar vector). So we fall into the `else`, which 
> tries to write the 0th element in the offset vector, and IOOBE.
> Could we include 'list' in either the L224 or the L230 checks?
> Admittedly, I'm not aware of the history of this section, but it seems that, 
> by the time we hit L230 (i.e. excluding DUV), any empty vector should yield a 
> single 0?
> Let me know if there's any more info I can provide!
> Cheers,
> James



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (ARROW-17107) [Java] JSONFileWriter throws IOOBE writing an empty list

2022-07-18 Thread James Henderson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568045#comment-17568045
 ] 

James Henderson edited comment on ARROW-17107 at 7/18/22 2:52 PM:
--

> it's actually there to generate an internal format for our integration test

[~lidavidm] yep, same for us!


was (Author: jarohen):
[~lidavidm] yep, same for us!

> [Java] JSONFileWriter throws IOOBE writing an empty list
> 
>
> Key: ARROW-17107
> URL: https://issues.apache.org/jira/browse/ARROW-17107
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 8.0.0
>Reporter: James Henderson
>Priority: Minor
>
> Hey folks,
> I'm trying to write an empty ListVector out through the `JsonFileWriter`, and 
> am getting an IOOBE. Stack trace is as follows:
>  
> ```
> java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 
> 0))
>  at org.apache.arrow.memory.ArrowBuf.checkIndexD (ArrowBuf.java:318)
>     org.apache.arrow.memory.ArrowBuf.chk (ArrowBuf.java:305)
>     org.apache.arrow.memory.ArrowBuf.getInt (ArrowBuf.java:424)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeValueToGenerator 
> (JsonFileWriter.java:270)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:237)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson 
> (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeBatch 
> (JsonFileWriter.java:200)
>     org.apache.arrow.vector.ipc.JsonFileWriter.write (JsonFileWriter.java:190)
> ```
> It's trying to write the offset buffer of the list, which is empty. L224 of 
> JFW.java sets `bufferValueCount` to 1 (because we're not a DUV), so we enter 
> the `for` loop. We don't hit the `valueCount=0` condition in L230 (because 
> we're not a varbinary or a varchar vector). So we fall into the `else`, which 
> tries to write the 0th element in the offset vector, and IOOBE.
> Could we include 'list' in either the L224 or the L230 checks?
> Admittedly, I'm not aware of the history of this section, but it seems that, 
> by the time we hit L230 (i.e. excluding DUV), any empty vector should yield a 
> single 0?
> Let me know if there's any more info I can provide!
> Cheers,
> James



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

