[jira] [Created] (ARROW-17886) [R] Convert schema to the corresponding ptype (zero-row data frame)?

2022-09-28 Thread Jira
Kirill Müller created ARROW-17886:
-

 Summary: [R] Convert schema to the corresponding ptype (zero-row data frame)?
 Key: ARROW-17886
 URL: https://issues.apache.org/jira/browse/ARROW-17886
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Kirill Müller


When fetching data, e.g. from a RecordBatchReader, I would like to know ahead 
of time what the data will look like after it is converted to a data frame. I 
have found a way using utils::head(0), but I'm not sure whether it is efficient 
in all scenarios.

My use case is the Arrow extension to DBI, in particular the default 
implementation for drivers that don't speak Arrow yet. I'd like to know which 
types the columns should have in the database. I can already infer these from 
the corresponding R types, but those existing drivers don't know about Arrow 
types.

Should we support as.data.frame() for schema objects? The semantics would be to 
return a zero-row data frame with correct column names and types.


``` r
library(arrow)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp

data <- data.frame(
  a = 1:3,
  b = 2.5,
  c = "three",
  stringsAsFactors = FALSE
)
data$d <- blob::blob(as.raw(1:10))

tbl <- arrow::as_arrow_table(data)
rbr <- arrow::as_record_batch_reader(tbl)

tibble::as_tibble(head(rbr, 0))
#> # A tibble: 0 × 4
#> # … with 4 variables: a , b , c , d 
rbr$read_table()
#> Table
#> 3 rows x 4 columns
#> $a 
#> $b 
#> $c 
#> $d <>
#> 
#> See $metadata for additional Schema metadata
```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17885) Return BLOB data as list of raw instead of a list of integers

2022-09-28 Thread Jira
Kirill Müller created ARROW-17885:
-

 Summary: Return BLOB data as list of raw instead of a list of integers
 Key: ARROW-17885
 URL: https://issues.apache.org/jira/browse/ARROW-17885
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Affects Versions: 10.0.0, 9.0.1
 Environment: macOS, R 4.1.3
Reporter: Kirill Müller


BLOBs should be mapped to lists of raw vectors in R, not lists of integer 
vectors. Tested with ec714db3995549309b987fc8112db98bb93102d0.

``` r
library(arrow)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp

data <- data.frame(
  a = 1:3,
  b = 2.5,
  c = "three",
  stringsAsFactors = FALSE
)
data$d <- blob::blob(as.raw(1:10))

tbl <- arrow::as_arrow_table(data)
rbr <- arrow::as_record_batch_reader(tbl)

waldo::compare(as.data.frame(rbr$read_next_batch()), data)
#> `old$d[[1]]` is an integer vector (1, 2, 3, 4, 5, ...)
#> `new$d[[1]]` is a raw vector (01, 02, 03, 04, 05, ...)
#> 
#> `old$d[[2]]` is an integer vector (1, 2, 3, 4, 5, ...)
#> `new$d[[2]]` is a raw vector (01, 02, 03, 04, 05, ...)
#> 
#> `old$d[[3]]` is an integer vector (1, 2, 3, 4, 5, ...)
#> `new$d[[3]]` is a raw vector (01, 02, 03, 04, 05, ...)
```

Created on 2022-09-29 with [reprex v2.0.2](https://reprex.tidyverse.org)





[jira] [Created] (ARROW-17884) Add Intel®-IAA/QPL-based Parquet RLE Decode

2022-09-28 Thread zhaoyaqi (Jira)
zhaoyaqi created ARROW-17884:


 Summary: Add Intel®-IAA/QPL-based Parquet RLE Decode
 Key: ARROW-17884
 URL: https://issues.apache.org/jira/browse/ARROW-17884
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: zhaoyaqi


Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator 
available in the upcoming generation of Intel® Xeon® Scalable processors 
("Sapphire Rapids"). Its goal is to speed up common analytics operations such 
as data (de)compression and filtering, and it supports decoding the Parquet RLE 
format. We add a new codec that uses the Intel® IAA offloading technology to 
provide a high-performance RLE decode implementation. The codec uses the 
[Intel® Query Processing Library (QPL)|https://github.com/intel/qpl], which 
abstracts access to the hardware accelerator. In general, the new solution 
performs better than the current implementation and also consumes less CPU.
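For context on what is being offloaded: Parquet's "RLE" encoding is the RLE/bit-packing hybrid from the Parquet format specification. Each run starts with a ULEB128 varint header whose low bit selects either a run-length run (header >> 1 repetitions of one value stored in ceil(bit_width / 8) little-endian bytes) or a bit-packed run (header >> 1 groups of 8 values packed LSB-first). Below is a minimal scalar sketch of such a decoder, the kind of reference an accelerated codec's output could be checked against. It is not Arrow's RleDecoder and not the QPL API; the function names are hypothetical, and it assumes the caller has already stripped any 4-byte length prefix or leading bit-width byte and that values fit in 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reads one unsigned LEB128 varint at data[pos], advancing pos.
uint32_t ReadUleb128(const std::vector<uint8_t>& data, size_t& pos) {
  uint32_t result = 0;
  int shift = 0;
  while (true) {
    if (pos >= data.size()) throw std::runtime_error("truncated varint header");
    const uint8_t byte = data[pos++];
    result |= static_cast<uint32_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return result;
    shift += 7;
    if (shift > 28) throw std::runtime_error("varint header too long");
  }
}

// Hypothetical sketch: decodes num_values values of bit_width bits from the
// runs of a Parquet RLE/bit-packing hybrid stream (length prefix already stripped).
std::vector<uint32_t> DecodeRleBitPackedHybrid(const std::vector<uint8_t>& data,
                                               int bit_width, size_t num_values) {
  assert(bit_width >= 0 && bit_width <= 32);  // caller error, not a data error
  const size_t value_bytes = static_cast<size_t>(bit_width + 7) / 8;
  std::vector<uint32_t> out;
  out.reserve(num_values);
  size_t pos = 0;
  while (out.size() < num_values) {
    const uint32_t header = ReadUleb128(data, pos);
    if ((header & 1) == 0) {
      // RLE run: (header >> 1) repetitions of one little-endian value.
      const uint32_t run_len = header >> 1;
      if (pos + value_bytes > data.size()) throw std::runtime_error("truncated RLE value");
      uint32_t value = 0;
      for (size_t i = 0; i < value_bytes; ++i) {
        value |= static_cast<uint32_t>(data[pos + i]) << (8 * i);
      }
      pos += value_bytes;
      for (uint32_t i = 0; i < run_len && out.size() < num_values; ++i) out.push_back(value);
    } else {
      // Bit-packed run: (header >> 1) groups of 8 values, packed LSB-first.
      const size_t count = static_cast<size_t>(header >> 1) * 8;
      const size_t run_bytes = count * static_cast<size_t>(bit_width) / 8;
      if (pos + run_bytes > data.size()) throw std::runtime_error("truncated bit-packed run");
      // Stop once num_values is reached; padding in the last group is ignored.
      for (size_t i = 0; i < count && out.size() < num_values; ++i) {
        uint32_t value = 0;
        for (int b = 0; b < bit_width; ++b) {
          const size_t bit = i * static_cast<size_t>(bit_width) + static_cast<size_t>(b);
          if ((data[pos + bit / 8] >> (bit % 8)) & 1) value |= uint32_t{1} << b;
        }
        out.push_back(value);
      }
      pos += run_bytes;
    }
  }
  return out;
}
```

The bit-at-a-time loop keeps the sketch easy to check against the specification's worked example (values 0 to 7 at bit width 3 pack into the bytes 0x88 0xC6 0xFA); a production decoder, whether in software or offloaded to hardware, would typically unpack many values per machine word instead.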





[jira] [Created] (ARROW-17883) [Java] Implement an immutable table object

2022-09-28 Thread Larry White (Jira)
Larry White created ARROW-17883:
---

 Summary: [Java] Implement an immutable table object
 Key: ARROW-17883
 URL: https://issues.apache.org/jira/browse/ARROW-17883
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Affects Versions: 10.0.0
Reporter: Larry White


Implement an immutable Table object without the batch semantics provided by 
VectorSchemaRoot. 

See the original design document and discussion here: 
https://docs.google.com/document/d/1J77irZFWNnSID7vK71z26Nw_Pi99I9Hb9iryno8B03c/edit?usp=sharing

Note that this ticket covers only the immutable Table implementation. 





[jira] [Created] (ARROW-17882) [Java][Doc] Document build & use of new artifact on Windows environment

2022-09-28 Thread David Dali Susanibar Arce (Jira)
David Dali Susanibar Arce created ARROW-17882:
-

 Summary: [Java][Doc] Document build & use of new artifact on Windows environment
 Key: ARROW-17882
 URL: https://issues.apache.org/jira/browse/ARROW-17882
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Documentation, Java
Reporter: David Dali Susanibar Arce
Assignee: David Dali Susanibar Arce


* Update the build documentation for the new Windows JNI DLL support
* Update the usage documentation for the new Windows JNI DLL support





[jira] [Created] (ARROW-17881) [C++] Not able to build the project with the latest commit of the master branch

2022-09-28 Thread Anirudh Acharya (Jira)
Anirudh Acharya created ARROW-17881:
---

 Summary: [C++] Not able to build the project with the latest commit of the master branch
 Key: ARROW-17881
 URL: https://issues.apache.org/jira/browse/ARROW-17881
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Anirudh Acharya


I am trying to build the Arrow C++ project from the latest commit (9af43f11b) 
on the master branch, following this guide: 
[https://arrow.apache.org/docs/developers/cpp/building.html]. The build fails 
with the following error:
{code:java}
[ 58%] Linking CXX executable ../../debug/arrow-array-test
Undefined symbols for architecture x86_64:
  "testing::Matcher > const&>::Matcher(char const*)", referenced from:
      testing::Matcher > const&> testing::internal::MatcherCastImpl > const&, char const*>::CastImpl(char const* const&, std::__1::integral_constant, std::__1::integral_constant) in array_test.cc.o
      testing::Matcher > const&> testing::internal::MatcherCastImpl > const&, char const*>::CastImpl(char const* const&, std::__1::integral_constant, std::__1::integral_constant) in array_binary_test.cc.o
ld: symbol(s) not found for architecture x86_64
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [src/arrow/CMakeFiles/arrow-array-test.dir/build.make:207: debug/arrow-array-test] Error 1
make[1]: *** [CMakeFiles/Makefile2:1653: src/arrow/CMakeFiles/arrow-array-test.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs
[ 58%] Building CXX object src/arrow/CMakeFiles/arrow-table-test.dir/table_test.cc.o
[ 58%] Building CXX object src/parquet/CMakeFiles/parquet_objlib.dir/types.cc.o
[ 58%] Building CXX object src/arrow/CMakeFiles/arrow-table-test.dir/table_builder_test.cc.o
[ 58%] Building CXX object src/parquet/CMakeFiles/parquet_objlib.dir/level_comparison_avx2.cc.o
[ 58%] Building CXX object src/parquet/CMakeFiles/parquet_objlib.dir/level_conversion_bmi2.cc.o
[ 58%] Building CXX object src/parquet/CMakeFiles/parquet_objlib.dir/encryption/encryption_internal.cc.o
[ 59%] Building CXX object src/parquet/CMakeFiles/parquet_objlib.dir/encryption/crypto_factory.cc.o
[ 60%] Linking CXX executable ../../debug/arrow-table-test
[ 60%] Building CXX object src/parquet/CMakeFiles/parquet_objlib.dir/encryption/file_key_unwrapper.cc.o
[ 60%] Built target arrow-table-test
[ 60%] Building CXX object src/parquet/CMakeFiles/parquet_objlib.dir/encryption/file_key_wrapper.cc.o
[ 60%] Building CXX object src/parquet/CMakeFiles/parquet_objlib.dir/encryption/kms_client.cc.o
[ 60%] Building CXX object src/parquet/CMakeFiles/parquet_objlib.dir/encryption/key_material.cc.o
[ 61%] Building CXX object src/parquet/CMakeFiles/parquet_objlib.dir/encryption/key_metadata.cc.o
[ 61%] Building CXX object src/parquet/CMakeFiles/parquet_objlib.dir/encryption/key_toolkit.cc.o
[ 61%] Building CXX object src/parquet/CMakeFiles/parquet_objlib.dir/encryption/key_toolkit_internal.cc.o
[ 61%] Building CXX object src/parquet/CMakeFiles/parquet_objlib.dir/encryption/local_wrap_kms_client.cc.o
[ 61%] Built target parquet_objlib
make: *** [Makefile:146: all] Error 2 {code}
 

I am compiling this on macOS Monterey version 12.0.1. The versions of GCC, 
Python and Clang are as follows:
{code:java}
$ clang --version
clang version 14.0.4
Target: x86_64-apple-darwin21.1.0
Thread model: posix
InstalledDir: /Users/anirudhacharya/miniconda3/envs/pyarrow-dev/bin

$ python --version
Python 3.9.13

$ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 12.0.5 (clang-1205.0.22.9)
Target: x86_64-apple-darwin21.1.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin {code}
 

I see that nightly job failures for macOS were reported on the mailing list 
([https://lists.apache.org/thread/rrdwxw1st4vdcf3nh5nqfo16n3ymj90x]). I am not 
sure whether those failures are related to the issue I am reporting.





[jira] [Created] (ARROW-17880) Add support for Decimal types in go/arrow/csv

2022-09-28 Thread Mitchell Devenport (Jira)
Mitchell Devenport created ARROW-17880:
--

 Summary: Add support for Decimal types in go/arrow/csv
 Key: ARROW-17880
 URL: https://issues.apache.org/jira/browse/ARROW-17880
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Mitchell Devenport


The Go CSV library lacks support for Decimal types, which the C++ CSV library 
supports:
[arrow/writer.cc at master · apache/arrow (github.com)|https://github.com/apache/arrow/blob/master/cpp/src/arrow/csv/writer.cc#L378]
[arrow/type_traits.h at master · apache/arrow (github.com)|https://github.com/apache/arrow/blob/master/cpp/src/arrow/type_traits.h#L642]
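At its core, writing a Decimal column to CSV means turning each unscaled integer plus the column's scale into text (for example, unscaled 12345 with scale 2 is 123.45), and reading means the reverse. The minimal sketch below shows the formatting direction only. It is written in C++ for illustration rather than Go and is not the Arrow C++ CSV writer's code: FormatDecimal is a hypothetical helper, int64_t stands in for the 128-bit unscaled value of a Decimal128, only non-negative scales are handled, and it always emits plain (non-exponent) notation, which may differ from how Arrow itself renders some values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: formats an unscaled integer with a non-negative scale
// as plain decimal text, e.g. (12345, 2) -> "123.45", (-5, 3) -> "-0.005".
std::string FormatDecimal(int64_t unscaled, int32_t scale) {
  assert(scale >= 0);  // this sketch does not handle negative scales
  const bool negative = unscaled < 0;
  // Take the magnitude as unsigned so negating INT64_MIN cannot overflow.
  const uint64_t magnitude = negative ? uint64_t{0} - static_cast<uint64_t>(unscaled)
                                      : static_cast<uint64_t>(unscaled);
  std::string digits = std::to_string(magnitude);
  const size_t frac = static_cast<size_t>(scale);
  if (frac > 0) {
    // Left-pad with zeros so at least one digit precedes the decimal point.
    if (digits.size() <= frac) digits.insert(0, frac - digits.size() + 1, '0');
    digits.insert(digits.size() - frac, 1, '.');
  }
  return negative ? "-" + digits : digits;
}
```

Formatting the magnitude as unsigned before re-attaching the sign is the main subtlety; a port to another language would need the same care with its own wide-integer decimal representation.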





[jira] [Created] (ARROW-17879) [R] Intermittent memory leaks in the valgrind nightly test

2022-09-28 Thread Dewey Dunnington (Jira)
Dewey Dunnington created ARROW-17879:


 Summary: [R] Intermittent memory leaks in the valgrind nightly test
 Key: ARROW-17879
 URL: https://issues.apache.org/jira/browse/ARROW-17879
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Dewey Dunnington
 Fix For: 10.0.0


The memory leaks that were fixed by a workaround before the last release 
(ARROW-17252) are present again. I had hoped that the improvements to the 
captured R thread infrastructure in ARROW-11841 and ARROW-17178 would fix this; 
however, they don't. It is not even clear that the failures are related to that 
infrastructure: while diagnosing these failures last time, I disabled the safe 
call infrastructure completely and was still able to observe failures.

These failures need to be debugged before the release!





[jira] [Created] (ARROW-17878) [Website] Exclude Ballista docs from being deleted

2022-09-28 Thread Andy Grove (Jira)
Andy Grove created ARROW-17878:
--

 Summary: [Website] Exclude Ballista docs from being deleted
 Key: ARROW-17878
 URL: https://issues.apache.org/jira/browse/ARROW-17878
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Website
Reporter: Andy Grove


Exclude Ballista docs from being deleted





[jira] [Created] (ARROW-17877) [CI][Python] verify-rc python nightly builds fail due to missing arrow/csv/api.h

2022-09-28 Thread Jira
Raúl Cumplido created ARROW-17877:
-

 Summary: [CI][Python] verify-rc python nightly builds fail due to missing arrow/csv/api.h
 Key: ARROW-17877
 URL: https://issues.apache.org/jira/browse/ARROW-17877
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Python
Reporter: Raúl Cumplido
Assignee: Raúl Cumplido


Some of our nightly builds are failing with:
{code:java}
[ 35%] Building CXX object CMakeFiles/_dataset.dir/_dataset.cpp.o
/arrow/python/build/temp.linux-x86_64-cpython-38/_dataset.cpp:833:10: fatal error: arrow/csv/api.h: No such file or directory
 #include "arrow/csv/api.h"
          ^
compilation terminated.{code}
I suspect the changes in this commit to the flags enabled when building with 
PYTHON=ON (which included CSV=ON) might be related: 
[https://github.com/apache/arrow/commit/53ac2a00aa9ff199773513f6f996f73a07b37989]

Example of nightly failures:

https://github.com/ursacomputing/crossbow/actions/runs/3135833175/jobs/5091988801





[jira] [Created] (ARROW-17876) [R][CI] Remove ubuntu-18.04 from nixlibs & prebuilt binaries

2022-09-28 Thread Jacob Wujciak-Jens (Jira)
Jacob Wujciak-Jens created ARROW-17876:
--

 Summary: [R][CI] Remove ubuntu-18.04 from nixlibs & prebuilt binaries
 Key: ARROW-17876
 URL: https://issues.apache.org/jira/browse/ARROW-17876
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Jacob Wujciak-Jens
 Fix For: 10.0.0


The new DTS-compiled centos-7 binaries ([ARROW-17594]) should be able to 
replace the ubuntu-18.04 binaries.





[jira] [Created] (ARROW-17875) [C++] Remove assorted pre-C++17 compatibility measures

2022-09-28 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-17875:
--

 Summary: [C++] Remove assorted pre-C++17 compatibility measures
 Key: ARROW-17875
 URL: https://issues.apache.org/jira/browse/ARROW-17875
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


Some assorted pre-C++17 compatibility measures remain in the code base.





[jira] [Created] (ARROW-17874) [Archery] C++ linting with --clang-format or archery lint --clang-tidy fails on M1

2022-09-28 Thread Alenka Frim (Jira)
Alenka Frim created ARROW-17874:
---

 Summary: [Archery] C++ linting with --clang-format or archery lint --clang-tidy fails on M1
 Key: ARROW-17874
 URL: https://issues.apache.org/jira/browse/ARROW-17874
 Project: Apache Arrow
  Issue Type: Bug
  Components: Archery
Reporter: Alenka Frim


There seems to be a CMake target issue with the {{clang-format}} and 
{{clang-tidy}} options when running {{archery lint}} on M1:

{code:java}
...
-- Build files have been written to: /private/var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1mgn/T/arrow-lint-g7drna9_/cpp-build
ninja: error: unknown target 'check-format' {code}

[https://gist.github.com/AlenkaF/f60e24549529cd096bc9c975bcb71179]





[jira] [Created] (ARROW-17873) Writing Arrow Files using C#.

2022-09-28 Thread N Gautam Animesh (Jira)
N Gautam Animesh created ARROW-17873:


 Summary: Writing Arrow Files using C#.
 Key: ARROW-17873
 URL: https://issues.apache.org/jira/browse/ARROW-17873
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: N Gautam Animesh


I was working with Arrow in C# and wanted to know how to write an Arrow file 
using C#.

Please let me know if there is any way to do this; I was not able to find 
anything on the internet.





[jira] [Created] (ARROW-17872) [CI] Cache dependencies on macOS builds

2022-09-28 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-17872:
--

 Summary: [CI] Cache dependencies on macOS builds
 Key: ARROW-17872
 URL: https://issues.apache.org/jira/browse/ARROW-17872
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++, Continuous Integration, GLib, Python
Reporter: Antoine Pitrou


Our macOS CI builds on GitHub Actions usually spend at least 10 minutes 
installing dependencies from Homebrew (perhaps because some are compiled from 
source?). It would be nice to cache them, especially as they probably don't 
change often.


