[jira] [Created] (ARROW-10860) [Java] Avoid integer overflow for Json file reader
Kazuaki Ishizaki created ARROW-10860: Summary: [Java] Avoid integer overflow for Json file reader Key: ARROW-10860 URL: https://issues.apache.org/jira/browse/ARROW-10860 Project: Apache Arrow Issue Type: Bug Components: Java Affects Versions: 3.0.0 Reporter: Kazuaki Ishizaki This issue is similar to https://issues.apache.org/jira/browse/ARROW-10662. In the current template implementation, {{int * int}} multiplication is used to calculate the buffer offset. The result may exceed Integer.MAX_VALUE, in which case the multiplication overflows and the computed offset is wrong. -- This message was sent by Atlassian Jira (v8.3.4#803005)
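The wraparound described in ARROW-10860 can be illustrated outside Java. The sketch below is not the Arrow template code: it is a small Python model of Java's signed 32-bit {{int}} multiplication, and the names {{to_int32}}, {{offset_int}} and {{offset_long}} are hypothetical, chosen only for illustration. It assumes that one reasonable remedy is to widen to 64 bits before multiplying; the report itself does not say which fix is intended.

```python
INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE


def to_int32(value: int) -> int:
    """Wrap an arbitrary Python int to a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value >= 0x8000_0000 else value


def offset_int(index: int, type_width: int) -> int:
    """Model of `index * typeWidth` evaluated in 32-bit int arithmetic."""
    return to_int32(index * type_width)


def offset_long(index: int, type_width: int) -> int:
    """Model of `(long) index * typeWidth`: the product is computed in a wider type."""
    return index * type_width


index, type_width = 300_000_000, 8          # both operands fit in an int
print(offset_int(index, type_width))        # wrapped, negative offset
print(offset_long(index, type_width))       # true offset, larger than INT_MAX
print(offset_long(index, type_width) > INT_MAX)
```

Both operands are valid 32-bit values, yet their product is not, which is why the overflow only appears once buffers become large.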
[jira] [Created] (ARROW-10859) [Rust] [DataFusion] Make collect not require ExecutionContext
Jorge Leitão created ARROW-10859: Summary: [Rust] [DataFusion] Make collect not require ExecutionContext Key: ARROW-10859 URL: https://issues.apache.org/jira/browse/ARROW-10859 Project: Apache Arrow Issue Type: Bug Reporter: Jorge Leitão Assignee: Jorge Leitão
[jira] [Created] (ARROW-10858) [C++][MSVC] Add missing Boost dependency
Kouhei Sutou created ARROW-10858: Summary: [C++][MSVC] Add missing Boost dependency Key: ARROW-10858 URL: https://issues.apache.org/jira/browse/ARROW-10858 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Kouhei Sutou Assignee: Kouhei Sutou
[jira] [Created] (ARROW-10857) [Packaging] Follow PowerTools repository name change on CentOS 8
Kouhei Sutou created ARROW-10857: Summary: [Packaging] Follow PowerTools repository name change on CentOS 8 Key: ARROW-10857 URL: https://issues.apache.org/jira/browse/ARROW-10857 Project: Apache Arrow Issue Type: Improvement Components: Packaging Reporter: Kouhei Sutou Assignee: Kouhei Sutou
[jira] [Created] (ARROW-10856) Can't get the required C++ run time library installed correctly
Yi Hsiao created ARROW-10856: Summary: Can't get the required C++ run time library installed correctly Key: ARROW-10856 URL: https://issues.apache.org/jira/browse/ARROW-10856 Project: Apache Arrow Issue Type: Bug Reporter: Yi Hsiao

When I tried to use the example command like this in my R session:
{code:java}
df <- read_parquet(system.file("v0.7.1.parquet", package="arrow"))
{code}
It shows this error:
{code:java}
> df <- read_parquet(system.file("v0.7.1.parquet", package="arrow"))
Error in io___MemoryMappedFile__Open(path, mode) :
  Cannot call io___MemoryMappedFile__Open(). Please use arrow::install_arrow() to install required runtime libraries.
{code}
I did try to install it with `arrow::install_arrow()`, and it finished successfully. However, I still get the same error message after that. My session info is here:
{code:java}
> sessioninfo::session_info()
─ Session info ───
 setting  value
 version  R version 4.0.2 (2020-06-22)
 os       CentOS Linux 7 (Core)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Detroit
 date     2020-12-08

─ Packages ───
 package     * version date       lib source
 arrow       * 2.0.0   2020-10-20 [1] CRAN (R 4.0.2)
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.2)
 bit           4.0.4   2020-08-04 [1] CRAN (R 4.0.2)
 bit64         4.0.5   2020-08-30 [1] CRAN (R 4.0.2)
 cli           2.2.0   2020-11-20 [1] CRAN (R 4.0.2)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.2)
 fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.2)
 glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
 magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.2)
 purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.2)
 R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.2)
 rlang         0.4.9   2020-11-26 [1] CRAN (R 4.0.2)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.2)
 tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.2)
 vctrs         0.3.5   2020-11-17 [1] CRAN (R 4.0.2)
 withr         2.3.0   2020-09-22 [1] CRAN (R 4.0.2)

[1] /home/yihsiao/R/x86_64-pc-linux-gnu-library/4.0
[2] /sw/arcts/centos7/stacks/gcc/8.2.0/R/4.0.2/lib64/R/library
{code}
One thing I noticed is that, when installing the runtime library, it does not detect the C++ compiler I actually have (gcc 8.2.0) and instead finds some version < 4.9:
{code:java}
> arrow::install_arrow()
Installing package into '/home/yihsiao/R/x86_64-pc-linux-gnu-library/4.0'
(as 'lib' is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/arrow_2.0.0.tar.gz'
Content type 'application/x-gzip' length 322592 bytes (315 KB)
==
downloaded 315 KB

* installing *source* package 'arrow' ...
** package 'arrow' successfully unpacked and MD5 sums checked
** using staged installation
*** No C++ binaries found for centos-7
*** Successfully retrieved C++ source
*** Building C++ libraries
cmake
S3 support not available for gcc < 4.9; building with ARROW_S3=OFF
arrow
{code}
[jira] [Created] (ARROW-10855) [Python][Numpy] ArrowTypeError after upgrading NumPy to 1.20.0rc1
Zhenghui Jin created ARROW-10855: Summary: [Python][Numpy] ArrowTypeError after upgrading NumPy to 1.20.0rc1 Key: ARROW-10855 URL: https://issues.apache.org/jira/browse/ARROW-10855 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 2.0.0 Environment: macOS Big Sur 11.0.1 Reporter: Zhenghui Jin

After upgrading NumPy to version 1.20.0rc1, pandas .to_parquet() raises ArrowTypeError.

NumPy 1.19.4, Python 3.7.9, macOS:
{code:java}
Python 3.7.9 (default, Nov 20 2020, 23:58:42)
[Clang 12.0.0 (clang-1200.0.32.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas as pd
>>> np.__version__
'1.19.4'
>>> pd.DataFrame({'i': [1, 2, 3, np.nan]}, dtype='Int64').to_parquet('nullint.parquet')
>>>
{code}
NumPy 1.20.0rc1, Python 3.7.9, macOS:
{code:java}
Python 3.7.9 (default, Nov 20 2020, 23:58:42)
[Clang 12.0.0 (clang-1200.0.32.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas as pd
>>> np.__version__
'1.19.4'
>>> pd.DataFrame({'i': [1, 2, 3, np.nan]}, dtype='Int64').to_parquet('nullint.parquet')
>>>
{code}
[jira] [Created] (ARROW-10854) [Rust] [DataFusion] Simplified logical scans
Jorge Leitão created ARROW-10854: Summary: [Rust] [DataFusion] Simplified logical scans Key: ARROW-10854 URL: https://issues.apache.org/jira/browse/ARROW-10854 Project: Apache Arrow Issue Type: Bug Components: Rust - DataFusion Reporter: Jorge Leitão Assignee: Jorge Leitão
[jira] [Created] (ARROW-10853) [Java] Undeprecate sqlToArrow helpers
Uwe Korn created ARROW-10853: Summary: [Java] Undeprecate sqlToArrow helpers Key: ARROW-10853 URL: https://issues.apache.org/jira/browse/ARROW-10853 Project: Apache Arrow Issue Type: Bug Components: Java Affects Versions: 2.0.0 Reporter: Uwe Korn Assignee: Uwe Korn Fix For: 3.0.0 These helper functions are really useful when called from Python, as they deal with a lot of Java "internals" that we don't want to handle on the Python side. We would rather keep using these functions. Note that some of them are broken due to recent refactoring: they return only 1024 rows (the default iterator size), with no way to change that.
[jira] [Created] (ARROW-10852) [C++] AssertTablesEqual(verbose=true) segfaults if the left array is longer
Ben Kietzman created ARROW-10852: Summary: [C++] AssertTablesEqual(verbose=true) segfaults if the left array is longer Key: ARROW-10852 URL: https://issues.apache.org/jira/browse/ARROW-10852 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 2.0.0 Reporter: Ben Kietzman Fix For: 3.0.0 {{MultipleChunkIterator}} is used to implement the verbose comparison in {{AssertTablesEqual}} and seems to assume that the arrays have identical length. If the left chunked array is longer, it segfaults when trying to read nonexistent chunks of the right chunked array.
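The failure mode in ARROW-10852 comes from walking two chunked sequences in lockstep without checking that both still have data. The sketch below is a language-neutral illustration in Python, not the Arrow C++ {{MultipleChunkIterator}}: chunked arrays are modelled as lists of lists, and {{first_difference}} is a hypothetical helper. It assumes the desired behaviour is to report a length mismatch instead of reading past the end of the shorter side.

```python
from itertools import chain, zip_longest

_MISSING = object()  # sentinel marking "this side ran out of values"


def first_difference(left_chunks, right_chunks):
    """Return (index, reason) for the first mismatch, or None if the inputs are equal.

    Chunk boundaries are ignored; only the flattened values are compared.
    """
    left = chain.from_iterable(left_chunks)
    right = chain.from_iterable(right_chunks)
    pairs = zip_longest(left, right, fillvalue=_MISSING)
    for index, (lhs, rhs) in enumerate(pairs):
        if lhs is _MISSING or rhs is _MISSING:
            # One side is exhausted: stop here rather than touch a nonexistent chunk.
            return index, "length mismatch"
        if lhs != rhs:
            return index, "value mismatch"
    return None


# The left side is longer than the right side, as in the report.
print(first_difference([[1, 2], [3, 4]], [[1, 2, 3]]))
# Same values, different chunk layout.
print(first_difference([[1, 2]], [[1], [2]]))
```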
[jira] [Created] (ARROW-10851) [C++] Reduce code size of vector_sort.cc
Antoine Pitrou created ARROW-10851: Summary: [C++] Reduce code size of vector_sort.cc Key: ARROW-10851 URL: https://issues.apache.org/jira/browse/ARROW-10851 Project: Apache Arrow Issue Type: Task Components: C++ Reporter: Antoine Pitrou Assignee: Antoine Pitrou
[jira] [Created] (ARROW-10850) Unrecognized compression type: LZ4 on Windows
Chris Kennedy created ARROW-10850: Summary: Unrecognized compression type: LZ4 on Windows Key: ARROW-10850 URL: https://issues.apache.org/jira/browse/ARROW-10850 Project: Apache Arrow Issue Type: Bug Components: R Affects Versions: 2.0.0 Environment: Windows 10, R 3.6.2, RStudio 1.3.1073 Reporter: Chris Kennedy Hello, I recently re-installed Arrow from CRAN in R 3.6.2, and it can no longer import a Feather file with LZ4 compression (in previous months this worked fine): {code:java} > data = suppressWarnings(arrow::read_feather("blah.feather")) {code} {noformat} Error in ipc___feather___Reader__Read(self, columns) : Invalid: Unrecognized compression type: LZ4{noformat} I have attempted to install from source but continue to receive this error. According to the documentation, though, shouldn't the CRAN package also have LZ4 support? Is it possible that the CRAN build has lost LZ4 support? My Feather file was created in pandas. Happy to send over any other information that could be helpful, and apologies if I am making some mistake on my end. Thanks, Chris
[jira] [Created] (ARROW-10849) [Python] Handle numpy deprecation warnings for builtin type aliases
Joris Van den Bossche created ARROW-10849: Summary: [Python] Handle numpy deprecation warnings for builtin type aliases Key: ARROW-10849 URL: https://issues.apache.org/jira/browse/ARROW-10849 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Joris Van den Bossche See https://numpy.org/devdocs/release/1.20.0-notes.html#using-the-aliases-of-builtin-types-like-np-int-is-deprecated
[jira] [Created] (ARROW-10848) [C++] CSV ISO-8601 date and timestamp short form
Maciej created ARROW-10848: Summary: [C++] CSV ISO-8601 date and timestamp short form Key: ARROW-10848 URL: https://issues.apache.org/jira/browse/ARROW-10848 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Maciej Arrow supports ISO-8601 for date and timestamp parsing but doesn't support their short forms. E.g. {code:java} 19990108 or 19990108 040506 {code} Examples taken from: https://www.postgresql.org/docs/12/datatype-datetime.html
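To make the two short forms in ARROW-10848 concrete, the sketch below parses them with Python's standard library. It does not use or describe Arrow's CSV reader; the helper names {{parse_short_date}} and {{parse_short_timestamp}} are hypothetical, and the separator-free layouts (8 digits for a date, 8 digits, a space and 6 digits for a timestamp) are assumed from the two examples in the report.

```python
from datetime import date, datetime


def parse_short_date(text: str) -> date:
    """Parse a separator-free date such as '19990108' (YYYYMMDD)."""
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"not a short-form date: {text!r}")
    return datetime.strptime(text, "%Y%m%d").date()


def parse_short_timestamp(text: str) -> datetime:
    """Parse a separator-free timestamp such as '19990108 040506' (YYYYMMDD hhmmss)."""
    day, sep, clock = text.partition(" ")
    if not sep or len(clock) != 6 or not clock.isdigit():
        raise ValueError(f"not a short-form timestamp: {text!r}")
    parsed_day = parse_short_date(day)
    parsed_time = datetime.strptime(clock, "%H%M%S").time()
    return datetime.combine(parsed_day, parsed_time)


print(parse_short_date("19990108"))
print(parse_short_timestamp("19990108 040506"))
```

The explicit length checks are there because {{strptime}} is lenient about digit counts, so the format string alone would accept inputs shorter than the layouts shown in the report.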
[jira] [Created] (ARROW-10847) [C++] CSV date custom parser
Maciej created ARROW-10847: Summary: [C++] CSV date custom parser Key: ARROW-10847 URL: https://issues.apache.org/jira/browse/ARROW-10847 Project: Apache Arrow Issue Type: Improvement Components: C++ Affects Versions: 2.0.0 Reporter: Maciej When I have a custom date format in a CSV file, I'd like to parse it by adding an additional DateParser, equivalent to the TimestampParser that can be added to {{timestamp_parsers}} in {{ConvertOptions}}.
[jira] [Created] (ARROW-10846) [C++] Add async filesystem operations
Antoine Pitrou created ARROW-10846: Summary: [C++] Add async filesystem operations Key: ARROW-10846 URL: https://issues.apache.org/jira/browse/ARROW-10846 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Antoine Pitrou It would probably be useful to have Future-returning variants of some filesystem operations (at least {{GetFileInfo}} and {{OpenInput(File|Stream)}}).
[jira] [Created] (ARROW-10845) [Python][CI] Add python CI build using numpy nightly
Joris Van den Bossche created ARROW-10845: Summary: [Python][CI] Add python CI build using numpy nightly Key: ARROW-10845 URL: https://issues.apache.org/jira/browse/ARROW-10845 Project: Apache Arrow Issue Type: Improvement Components: CI, Python Reporter: Joris Van den Bossche Fix For: 3.0.0