[jira] [Created] (ARROW-10860) [Java] Avoid integer overflow for Json file reader

2020-12-08 Thread Kazuaki Ishizaki (Jira)
Kazuaki Ishizaki created ARROW-10860:


 Summary: [Java] Avoid integer overflow for Json file reader
 Key: ARROW-10860
 URL: https://issues.apache.org/jira/browse/ARROW-10860
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Affects Versions: 3.0.0
Reporter: Kazuaki Ishizaki


This issue is similar to https://issues.apache.org/jira/browse/ARROW-10662.

In the current template implementation, {{int * int}} multiplication is used
to calculate the buffer offset. The result may exceed Integer.MAX_VALUE,
which leads to integer overflow and unexpected behavior.
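
The wrap-around can be illustrated with a minimal sketch. Python integers do not overflow, so the snippet emulates Java's 32-bit {{int}} multiplication with {{ctypes.c_int32}}. The index and type-width values are made up for illustration and are not taken from the Arrow templates.

```python
import ctypes

INT_MAX = 2**31 - 1  # value of Java's Integer.MAX_VALUE


def java_int_mul(a: int, b: int) -> int:
    """Emulate Java's wrapping 32-bit int * int multiplication."""
    return ctypes.c_int32(a * b).value


# Hypothetical values: an element index and an element width in bytes.
index = 600_000_000
type_width = 4

overflowed = java_int_mul(index, type_width)  # int * int wraps around
widened = index * type_width  # what (long) index * type_width would give in Java

print(overflowed)  # negative, so unusable as a buffer offset
print(widened)     # the intended offset, larger than INT_MAX
```

Widening one operand to a 64-bit value before multiplying is the usual way to avoid this class of bug; whether that is the fix adopted for this issue is not stated here.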



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10859) [Rust] [DataFusion] Make collect not require ExecutionContext

2020-12-08 Thread Jira
Jorge Leitão created ARROW-10859:


 Summary: [Rust] [DataFusion] Make collect not require 
ExecutionContext
 Key: ARROW-10859
 URL: https://issues.apache.org/jira/browse/ARROW-10859
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Jorge Leitão
Assignee: Jorge Leitão








[jira] [Created] (ARROW-10858) [C++][MSVC] Add missing Boost dependency

2020-12-08 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-10858:


 Summary: [C++][MSVC] Add missing Boost dependency
 Key: ARROW-10858
 URL: https://issues.apache.org/jira/browse/ARROW-10858
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-10857) [Packaging] Follow PowerTools repository name change on CentOS 8

2020-12-08 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-10857:


 Summary: [Packaging] Follow PowerTools repository name change on 
CentOS 8
 Key: ARROW-10857
 URL: https://issues.apache.org/jira/browse/ARROW-10857
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-10856) Can't get the required C++ run time library installed correctly

2020-12-08 Thread Yi Hsiao (Jira)
Yi Hsiao created ARROW-10856:


 Summary: Can't get the required C++ run time library installed 
correctly
 Key: ARROW-10856
 URL: https://issues.apache.org/jira/browse/ARROW-10856
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Yi Hsiao


When I tried to use the example command like this in my R session:
{code:java}
df <- read_parquet(system.file("v0.7.1.parquet", package="arrow")){code}
 

It shows this error:
{code:java}
> df <- read_parquet(system.file("v0.7.1.parquet", package="arrow"))
Error in io___MemoryMappedFile__Open(path, mode) :
 Cannot call io___MemoryMappedFile__Open(). Please use arrow::install_arrow() 
to install required runtime libraries.{code}
I did try to install it with `arrow::install_arrow()`, and it finished
successfully.

However, I still get the same error message mentioned above after that.

My session info is here:

 
{code:java}
> sessioninfo::session_info()
─ Session info ───
 setting value
 version R version 4.0.2 (2020-06-22)
 os CentOS Linux 7 (Core)
 system x86_64, linux-gnu
 ui X11
 language (EN)
 collate en_US.UTF-8
 ctype en_US.UTF-8
 tz America/Detroit
 date 2020-12-08
─ Packages ───
 package * version date lib source
 arrow * 2.0.0 2020-10-20 [1] CRAN (R 4.0.2)
 assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
 bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.2)
 bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.2)
 cli 2.2.0 2020-11-20 [1] CRAN (R 4.0.2)
 crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
 fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
 glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
 magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
 purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.2)
 R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
 rlang 0.4.9 2020-11-26 [1] CRAN (R 4.0.2)
 sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
 tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.2)
 vctrs 0.3.5 2020-11-17 [1] CRAN (R 4.0.2)
 withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.2)
[1] /home/yihsiao/R/x86_64-pc-linux-gnu-library/4.0
[2] /sw/arcts/centos7/stacks/gcc/8.2.0/R/4.0.2/lib64/R/library
{code}
 

One thing I notice is that when installing the runtime library, it doesn't pick
up the C++ compiler I actually have (gcc 8.2.0); it detects some version < 4.9
instead:

 
{code:java}
> arrow::install_arrow()
Installing package into '/home/yihsiao/R/x86_64-pc-linux-gnu-library/4.0'
(as 'lib' is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/arrow_2.0.0.tar.gz'
Content type 'application/x-gzip' length 322592 bytes (315 KB)
==
downloaded 315 KB
* installing *source* package 'arrow' ...
** package 'arrow' successfully unpacked and MD5 sums checked
** using staged installation
*** No C++ binaries found for centos-7
*** Successfully retrieved C++ source
*** Building C++ libraries
 cmake
 S3 support not available for gcc < 4.9; building with ARROW_S3=OFF
 arrow
{code}
 

 





[jira] [Created] (ARROW-10855) [Python][Numpy] ArrowTypeError after upgrading NumPy to 1.20.0rc1

2020-12-08 Thread Zhenghui Jin (Jira)
Zhenghui Jin created ARROW-10855:


 Summary: [Python][Numpy] ArrowTypeError after upgrading NumPy to 
1.20.0rc1
 Key: ARROW-10855
 URL: https://issues.apache.org/jira/browse/ARROW-10855
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 2.0.0
 Environment: macOS Big Sur 11.0.1
Reporter: Zhenghui Jin


After upgrading NumPy to version 1.20.0rc1, pandas' .to_parquet() raises
ArrowTypeError.

NumPy 1.19.4, Python 3.7.9, macOS:

 
{code:java}
Python 3.7.9 (default, Nov 20 2020, 23:58:42) 
[Clang 12.0.0 (clang-1200.0.32.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas as pd
>>> np.__version__
'1.19.4'
>>> pd.DataFrame({'i': [1, 2, 3, np.nan]}, dtype='Int64').to_parquet('nullint.parquet')
>>> 

{code}


NumPy 1.20.0rc1, Python 3.7.9, macOS:


{code:java}
Python 3.7.9 (default, Nov 20 2020, 23:58:42) 
[Clang 12.0.0 (clang-1200.0.32.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas as pd
>>> np.__version__
'1.19.4'
>>> pd.DataFrame({'i': [1, 2, 3, np.nan]}, dtype='Int64').to_parquet('nullint.parquet')
>>> 

{code}





[jira] [Created] (ARROW-10854) [Rust] [DataFusion] Simplified logical scans

2020-12-08 Thread Jira
Jorge Leitão created ARROW-10854:


 Summary: [Rust] [DataFusion] Simplified logical scans
 Key: ARROW-10854
 URL: https://issues.apache.org/jira/browse/ARROW-10854
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - DataFusion
Reporter: Jorge Leitão
Assignee: Jorge Leitão








[jira] [Created] (ARROW-10853) [Java] Undeprecate sqlToArrow helpers

2020-12-08 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10853:


 Summary: [Java] Undeprecate sqlToArrow helpers
 Key: ARROW-10853
 URL: https://issues.apache.org/jira/browse/ARROW-10853
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Affects Versions: 2.0.0
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 3.0.0


These helper functions are really useful when called from Python, as they deal
with a lot of Java "internals" that we don't want to handle on the Python
side. We would rather keep using these functions.

Note that some of them are broken due to recent refactoring and only return 
1024 rows (the default iterator size) without the ability to change that.





[jira] [Created] (ARROW-10852) [C++] AssertTablesEqual(verbose=true) segfaults if the left array is longer

2020-12-08 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-10852:


 Summary: [C++] AssertTablesEqual(verbose=true) segfaults if the 
left array is longer
 Key: ARROW-10852
 URL: https://issues.apache.org/jira/browse/ARROW-10852
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 2.0.0
Reporter: Ben Kietzman
 Fix For: 3.0.0


{{MultipleChunkIterator}} is used to implement the verbose comparison in
{{AssertTablesEqual}} and seems to assume that the arrays have identical
length. If the left ChunkedArray is longer, this results in a segfault when
trying to read nonexistent chunks of the right ChunkedArray.
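
The failure mode can be sketched outside of C++. The following is a hypothetical illustration, not Arrow's {{MultipleChunkIterator}}: it walks two chunked sequences in lockstep and stops when either side runs out of chunks, rather than assuming equal lengths. The function name and the list-of-lists representation are assumptions made for the example.

```python
from typing import Iterator, List, Sequence, Tuple


def iter_aligned_chunks(
    left: List[Sequence], right: List[Sequence]
) -> Iterator[Tuple[Sequence, Sequence]]:
    """Yield equal-length slice pairs from two chunked sequences.

    Stops as soon as either side has no chunks left, so the shorter
    side is never read past its end.
    """
    li = ri = 0      # current chunk index on each side
    lpos = rpos = 0  # offset inside the current chunk
    while li < len(left) and ri < len(right):
        lchunk, rchunk = left[li], right[ri]
        n = min(len(lchunk) - lpos, len(rchunk) - rpos)
        if n > 0:
            yield lchunk[lpos:lpos + n], rchunk[rpos:rpos + n]
        lpos += n
        rpos += n
        if lpos == len(lchunk):  # left chunk exhausted, move to the next one
            li, lpos = li + 1, 0
        if rpos == len(rchunk):  # right chunk exhausted, move to the next one
            ri, rpos = ri + 1, 0


left = [[1, 2, 3], [4, 5], [6, 7]]  # 7 values in total
right = [[1, 2], [3, 4, 5]]         # 5 values in total
pairs = list(iter_aligned_chunks(left, right))
compared = sum(len(l) for l, _ in pairs)
```

Here only the first five values are paired; a caller could then report the length mismatch separately instead of crashing.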





[jira] [Created] (ARROW-10851) [C++] Reduce code size of vector_sort.cc

2020-12-08 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-10851:

 Summary: [C++] Reduce code size of vector_sort.cc
 Key: ARROW-10851
 URL: https://issues.apache.org/jira/browse/ARROW-10851
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou








[jira] [Created] (ARROW-10850) Unrecognized compression type: LZ4 on Windows

2020-12-08 Thread Chris Kennedy (Jira)
Chris Kennedy created ARROW-10850:

 Summary: Unrecognized compression type: LZ4 on Windows
 Key: ARROW-10850
 URL: https://issues.apache.org/jira/browse/ARROW-10850
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Affects Versions: 2.0.0
 Environment: Windows 10, R 3.6.2, RStudio 1.3.1073
Reporter: Chris Kennedy


Hello,

I recently re-installed Arrow from CRAN in R 3.6.2, and it can no longer
import a Feather file with LZ4 compression (whereas in previous months this
worked fine):
{code:java}
> data = suppressWarnings(arrow::read_feather("blah.feather"))
{code}
{noformat}
Error in ipc___feather___Reader__Read(self, columns) : Invalid: Unrecognized 
compression type: LZ4{noformat}
I have attempted to install from source but continue to receive this error.
According to the documentation, though, shouldn't the CRAN package also have
LZ4 support? Is it possible that the CRAN build has lost LZ4 support? My Feather
file was created in pandas.

Happy to send over any other information that could be helpful, and apologies 
if I am making some mistake on my end.

Thanks,

Chris





[jira] [Created] (ARROW-10849) [Python] Handle numpy deprecation warnings for builtin type aliases

2020-12-08 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-10849:

 Summary: [Python] Handle numpy deprecation warnings for builtin 
type aliases
 Key: ARROW-10849
 URL: https://issues.apache.org/jira/browse/ARROW-10849
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Joris Van den Bossche


See 
https://numpy.org/devdocs/release/1.20.0-notes.html#using-the-aliases-of-builtin-types-like-np-int-is-deprecated





[jira] [Created] (ARROW-10848) [C++] CSV ISO-8601 date and timestamp short form

2020-12-08 Thread Maciej (Jira)
Maciej created ARROW-10848:

 Summary: [C++] CSV ISO-8601 date and timestamp short form
 Key: ARROW-10848
 URL: https://issues.apache.org/jira/browse/ARROW-10848
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Maciej


Arrow supports ISO-8601 for date and timestamp parsing but doesn't support the
short form of them, e.g.:
{code:java}
19990108
or
19990108 040506
{code}
Examples taken from: https://www.postgresql.org/docs/12/datatype-datetime.html
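
As a rough illustration of the requested behavior, the two short forms above can be parsed with the Python standard library. This is only a sketch of the input format; it says nothing about how the Arrow C++ CSV reader would implement it, and the function names are made up for the example.

```python
from datetime import date, datetime


def parse_basic_date(text: str) -> date:
    """Parse a compact date without separators, such as '19990108'."""
    return datetime.strptime(text, "%Y%m%d").date()


def parse_basic_timestamp(text: str) -> datetime:
    """Parse a compact date and time separated by a space, such as '19990108 040506'."""
    return datetime.strptime(text, "%Y%m%d %H%M%S")


d = parse_basic_date("19990108")
ts = parse_basic_timestamp("19990108 040506")
print(d.isoformat())
print(ts.isoformat())
```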





[jira] [Created] (ARROW-10847) [C++] CSV date custom parser

2020-12-08 Thread Maciej (Jira)
Maciej created ARROW-10847:

 Summary: [C++] CSV date custom parser
 Key: ARROW-10847
 URL: https://issues.apache.org/jira/browse/ARROW-10847
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 2.0.0
Reporter: Maciej


When I have a custom date format in a CSV file, I'd like to parse it by adding
an additional DateParser, equivalent to the TimestampParser that can be added to
{{timestamp_parsers}} in {{ConvertOptions}}.
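
The idea of a list of user-supplied date parsers tried in order can be sketched as follows. Everything here ({{DateParser}}, {{strptime_date_parser}}, {{parse_date}}) is a hypothetical name for illustration and does not correspond to an existing Arrow API.

```python
from datetime import date, datetime
from typing import Callable, List, Optional

# Hypothetical parser type: returns a date, or None if the text does not match.
DateParser = Callable[[str], Optional[date]]


def strptime_date_parser(fmt: str) -> DateParser:
    """Build a parser for one strptime-style date format."""
    def parse(text: str) -> Optional[date]:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            return None
    return parse


def parse_date(text: str, parsers: List[DateParser]) -> date:
    """Try each parser in order and return the first successful result."""
    for parser in parsers:
        value = parser(text)
        if value is not None:
            return value
    raise ValueError(f"no date parser accepted {text!r}")


date_parsers = [strptime_date_parser("%Y-%m-%d"), strptime_date_parser("%d/%m/%Y")]
parsed = parse_date("08/01/1999", date_parsers)
```

Trying parsers in order keeps the default format first while letting callers append their own formats.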





[jira] [Created] (ARROW-10846) [C++] Add async filesystem operations

2020-12-08 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-10846:

 Summary: [C++] Add async filesystem operations
 Key: ARROW-10846
 URL: https://issues.apache.org/jira/browse/ARROW-10846
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou


It would probably be useful to have Future-returning variants of some 
filesystem operations (at least {{GetFileInfo}} and {{OpenInput(File|Stream)}}).
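
What a Future-returning variant looks like from the caller's side can be sketched with Python's {{concurrent.futures}}. This is an analogy only: the function name {{get_file_info_async}} and the shared thread pool are assumptions for the example, and the C++ design is not described in this issue.

```python
import os
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor

_io_pool = ThreadPoolExecutor(max_workers=4)  # hypothetical shared I/O pool


def get_file_info_async(path: str) -> Future:
    """Return a Future that resolves to the os.stat() result for path."""
    return _io_pool.submit(os.stat, path)


# Create a small file to query.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"arrow")
    path = tmp.name

future = get_file_info_async(path)  # returns without waiting for the stat call
size = future.result().st_size      # block only when the value is needed
os.remove(path)
```

The caller can issue several such requests and wait on them together, which is the main benefit over the blocking variants.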





[jira] [Created] (ARROW-10845) [Python][CI] Add python CI build using numpy nightly

2020-12-08 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-10845:

 Summary: [Python][CI] Add python CI build using numpy nightly
 Key: ARROW-10845
 URL: https://issues.apache.org/jira/browse/ARROW-10845
 Project: Apache Arrow
  Issue Type: Improvement
  Components: CI, Python
Reporter: Joris Van den Bossche
 Fix For: 3.0.0





