[jira] [Created] (ARROW-9935) New filesystem API unable to read empty S3 folders

2020-09-07 Thread Weston Pace (Jira)
Weston Pace created ARROW-9935:
--

 Summary: New filesystem API unable to read empty S3 folders
 Key: ARROW-9935
 URL: https://issues.apache.org/jira/browse/ARROW-9935
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Weston Pace
 Attachments: arrow_453.py, arrow_9935.py

When an empty "folder" is created in S3 using the bucket explorer in the 
management console, S3 creates a special empty file with the same name as 
the folder.

(Some more details here: 
https://docs.aws.amazon.com/AmazonS3/latest/user-guide/using-folders.html)

If parquet files are later loaded into one of these directories (with or 
without partitioning subdirectories), the dataset cannot be read by the 
new dataset API.  The underlying s3fs `find` method returns a "file" object 
with size 0 that pyarrow then attempts to read.  Since this file doesn't truly 
exist, a FileNotFoundError is thrown.

Would it be safe to simply ignore all files with size 0?

As a workaround I can wrap s3fs' find method and strip out these objects with 
size 0 myself.

I've attached a script showing the issue and a workaround.  It uses a public 
bucket that I'll leave up for a few months.
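
The workaround described above could look roughly like the sketch below. It is not the attached script: `find_nonempty` and `FakeFileSystem` are hypothetical names, the fake filesystem only stands in for `s3fs.S3FileSystem` so the example runs without network access, and the listing keys are made up. It assumes `find(path, detail=True)` returns a dict mapping each path to an info dict with a `size` entry, as fsspec-style filesystems do.

```python
def find_nonempty(fs, path, **kwargs):
    """Return fs.find(path, detail=True) without the entries whose size is 0."""
    listing = fs.find(path, detail=True, **kwargs)
    # Entries without a reported size are kept; only explicit size-0 entries go.
    return {name: info for name, info in listing.items() if info.get("size") != 0}


class FakeFileSystem:
    """Hypothetical stand-in for s3fs.S3FileSystem so the sketch runs offline."""

    def __init__(self, listing):
        self._listing = listing

    def find(self, path, detail=False, **kwargs):
        matches = {k: v for k, v in self._listing.items() if k.startswith(path)}
        return matches if detail else sorted(matches)


fake = FakeFileSystem({
    # The zero-byte placeholder created for the empty "folder" (made-up key).
    "bucket/dataset/": {"name": "bucket/dataset/", "size": 0, "type": "file"},
    "bucket/dataset/part-0.parquet": {
        "name": "bucket/dataset/part-0.parquet", "size": 1024, "type": "file",
    },
})
nonempty = find_nonempty(fake, "bucket/dataset")
print(sorted(nonempty))
```

Note that this filter also hides any genuinely empty file, which is the open question raised above about whether ignoring all size-0 files is safe.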



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9934) [Rust] Shape and stride check in tensor

2020-09-07 Thread Fernando Herrera (Jira)
Fernando Herrera created ARROW-9934:
---

 Summary: [Rust] Shape and stride check in tensor
 Key: ARROW-9934
 URL: https://issues.apache.org/jira/browse/ARROW-9934
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Fernando Herrera


When creating a tensor there is no check for the supplied shape and stride. 
There should be a check before creating the tensor object.
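
The issue does not say which checks are intended. As a language-neutral illustration only (in Python, not the Rust implementation), one plausible set of checks is sketched below. The function name `check_shape_and_strides`, its parameters, and the assumptions that strides are byte strides and non-negative are all hypothetical.

```python
def check_shape_and_strides(shape, strides, itemsize, buffer_len):
    """Raise ValueError if shape and strides cannot describe data in the buffer.

    Assumptions for this sketch: strides are in bytes and non-negative,
    and buffer_len is the size of the backing buffer in bytes.
    """
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same number of dimensions")
    if any(dim < 0 for dim in shape):
        raise ValueError("dimensions must be non-negative")
    if any(stride < 0 for stride in strides):
        raise ValueError("strides must be non-negative")
    if 0 in shape:
        return  # an empty tensor addresses no bytes
    # Byte offset of the last addressable element.
    last_offset = sum((dim - 1) * stride for dim, stride in zip(shape, strides))
    if last_offset + itemsize > buffer_len:
        raise ValueError("shape and strides address bytes past the end of the buffer")


# A 2x3 tensor of 4-byte values stored row-major in a 24-byte buffer passes.
check_shape_and_strides(shape=(2, 3), strides=(12, 4), itemsize=4, buffer_len=24)
```

Running the check before constructing the tensor object means an inconsistent shape or stride is reported as an error instead of producing a tensor that reads out of bounds later.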





[jira] [Created] (ARROW-9933) [Developer] Add drone as a CI provider for crossbow

2020-09-07 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9933:
---

 Summary: [Developer] Add drone as a CI provider for crossbow
 Key: ARROW-9933
 URL: https://issues.apache.org/jira/browse/ARROW-9933
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Uwe Korn
Assignee: Uwe Korn








[jira] [Created] (ARROW-9932) R package fails to install on Ubuntu 14

2020-09-07 Thread Ofek Shilon (Jira)
Ofek Shilon created ARROW-9932:
--

 Summary: R package fails to install on Ubuntu 14
 Key: ARROW-9932
 URL: https://issues.apache.org/jira/browse/ARROW-9932
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Affects Versions: 1.0.1
 Environment: R version 3.4.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS
Reporter: Ofek Shilon


1. From R (3.4) prompt, we run

{{> install.packages("arrow")}}

and it seems to succeed.

2. Next we run:

{{> arrow::install_arrow()}}

This is the full output:

{{Installing package into '/opt/R-3.4.0.mkl/library'}}
{{(as 'lib' is unspecified)}}
{{trying URL 'https://cloud.r-project.org/src/contrib/arrow_1.0.1.tar.gz'}}
{{Content type 'application/x-gzip' length 274865 bytes (268 KB)}}
{{==}}
{{downloaded 268 KB}}
{{installing *source* package 'arrow' ...}}
{{** package 'arrow' successfully unpacked and MD5 sums checked}}
{{*** No C++ binaries found for ubuntu-14.04}}
{{*** Successfully retrieved C++ source}}
{{*** Building C++ libraries}}
{{ cmake}}
{{Error in dQuote(env_var_list, FALSE) : unused argument (FALSE)}}
{{Calls: build_libarrow -> paste}}
{{Execution halted}}
{{- NOTE ---}}
{{After installation, please run arrow::install_arrow()}}
{{for help installing required runtime libraries}}
{{-}}
{{** libs}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c array.cpp -o array.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c array_from_vector.cpp -o array_from_vector.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c array_to_vector.cpp -o array_to_vector.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c arraydata.cpp -o arraydata.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c arrowExports.cpp -o arrowExports.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c buffer.cpp -o buffer.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c chunkedarray.cpp -o chunkedarray.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c compression.cpp -o compression.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c compute.cpp -o compute.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c csv.cpp -o csv.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c dataset.cpp -o dataset.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c datatype.cpp -o datatype.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c expression.cpp -o expression.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c feather.cpp -o feather.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c field.cpp -o field.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c filesystem.cpp -o filesystem.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c imports.cpp -o imports.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c io.cpp -o io.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 

[GitHub] [arrow-testing] pitrou merged pull request #46: ARROW-9931: Add IPC fuzz file

2020-09-07 Thread GitBox


pitrou merged pull request #46:
URL: https://github.com/apache/arrow-testing/pull/46


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow-testing] pitrou opened a new pull request #46: ARROW-9931: Add IPC fuzz file

2020-09-07 Thread GitBox


pitrou opened a new pull request #46:
URL: https://github.com/apache/arrow-testing/pull/46


   







[jira] [Created] (ARROW-9931) [C++] Fix undefined behaviour on invalid IPC (OSS-Fuzz)

2020-09-07 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9931:
-

 Summary: [C++] Fix undefined behaviour on invalid IPC (OSS-Fuzz)
 Key: ARROW-9931
 URL: https://issues.apache.org/jira/browse/ARROW-9931
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou








[jira] [Created] (ARROW-9930) [C++] Fix undefined behaviour on invalid IPC (OSS-Fuzz)

2020-09-07 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9930:
-

 Summary: [C++] Fix undefined behaviour on invalid IPC (OSS-Fuzz)
 Key: ARROW-9930
 URL: https://issues.apache.org/jira/browse/ARROW-9930
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou








[jira] [Created] (ARROW-9929) [Developer] Autotune cmake-format

2020-09-07 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9929:
---

 Summary: [Developer] Autotune cmake-format
 Key: ARROW-9929
 URL: https://issues.apache.org/jira/browse/ARROW-9929
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Uwe Korn
Assignee: Uwe Korn








[jira] [Created] (ARROW-9928) [C++] Speed up integer parsing slightly

2020-09-07 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9928:
-

 Summary: [C++] Speed up integer parsing slightly
 Key: ARROW-9928
 URL: https://issues.apache.org/jira/browse/ARROW-9928
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou


By exiting the parsing routine early when the input is exhausted, we can 
save a little bit of processing time.





[jira] [Created] (ARROW-9927) Add dplyr group_by, summarise and mutate support in function open_dataset R arrow package

2020-09-07 Thread Pal (Jira)
Pal created ARROW-9927:
--

 Summary: Add dplyr group_by, summarise and mutate support in 
function open_dataset R arrow package  
 Key: ARROW-9927
 URL: https://issues.apache.org/jira/browse/ARROW-9927
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Pal





