[jira] [Commented] (ARROW-9104) [C++] Parquet encryption tests should write files to a temporary directory instead of the testing submodule's directory

2020-09-07 Thread Gidon Gershinsky (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191965#comment-17191965
 ] 

Gidon Gershinsky commented on ARROW-9104:
-

[~apitrou], this will be handled by [~revit13]; could you assign the Jira to 
her?

(It looks like she's not on the contributor list, even though she contributed a 
significant part of the initial encryption code, including the testing 
submodule component. This is probably because that work was committed as part 
of a joint pull request sent by another person.)

> [C++] Parquet encryption tests should write files to a temporary directory 
> instead of the testing submodule's directory
> ---
>
> Key: ARROW-9104
> URL: https://issues.apache.org/jira/browse/ARROW-9104
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Gidon Gershinsky
>Priority: Major
> Fix For: 2.0.0
>
>
> If the source directory is not writable the test raises permission denied 
> error:
> [ RUN  ] TestEncryptionConfiguration.UniformEncryption
> 1632
> unknown file: Failure
> 1633
> C++ exception with description "IOError: Failed to open local file 
> '/arrow/cpp/submodules/parquet-testing/data/tmp_uniform_encryption.parquet.encrypted'.
>  Detail: [errno 13] Permission denied" thrown in the test body.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9752) [Rust] [DataFusion] Add support for Aggregate UDFs

2020-09-07 Thread Jorge (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge updated ARROW-9752:
-
Labels:   (was: pull-request-available)

> [Rust] [DataFusion] Add support for Aggregate UDFs
> --
>
> Key: ARROW-9752
> URL: https://issues.apache.org/jira/browse/ARROW-9752
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Jorge
>Assignee: Jorge
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This will allow us to more easily extend the existing offering of aggregate 
> functions.
> The existing functions shall be migrated to this interface.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-9826) [Rust] add set function to PrimitiveArray

2020-09-07 Thread Jorge (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191943#comment-17191943
 ] 

Jorge commented on ARROW-9826:
--

> Generally speaking if arrays are immutable, there are some operations that 
> should directly modify elements at specific index.

If arrays are immutable, there should be _no_ operations that modify their 
elements, right?

My understanding is also that arrays are immutable. One general reason is that 
this lets us pass slices of data by reference, both within a thread and across 
threads, without having to worry about data races or about threads waiting on 
each other. It is a whole computational model.

In Rust, this allows us to use {{Arc}} to safely share arrays across threads; 
otherwise we would need a {{Mutex}} or some other synchronization mechanism.

To "modify" elements, we create a new array with the modified elements (see 
e.g. {{src/compute}}).
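
As a cross-language illustration of that model (pyarrow is used here purely for 
convenience; the Rust kernels under {{src/compute}} follow the same 
build-a-new-array pattern), note how a "modification" leaves the original array 
untouched:

{code:python}
import pyarrow as pa
import pyarrow.compute as pc

arr = pa.array([1, 2, 3], type=pa.int64())

# "Modifying" values means producing a new array; the original stays
# immutable, so it can keep being shared (in Rust, behind an Arc)
# without any locking.
incremented = pc.add(arr, 1)

print(arr.to_pylist())          # [1, 2, 3] -- unchanged
print(incremented.to_pylist())  # [2, 3, 4]
{code}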

 

 

> [Rust] add set function to PrimitiveArray
> -
>
> Key: ARROW-9826
> URL: https://issues.apache.org/jira/browse/ARROW-9826
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Affects Versions: 1.0.0
>Reporter: Francesco Gadaleta
>Priority: Major
>
> For in-place value replacement in Array, a `set()` function (maybe unsafe?) 
> would be required.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-9846) [Rust] Master branch broken build

2020-09-07 Thread Jorge (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191941#comment-17191941
 ] 

Jorge commented on ARROW-9846:
--

[~andygrove], should we mark this as "cannot reproduce", since master has been 
fixed for some time now? I do not know how else I would approach this.

> [Rust] Master branch broken build
> -
>
> Key: ARROW-9846
> URL: https://issues.apache.org/jira/browse/ARROW-9846
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Priority: Major
> Fix For: 2.0.0
>
>
> Master branch is failing to build in CI. It fails to compile 
> "tower-balance-0.3.0". I cannot reproduce locally.
> {code:java}
> error[E0502]: cannot borrow `self` as immutable because it is also borrowed 
> as mutable
>--> 
> /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/tower-balance-0.3.0/src/pool/mod.rs:381:21
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9935) New filesystem API unable to read empty S3 folders

2020-09-07 Thread Weston Pace (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weston Pace updated ARROW-9935:
---
Attachment: (was: arrow_453.py)

> New filesystem API unable to read empty S3 folders
> --
>
> Key: ARROW-9935
> URL: https://issues.apache.org/jira/browse/ARROW-9935
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Weston Pace
>Priority: Minor
> Attachments: arrow_9935.py
>
>
> When an empty "folder" is created in S3 using the online bucket explorer tool 
> on the management console then it creates a special empty file with the same 
> name as the folder.
> (Some more details here: 
> [https://docs.aws.amazon.com/AmazonS3/latest/user-guide/using-folders.html])
> If parquet files are later loaded into one of these directories (with or 
> without partitioning subdirectories) then this dataset cannot be read by the 
> new dataset API.  The underlying s3fs `find` method returns a "file" object 
> with size 0 that pyarrow then attempts to read.  Since this file doesn't 
> truly exist a FileNotFoundError is thrown.
> Would it be safe to simply ignore all files with size 0?
> As a workaround I can wrap s3fs' find method and strip out these objects with 
> size 0 myself.
> I've attached a script showing the issue and a workaround.  It uses a public 
> bucket that I'll leave up for a few months.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9935) New filesystem API unable to read empty S3 folders

2020-09-07 Thread Weston Pace (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weston Pace updated ARROW-9935:
---
Attachment: arrow_9935.py

> New filesystem API unable to read empty S3 folders
> --
>
> Key: ARROW-9935
> URL: https://issues.apache.org/jira/browse/ARROW-9935
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Weston Pace
>Priority: Minor
> Attachments: arrow_453.py, arrow_9935.py
>
>
> When an empty "folder" is created in S3 using the online bucket explorer tool 
> on the management console then it creates a special empty file with the same 
> name as the folder.
> (Some more details here: 
> [https://docs.aws.amazon.com/AmazonS3/latest/user-guide/using-folders.html])
> If parquet files are later loaded into one of these directories (with or 
> without partitioning subdirectories) then this dataset cannot be read by the 
> new dataset API.  The underlying s3fs `find` method returns a "file" object 
> with size 0 that pyarrow then attempts to read.  Since this file doesn't 
> truly exist a FileNotFoundError is thrown.
> Would it be safe to simply ignore all files with size 0?
> As a workaround I can wrap s3fs' find method and strip out these objects with 
> size 0 myself.
> I've attached a script showing the issue and a workaround.  It uses a public 
> bucket that I'll leave up for a few months.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9935) New filesystem API unable to read empty S3 folders

2020-09-07 Thread Weston Pace (Jira)
Weston Pace created ARROW-9935:
--

 Summary: New filesystem API unable to read empty S3 folders
 Key: ARROW-9935
 URL: https://issues.apache.org/jira/browse/ARROW-9935
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Weston Pace
 Attachments: arrow_453.py, arrow_9935.py

When an empty "folder" is created in S3 using the online bucket explorer tool 
on the management console then it creates a special empty file with the same 
name as the folder.

(Some more details here: 
[https://docs.aws.amazon.com/AmazonS3/latest/user-guide/using-folders.html)]

If parquet files are later loaded into one of these directories (with or 
without partitioning subdirectories) then this dataset cannot be read by the 
new dataset API.  The underlying s3fs `find` method returns a "file" object 
with size 0 that pyarrow then attempts to read.  Since this file doesn't truly 
exist a FileNotFoundError is thrown.

Would it be safe to simply ignore all files with size 0?

As a workaround I can wrap s3fs' find method and strip out these objects with 
size 0 myself.

I've attached a script showing the issue and a workaround.  It uses a public 
bucket that I'll leave up for a few months.
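
For reference, a minimal sketch of that kind of wrapper (this is not the 
attached arrow_9935.py; it assumes an fsspec-style 
find(path, maxdepth=None, withdirs=False, detail=False) signature, and the 
class name is made up):

{code:python}
import s3fs


class FilteredS3FileSystem(s3fs.S3FileSystem):
    """S3 filesystem that hides the zero-size "folder marker" objects
    created by the S3 console, so dataset discovery doesn't try to read them."""

    def find(self, path, maxdepth=None, withdirs=False, detail=False, **kwargs):
        # Always ask for detailed listings so object sizes are available.
        infos = super().find(path, maxdepth=maxdepth, withdirs=withdirs,
                             detail=True, **kwargs)
        # Drop "files" of size 0 -- these are the folder markers.
        infos = {p: info for p, info in infos.items()
                 if not (info.get("type") == "file" and info.get("size") == 0)}
        return infos if detail else sorted(infos)


# Pass an instance of this class wherever the plain s3fs filesystem was being
# passed, e.g. as the `filesystem` argument when reading the dataset.
{code}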



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-1700) [JS] Implement Node.js client for Plasma store

2020-09-07 Thread Mrudang Majmudar (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191889#comment-17191889
 ] 

Mrudang Majmudar commented on ARROW-1700:
-

This is very interesting and very cool.

A Node.js client would be very helpful.

Also, in our custom (Electron-based) browser we have managed to open an mmap 
file: the web app opens the mmap file and returns a SharedArrayBuffer. We have 
been using this successfully to read our custom data off the mmap file and 
display it in the browser.

However, using Plasma would be very useful, since we could write an object 
from JS and another process could read it.

> [JS] Implement Node.js client for Plasma store
> --
>
> Key: ARROW-1700
> URL: https://issues.apache.org/jira/browse/ARROW-1700
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Plasma, JavaScript
>Reporter: Robert Nishihara
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7288) [C++][R] read_parquet() freezes on Windows with Japanese locale

2020-09-07 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191866#comment-17191866
 ] 

Neal Richardson commented on ARROW-7288:


Previously (I forget the particular JIRA) we were not able to get a C++ library 
build with debug symbols on the (old) Rtools toolchain. So when I've run it 
under gdb, all it says (in the case of a segfault, which this is not) is "yep, 
it crashed". 

To be clear, I believe the issue is in the parquet C++ library as compiled with 
the Rtools 35 toolchain. I don't recall if I've tried with Rtools 40. And FWIW 
there was a JIRA about running the C++ test suite on the library as built by 
Rtools, and we didn't succeed in getting a build. 

> [C++][R] read_parquet() freezes on Windows with Japanese locale
> ---
>
> Key: ARROW-7288
> URL: https://issues.apache.org/jira/browse/ARROW-7288
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, R
>Affects Versions: 0.15.1
> Environment: R 3.6.1 on Windows 10
>Reporter: Hiroaki Yutani
>Priority: Critical
>  Labels: parquet
> Fix For: 2.0.0
>
>
> The following example on read_parquet()'s doc freezes (seems to wait for the 
> result forever) on my Windows.
> df <- read_parquet(system.file("v0.7.1.parquet", package="arrow"))
> The CRAN checks are all fine, which means the example is successfully 
> executed on the CRAN Windows. So, I have no idea why it doesn't work on my 
> local.
> [https://cran.r-project.org/web/checks/check_results_arrow.html]
> Here's my session info in case it helps:
> {code:java}
> > sessioninfo::session_info()
> - Session info ---------------------------------------------------------
>  setting  value
>  version  R version 3.6.1 (2019-07-05)
>  os       Windows 10 x64
>  system   x86_64, mingw32
>  ui       RStudio
>  language en
>  collate  Japanese_Japan.932
>  ctype    Japanese_Japan.932
>  tz       Asia/Tokyo
>  date     2019-12-01
> - Packages -------------------------------------------------------------
>  package     * version  date       lib source
>  arrow       * 0.15.1.1 2019-11-05 [1] CRAN (R 3.6.1)
>  assertthat    0.2.1    2019-03-21 [1] CRAN (R 3.6.0)
>  bit           1.1-14   2018-05-29 [1] CRAN (R 3.6.0)
>  bit64         0.9-7    2017-05-08 [1] CRAN (R 3.6.0)
>  cli           1.1.0    2019-03-19 [1] CRAN (R 3.6.0)
>  crayon        1.3.4    2017-09-16 [1] CRAN (R 3.6.0)
>  fs            1.3.1    2019-05-06 [1] CRAN (R 3.6.0)
>  glue          1.3.1    2019-03-12 [1] CRAN (R 3.6.0)
>  magrittr      1.5      2014-11-22 [1] CRAN (R 3.6.0)
>  purrr         0.3.3    2019-10-18 [1] CRAN (R 3.6.1)
>  R6            2.4.1    2019-11-12 [1] CRAN (R 3.6.1)
>  Rcpp          1.0.3    2019-11-08 [1] CRAN (R 3.6.1)
>  reprex        0.3.0    2019-05-16 [1] CRAN (R 3.6.0)
>  rlang         0.4.2    2019-11-23 [1] CRAN (R 3.6.1)
>  rstudioapi    0.10     2019-03-19 [1] CRAN (R 3.6.0)
>  sessioninfo   1.1.1    2018-11-05 [1] CRAN (R 3.6.0)
>  tidyselect    0.2.5    2018-10-11 [1] CRAN (R 3.6.0)
>  withr         2.1.2    2018-03-15 [1] CRAN (R 3.6.0)
> [1] C:/Users/hiroaki-yutani/Documents/R/win-library/3.6
> [2] C:/Program Files/R/R-3.6.1/library
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-9926) [GLib] Use placement new for GArrowRecordBatchFileReader

2020-09-07 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191865#comment-17191865
 ] 

Kouhei Sutou commented on ARROW-9926:
-

Issue resolved by pull request 8120
https://github.com/apache/arrow/pull/8120

> [GLib] Use placement new for GArrowRecordBatchFileReader
> 
>
> Key: ARROW-9926
> URL: https://issues.apache.org/jira/browse/ARROW-9926
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-9926) [GLib] Use placement new for GArrowRecordBatchFileReader

2020-09-07 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-9926.
-
Fix Version/s: 2.0.0
   Resolution: Fixed

> [GLib] Use placement new for GArrowRecordBatchFileReader
> 
>
> Key: ARROW-9926
> URL: https://issues.apache.org/jira/browse/ARROW-9926
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-9925) [GLib] Add low level value readers for GArrowListArray family

2020-09-07 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-9925.
-
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8119
[https://github.com/apache/arrow/pull/8119]

> [GLib] Add low level value readers for GArrowListArray family
> -
>
> Key: ARROW-9925
> URL: https://issues.apache.org/jira/browse/ARROW-9925
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9934) [Rust] Shape and stride check in tensor

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9934:
--
Labels: pull-request-available  (was: )

> [Rust] Shape and stride check in tensor
> ---
>
> Key: ARROW-9934
> URL: https://issues.apache.org/jira/browse/ARROW-9934
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Fernando Herrera
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When creating a tensor there is no check for the supplied shape and stride. 
> There should be a check before creating the tensor object.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9934) [Rust] Shape and stride check in tensor

2020-09-07 Thread Fernando Herrera (Jira)
Fernando Herrera created ARROW-9934:
---

 Summary: [Rust] Shape and stride check in tensor
 Key: ARROW-9934
 URL: https://issues.apache.org/jira/browse/ARROW-9934
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Fernando Herrera


When creating a tensor there is no check for the supplied shape and stride. 
There should be a check before creating the tensor object.
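
A minimal sketch of the kind of consistency check being asked for (written in 
Python for illustration only; the helper name and the byte-stride convention 
are assumptions, not the actual Rust {{Tensor}} API):

{code:python}
from math import prod


def validate_shape_and_strides(shape, strides, item_size, buffer_len):
    """Reject shape/stride combinations that cannot describe the buffer."""
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same length")
    if any(d < 0 for d in shape) or any(s < 0 for s in strides):
        raise ValueError("negative dimensions or strides are not allowed here")
    if prod(shape) == 0:
        return  # empty tensor: nothing more to check
    # The farthest byte reachable through (shape, strides) must lie inside
    # the backing buffer.
    last_byte = sum((d - 1) * s for d, s in zip(shape, strides)) + item_size
    if last_byte > buffer_len:
        raise ValueError("shape/strides address bytes beyond the buffer")


# Example: a 2x3 row-major tensor of 8-byte values needs strides (24, 8)
# and at least 48 bytes of data.
validate_shape_and_strides([2, 3], [24, 8], item_size=8, buffer_len=48)
{code}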



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-9908) [Rust] Support temporal data types in JSON reader

2020-09-07 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-9908.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8124
[https://github.com/apache/arrow/pull/8124]

> [Rust] Support temporal data types in JSON reader
> -
>
> Key: ARROW-9908
> URL: https://issues.apache.org/jira/browse/ARROW-9908
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Christoph Schulze
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the JSON reader does not support any temporal data types. Columns 
> with *numerical* data should be interpretable as a temporal type when defined 
> accordingly in the schema. Currently this throws an error with a misleading 
> message ("struct types are not yet supported").
> Related issue:
> https://issues.apache.org/jira/browse/ARROW-4803 focuses on parsing temporal 
> data from string inputs. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7288) [C++][R] read_parquet() freezes on Windows with Japanese locale

2020-09-07 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191851#comment-17191851
 ] 

Neal Richardson commented on ARROW-7288:


I don't have VS, no. Since R uses mingw though, would it be helpful anyway?

> [C++][R] read_parquet() freezes on Windows with Japanese locale
> ---
>
> Key: ARROW-7288
> URL: https://issues.apache.org/jira/browse/ARROW-7288
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, R
>Affects Versions: 0.15.1
> Environment: R 3.6.1 on Windows 10
>Reporter: Hiroaki Yutani
>Priority: Critical
>  Labels: parquet
> Fix For: 2.0.0
>
>
> The following example on read_parquet()'s doc freezes (seems to wait for the 
> result forever) on my Windows.
> df <- read_parquet(system.file("v0.7.1.parquet", package="arrow"))
> The CRAN checks are all fine, which means the example is successfully 
> executed on the CRAN Windows. So, I have no idea why it doesn't work on my 
> local.
> [https://cran.r-project.org/web/checks/check_results_arrow.html]
> Here's my session info in case it helps:
> {code:java}
> > sessioninfo::session_info()
> - Session info ---------------------------------------------------------
>  setting  value
>  version  R version 3.6.1 (2019-07-05)
>  os       Windows 10 x64
>  system   x86_64, mingw32
>  ui       RStudio
>  language en
>  collate  Japanese_Japan.932
>  ctype    Japanese_Japan.932
>  tz       Asia/Tokyo
>  date     2019-12-01
> - Packages -------------------------------------------------------------
>  package     * version  date       lib source
>  arrow       * 0.15.1.1 2019-11-05 [1] CRAN (R 3.6.1)
>  assertthat    0.2.1    2019-03-21 [1] CRAN (R 3.6.0)
>  bit           1.1-14   2018-05-29 [1] CRAN (R 3.6.0)
>  bit64         0.9-7    2017-05-08 [1] CRAN (R 3.6.0)
>  cli           1.1.0    2019-03-19 [1] CRAN (R 3.6.0)
>  crayon        1.3.4    2017-09-16 [1] CRAN (R 3.6.0)
>  fs            1.3.1    2019-05-06 [1] CRAN (R 3.6.0)
>  glue          1.3.1    2019-03-12 [1] CRAN (R 3.6.0)
>  magrittr      1.5      2014-11-22 [1] CRAN (R 3.6.0)
>  purrr         0.3.3    2019-10-18 [1] CRAN (R 3.6.1)
>  R6            2.4.1    2019-11-12 [1] CRAN (R 3.6.1)
>  Rcpp          1.0.3    2019-11-08 [1] CRAN (R 3.6.1)
>  reprex        0.3.0    2019-05-16 [1] CRAN (R 3.6.0)
>  rlang         0.4.2    2019-11-23 [1] CRAN (R 3.6.1)
>  rstudioapi    0.10     2019-03-19 [1] CRAN (R 3.6.0)
>  sessioninfo   1.1.1    2018-11-05 [1] CRAN (R 3.6.0)
>  tidyselect    0.2.5    2018-10-11 [1] CRAN (R 3.6.0)
>  withr         2.1.2    2018-03-15 [1] CRAN (R 3.6.0)
> [1] C:/Users/hiroaki-yutani/Documents/R/win-library/3.6
> [2] C:/Program Files/R/R-3.6.1/library
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-4861) [C++] Introduce MemoryManager::Memset method.

2020-09-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou closed ARROW-4861.
-
Fix Version/s: (was: 2.0.0)
 Assignee: (was: Pearu Peterson)
   Resolution: Later

> [C++] Introduce MemoryManager::Memset method.
> -
>
> Key: ARROW-4861
> URL: https://issues.apache.org/jira/browse/ARROW-4861
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Pearu Peterson
>Priority: Major
>  Labels: C++
>
> Exposing such a method could be useful to initialize a buffer without going 
> through a CPU-to-device memory copy.
> For example, CUDA exposes several memset-like APIs to initialize GPU memory 
> with different initializer widths:
> https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g6e582bf866e9e2fb014297bfaf354d7b



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-4861) [C++] Introduce MemoryManager::Memset method.

2020-09-07 Thread Pearu Peterson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191846#comment-17191846
 ] 

Pearu Peterson commented on ARROW-4861:
---

[~apitrou] Atm I don't have a need for this.

> [C++] Introduce MemoryManager::Memset method.
> -
>
> Key: ARROW-4861
> URL: https://issues.apache.org/jira/browse/ARROW-4861
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Pearu Peterson
>Assignee: Pearu Peterson
>Priority: Major
>  Labels: C++
> Fix For: 2.0.0
>
>
> Exposing such a method could be useful to initialize a buffer without going 
> through a CPU-to-device memory copy.
> For example, CUDA exposes several memset-like APIs to initialize GPU memory 
> with different initializer widths:
> https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g6e582bf866e9e2fb014297bfaf354d7b



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-9718) [Python] Make pyarrow.parquet work with the new filesystem interfaces

2020-09-07 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche resolved ARROW-9718.
--
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 7991
[https://github.com/apache/arrow/pull/7991]

> [Python] Make pyarrow.parquet work with the new filesystem interfaces
> -
>
> Key: ARROW-9718
> URL: https://issues.apache.org/jira/browse/ARROW-9718
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Major
>  Labels: filesystem, pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The place internally where the "legacy" `pyarrow.filesystem` filesystems are 
> still used is in the {{pyarrow.parquet}} module.
> It is used in:
> - ParquetWriter
> - ParquetManifest/ParquetDataset
> - write_to_dataset
> For {{ParquetWriter}}, we need to update this to work with the new 
> filesystems (since ParquetWriter is not dataset related, and thus won't be 
> deprecated).  
> For {{ParquetManifest}}/{{ParquetDataset}}, it might not need to be updated, 
> since those might get deprecated themselves (to be discussed -> ARROW-9720), and 
> when using the {{use_legacy_dataset=False}} option, it already uses the new 
> datasets.  
> For {{write_to_dataset}}, this might depend on how the writing capabilities 
> of the dataset project evolve.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-9718) [Python] Make pyarrow.parquet work with the new filesystem interfaces

2020-09-07 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche reassigned ARROW-9718:


Assignee: Joris Van den Bossche

> [Python] Make pyarrow.parquet work with the new filesystem interfaces
> -
>
> Key: ARROW-9718
> URL: https://issues.apache.org/jira/browse/ARROW-9718
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Joris Van den Bossche
>Priority: Major
>  Labels: filesystem, pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The place internally where the "legacy" `pyarrow.filesystem` filesystems are 
> still used is in the {{pyarrow.parquet}} module.
> It is used in:
> - ParquetWriter
> - ParquetManifest/ParquetDataset
> - write_to_dataset
> For {{ParquetWriter}}, we need to update this to work with the new 
> filesystems (since ParquetWriter is not dataset related, and thus won't be 
> deprecated).  
> For {{ParquetManifest}}/{{ParquetDataset}}, it might not need to be updated, 
> since those might get deprecated themselves (to be discussed -> ARROW-9720), and 
> when using the {{use_legacy_dataset=False}} option, it already uses the new 
> datasets.  
> For {{write_to_dataset}}, this might depend on how the writing capabilities 
> of the dataset project evolve.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-9078) [C++] Parquet writing of extension type with nested storage type fails

2020-09-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-9078:
-

Assignee: Antoine Pitrou

> [C++] Parquet writing of extension type with nested storage type fails
> --
>
> Key: ARROW-9078
> URL: https://issues.apache.org/jira/browse/ARROW-9078
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Joris Van den Bossche
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: parquet
> Fix For: 2.0.0
>
>
> A reproducer in Python:
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
> class MyStructType(pa.PyExtensionType):
>
>     def __init__(self):
>         pa.PyExtensionType.__init__(
>             self, pa.struct([('left', pa.int64()), ('right', pa.int64())]))
>
>     def __reduce__(self):
>         return MyStructType, ()
>
> struct_array = pa.StructArray.from_arrays(
>     [
>         pa.array([0, 1], type="int64", from_pandas=True),
>         pa.array([1, 2], type="int64", from_pandas=True),
>     ],
>     names=["left", "right"],
> )
> # works
> table = pa.table({'a': struct_array})
> pq.write_table(table, "test_struct.parquet")
> # doesn't work
> mystruct_array = pa.ExtensionArray.from_storage(MyStructType(), struct_array)
> table = pa.table({'a': mystruct_array})
> pq.write_table(table, "test_struct.parquet")
> {code}
> Writing the simple StructArray nowadays works (and reading it back in as 
> well). 
> But when the struct array is the storage array of an ExtensionType, it fails 
> with the following error:
> {code}
> ArrowException: Unknown error: data type leaf_count != builder_leaf_count1 2
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-9078) [C++] Parquet writing of extension type with nested storage type fails

2020-09-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191829#comment-17191829
 ] 

Antoine Pitrou commented on ARROW-9078:
---

Fixing the writing side is relatively easy, but the reading side looks like a 
can of worms.

> [C++] Parquet writing of extension type with nested storage type fails
> --
>
> Key: ARROW-9078
> URL: https://issues.apache.org/jira/browse/ARROW-9078
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Joris Van den Bossche
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: parquet
> Fix For: 2.0.0
>
>
> A reproducer in Python:
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
> class MyStructType(pa.PyExtensionType):
>
>     def __init__(self):
>         pa.PyExtensionType.__init__(
>             self, pa.struct([('left', pa.int64()), ('right', pa.int64())]))
>
>     def __reduce__(self):
>         return MyStructType, ()
>
> struct_array = pa.StructArray.from_arrays(
>     [
>         pa.array([0, 1], type="int64", from_pandas=True),
>         pa.array([1, 2], type="int64", from_pandas=True),
>     ],
>     names=["left", "right"],
> )
> # works
> table = pa.table({'a': struct_array})
> pq.write_table(table, "test_struct.parquet")
> # doesn't work
> mystruct_array = pa.ExtensionArray.from_storage(MyStructType(), struct_array)
> table = pa.table({'a': mystruct_array})
> pq.write_table(table, "test_struct.parquet")
> {code}
> Writing the simple StructArray nowadays works (and reading it back in as 
> well). 
> But when the struct array is the storage array of an ExtensionType, it fails 
> with the following error:
> {code}
> ArrowException: Unknown error: data type leaf_count != builder_leaf_count1 2
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-9104) [C++] Parquet encryption tests should write files to a temporary directory instead of the testing submodule's directory

2020-09-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191817#comment-17191817
 ] 

Antoine Pitrou commented on ARROW-9104:
---

@gershinsky Could you take a look at this? (and make sure further tests don't 
repeat the same mistake)

> [C++] Parquet encryption tests should write files to a temporary directory 
> instead of the testing submodule's directory
> ---
>
> Key: ARROW-9104
> URL: https://issues.apache.org/jira/browse/ARROW-9104
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Gidon Gershinsky
>Priority: Major
> Fix For: 2.0.0
>
>
> If the source directory is not writable the test raises permission denied 
> error:
> [ RUN  ] TestEncryptionConfiguration.UniformEncryption
> 1632
> unknown file: Failure
> 1633
> C++ exception with description "IOError: Failed to open local file 
> '/arrow/cpp/submodules/parquet-testing/data/tmp_uniform_encryption.parquet.encrypted'.
>  Detail: [errno 13] Permission denied" thrown in the test body.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-9104) [C++] Parquet encryption tests should write files to a temporary directory instead of the testing submodule's directory

2020-09-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-9104:
-

Assignee: Gidon Gershinsky

> [C++] Parquet encryption tests should write files to a temporary directory 
> instead of the testing submodule's directory
> ---
>
> Key: ARROW-9104
> URL: https://issues.apache.org/jira/browse/ARROW-9104
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Gidon Gershinsky
>Priority: Major
> Fix For: 2.0.0
>
>
> If the source directory is not writable the test raises permission denied 
> error:
> [ RUN  ] TestEncryptionConfiguration.UniformEncryption
> 1632
> unknown file: Failure
> 1633
> C++ exception with description "IOError: Failed to open local file 
> '/arrow/cpp/submodules/parquet-testing/data/tmp_uniform_encryption.parquet.encrypted'.
>  Detail: [errno 13] Permission denied" thrown in the test body.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-4861) [C++] Introduce MemoryManager::Memset method.

2020-09-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191811#comment-17191811
 ] 

Antoine Pitrou commented on ARROW-4861:
---

[~pearu] Do you still need this / want to work on this?

> [C++] Introduce MemoryManager::Memset method.
> -
>
> Key: ARROW-4861
> URL: https://issues.apache.org/jira/browse/ARROW-4861
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Pearu Peterson
>Assignee: Pearu Peterson
>Priority: Major
>  Labels: C++
> Fix For: 2.0.0
>
>
> Exposing such a method could be useful to initialize a buffer without going 
> through a CPU-to-device memory copy.
> For example, CUDA exposes several memset-like APIs to initialize GPU memory 
> with different initializer widths:
> https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g6e582bf866e9e2fb014297bfaf354d7b



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-9128) [C++] Implement string space trimming kernels: trim, ltrim, and rtrim

2020-09-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191809#comment-17191809
 ] 

Antoine Pitrou commented on ARROW-9128:
---

SQL and Postgres have trim / ltrim / rtrim, it seems; I suppose that explains 
the names proposed here.

> [C++] Implement string space trimming kernels: trim, ltrim, and rtrim
> -
>
> Key: ARROW-9128
> URL: https://issues.apache.org/jira/browse/ARROW-9128
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-2290) [C++/Python] Add ability to set codec options for lz4 codec

2020-09-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191796#comment-17191796
 ] 

Antoine Pitrou commented on ARROW-2290:
---

The compression level is exposed; are we looking to expose other settings?

> [C++/Python] Add ability to set codec options for lz4 codec
> ---
>
> Key: ARROW-2290
> URL: https://issues.apache.org/jira/browse/ARROW-2290
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 2.0.0
>
>
> The LZ4 library has many parameters, currently we do not expose these in C++ 
> or Python



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7288) [C++][R] read_parquet() freezes on Windows with Japanese locale

2020-09-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191794#comment-17191794
 ] 

Antoine Pitrou commented on ARROW-7288:
---

[~npr] Do you have Visual Studio installed? Can you obtain a stack trace?

See 
[https://docs.microsoft.com/en-us/visualstudio/debugger/debug-using-the-just-in-time-debugger?view=vs-2019]

> [C++][R] read_parquet() freezes on Windows with Japanese locale
> ---
>
> Key: ARROW-7288
> URL: https://issues.apache.org/jira/browse/ARROW-7288
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, R
>Affects Versions: 0.15.1
> Environment: R 3.6.1 on Windows 10
>Reporter: Hiroaki Yutani
>Priority: Critical
>  Labels: parquet
> Fix For: 2.0.0
>
>
> The following example on read_parquet()'s doc freezes (seems to wait for the 
> result forever) on my Windows.
> df <- read_parquet(system.file("v0.7.1.parquet", package="arrow"))
> The CRAN checks are all fine, which means the example is successfully 
> executed on the CRAN Windows. So, I have no idea why it doesn't work on my 
> local.
> [https://cran.r-project.org/web/checks/check_results_arrow.html]
> Here's my session info in case it helps:
> {code:java}
> > sessioninfo::session_info()
> - Session info ---------------------------------------------------------
>  setting  value
>  version  R version 3.6.1 (2019-07-05)
>  os       Windows 10 x64
>  system   x86_64, mingw32
>  ui       RStudio
>  language en
>  collate  Japanese_Japan.932
>  ctype    Japanese_Japan.932
>  tz       Asia/Tokyo
>  date     2019-12-01
> - Packages -------------------------------------------------------------
>  package     * version  date       lib source
>  arrow       * 0.15.1.1 2019-11-05 [1] CRAN (R 3.6.1)
>  assertthat    0.2.1    2019-03-21 [1] CRAN (R 3.6.0)
>  bit           1.1-14   2018-05-29 [1] CRAN (R 3.6.0)
>  bit64         0.9-7    2017-05-08 [1] CRAN (R 3.6.0)
>  cli           1.1.0    2019-03-19 [1] CRAN (R 3.6.0)
>  crayon        1.3.4    2017-09-16 [1] CRAN (R 3.6.0)
>  fs            1.3.1    2019-05-06 [1] CRAN (R 3.6.0)
>  glue          1.3.1    2019-03-12 [1] CRAN (R 3.6.0)
>  magrittr      1.5      2014-11-22 [1] CRAN (R 3.6.0)
>  purrr         0.3.3    2019-10-18 [1] CRAN (R 3.6.1)
>  R6            2.4.1    2019-11-12 [1] CRAN (R 3.6.1)
>  Rcpp          1.0.3    2019-11-08 [1] CRAN (R 3.6.1)
>  reprex        0.3.0    2019-05-16 [1] CRAN (R 3.6.0)
>  rlang         0.4.2    2019-11-23 [1] CRAN (R 3.6.1)
>  rstudioapi    0.10     2019-03-19 [1] CRAN (R 3.6.0)
>  sessioninfo   1.1.1    2018-11-05 [1] CRAN (R 3.6.0)
>  tidyselect    0.2.5    2018-10-11 [1] CRAN (R 3.6.0)
>  withr         2.1.2    2018-03-15 [1] CRAN (R 3.6.0)
> [1] C:/Users/hiroaki-yutani/Documents/R/win-library/3.6
> [2] C:/Program Files/R/R-3.6.1/library
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9081) [C++] Upgrade default LLVM version to 10

2020-09-07 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-9081:

Fix Version/s: (was: 2.0.0)

> [C++] Upgrade default LLVM version to 10
> 
>
> Key: ARROW-9081
> URL: https://issues.apache.org/jira/browse/ARROW-9081
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.17.1
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Critical
>
> Upgrade llvm dependencies to default to version 10.
> There are several obstacles here, as apt
> https://github.com/apache/arrow/pull/7323



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-9081) [C++] Upgrade default LLVM version to 10

2020-09-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191791#comment-17191791
 ] 

Antoine Pitrou commented on ARROW-9081:
---

Is this critical? Apart from Gandiva, we have no important feature relying on 
LLVM. I'd let the Gandiva developers choose when to bump the dependency.

> [C++] Upgrade default LLVM version to 10
> 
>
> Key: ARROW-9081
> URL: https://issues.apache.org/jira/browse/ARROW-9081
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.17.1
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Critical
> Fix For: 2.0.0
>
>
> Upgrade llvm dependencies to default to version 10.
> There are several obstacles here, as apt
> https://github.com/apache/arrow/pull/7323



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-9253) [C++] Add vectorized "IntegersMultipleOf" to arrow/util/int_util.h

2020-09-07 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191789#comment-17191789
 ] 

Wes McKinney commented on ARROW-9253:
-

I removed it from any milestone for now

> [C++] Add vectorized "IntegersMultipleOf" to arrow/util/int_util.h
> --
>
> Key: ARROW-9253
> URL: https://issues.apache.org/jira/browse/ARROW-9253
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> There are various places where we check whether the integers in an array are 
> all multiples of another number (e.g. a multiple of 8640 milliseconds per 
> day). It would be better to factor this data check out into a reusable 
> function similar to the {{CheckIntegersInRange}} function



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9253) [C++] Add vectorized "IntegersMultipleOf" to arrow/util/int_util.h

2020-09-07 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-9253:

Fix Version/s: (was: 2.0.0)

> [C++] Add vectorized "IntegersMultipleOf" to arrow/util/int_util.h
> --
>
> Key: ARROW-9253
> URL: https://issues.apache.org/jira/browse/ARROW-9253
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> There are various places where we check whether the integers in an array are 
> all multiples of another number (e.g. a multiple of 8640 milliseconds per 
> day). It would be better to factor this data check out into a reusable 
> function similar to the {{CheckIntegersInRange}} function



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-9924) [Python] Performance regression reading individual Parquet files using Dataset interface

2020-09-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191784#comment-17191784
 ] 

Antoine Pitrou commented on ARROW-9924:
---

cc [~jorisvandenbossche]

> [Python] Performance regression reading individual Parquet files using 
> Dataset interface
> 
>
> Key: ARROW-9924
> URL: https://issues.apache.org/jira/browse/ARROW-9924
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Critical
> Fix For: 2.0.0
>
>
> I haven't investigated very deeply but this seems symptomatic of a problem:
> {code}
> In [27]: df = pd.DataFrame({'A': np.random.randn(1000)})  
>   
>   
> In [28]: pq.write_table(pa.table(df), 'test.parquet') 
>   
>   
> In [29]: timeit pq.read_table('test.parquet') 
>   
>   
> 79.8 ms ± 1.25 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
> In [30]: timeit pq.read_table('test.parquet', use_legacy_dataset=True)
>   
>   
> 66.4 ms ± 1.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-6282) [Format] Support lossy compression

2020-09-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou closed ARROW-6282.
-
Resolution: Abandoned

I will close this, as this should certainly be discussed on the Arrow 
development mailing-list first.

> [Format] Support lossy compression
> --
>
> Key: ARROW-6282
> URL: https://issues.apache.org/jira/browse/ARROW-6282
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Format
>Reporter: Dominik Moritz
>Priority: Major
> Fix For: 2.0.0
>
>
> Arrow dataframes with large columns of integers or floats can be compressed 
> using gzip or brotli. However, in some cases it is acceptable to compress the 
> data lossily to achieve even higher compression ratios. The main use case for 
> this is visualization, where small inaccuracies matter less. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-4248) [C++][Plasma] Build on Windows / Visual Studio

2020-09-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou closed ARROW-4248.
-
Resolution: Won't Fix

> [C++][Plasma] Build on Windows / Visual Studio
> --
>
> Key: ARROW-4248
> URL: https://issues.apache.org/jira/browse/ARROW-4248
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 2.0.0
>
>
> See https://github.com/apache/arrow/issues/3391



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8431) [C++][CI] Configure a build to build and execute the C++ tests with GCC 4.8

2020-09-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191781#comment-17191781
 ] 

Antoine Pitrou commented on ARROW-8431:
---

I'm not convinced this would be a very good use of our time, since gcc 4.8 is 
mostly obsolete by now (except for manylinux1, admittedly).

> [C++][CI] Configure a build to build and execute the C++ tests with GCC 4.8
> ---
>
> Key: ARROW-8431
> URL: https://issues.apache.org/jira/browse/ARROW-8431
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 2.0.0
>
>
> The gandiva jar nightly build and the manylinux1 wheels are building with GCC 
> 4.8.
> We already have the manylinux1 build running on each commit, but it doesn't 
> exercise the C++ tests 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-4685) [C++] Update Boost to 1.69 in manylinux1 docker image

2020-09-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou closed ARROW-4685.
-
Resolution: Duplicate

> [C++] Update Boost to 1.69 in manylinux1 docker image
> -
>
> Key: ARROW-4685
> URL: https://issues.apache.org/jira/browse/ARROW-4685
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Packaging, Python
>Reporter: Uwe Korn
>Priority: Minor
> Fix For: 2.0.0
>
>
> We currently use Boost 1.66 in the manylinux1 docker image but should use the 
> latest version there to get all features, bugfixes, … The main difficulty in 
> updating is that we want to use a namespaced Boost build and the build 
> scripts have changed since the 1.66 release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-1393) [C++] Simplified CUDA IPC writer and reader for communicating a CPU + GPU payload to another process

2020-09-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-1393:
--
Component/s: GPU

> [C++] Simplified CUDA IPC writer and reader for communicating a CPU + GPU 
> payload to another process
> 
>
> Key: ARROW-1393
> URL: https://issues.apache.org/jira/browse/ARROW-1393
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, GPU
>Reporter: Wes McKinney
>Priority: Major
>  Labels: GPU
> Fix For: 2.0.0
>
>
> The purpose of this would be to simplify transmission of a mixed-device 
> payload from one process to another. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-5382) [C++] SIMD on ARM NEON

2020-09-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou closed ARROW-5382.
-
Fix Version/s: (was: 2.0.0)
   Resolution: Later

There's no reason to keep an issue open for this "just in case".

> [C++] SIMD on ARM NEON
> --
>
> Key: ARROW-5382
> URL: https://issues.apache.org/jira/browse/ARROW-5382
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Krisztian Szucs
>Priority: Minor
>  Labels: SIMD
>
> Arrow doesn't yet support SIMD on ARM architectures. SSE on ARM can be 
> complicated, but there are a couple of libraries we could depend on, namely:
> - 
> https://github.com/catboost/catboost/tree/ee47f9aa399833cb04bfeec5fe9f3e3792d428e4/library/sse
>  (Apache)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-7517) [C++] Builder does not honour dictionary type provided during initialization

2020-09-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou closed ARROW-7517.
-
Resolution: Duplicate

This was fixed by ARROW-9642. Thank you for the report!

> [C++] Builder does not honour dictionary type provided during initialization
> 
>
> Key: ARROW-7517
> URL: https://issues.apache.org/jira/browse/ARROW-7517
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.15.0
>Reporter: Wamsi Viswanath
>Priority: Major
> Fix For: 2.0.0
>
>
> Below is an example for reproducing the issue:
> [https://gist.github.com/wamsiv/d48ec37a9a9b5f4d484de6ff86a3870d]
> The builder automatically optimizes the dictionary type depending upon the 
> number of unique values provided, which results in a schema mismatch.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-9253) [C++] Add vectorized "IntegersMultipleOf" to arrow/util/int_util.h

2020-09-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191768#comment-17191768
 ] 

Antoine Pitrou edited comment on ARROW-9253 at 9/7/20, 4:13 PM:


Though it seems rounding issues may be tricky.
{code:python}
>>> k = 86400_000_000_000; x = 123 * k
>>> (x * math.ceil(2**64 / k)) >> 64
123
>>> k = 86400_000_000_000; x = 106751 * k
>>> (x * math.ceil(2**64 / k)) >> 64
106751

>>> k = 86400_000_000_000; x = -106751 * k
>>> (x * math.ceil(2**64 / k)) >> 64
-106752
>>> (x * math.floor(2**64 / k)) >> 64
-106751
>>> k = 86400_000_000_000; x = -3 * k
>>> (x * math.ceil(2**64 / k)) >> 64
-4
>>> (x * math.floor(2**64 / k)) >> 64
-3
{code}


was (Author: pitrou):
Though it seems precision issues may be tricky.

{code:python}
>>> k = 86400_000_000_000; x = 123 * k
>>> (x * math.ceil(2**64 / k)) >> 64
123
>>> k = 86400_000_000_000; x = 106751 * k
>>> (x * math.ceil(2**64 / k)) >> 64
106751

>>> k = 86400_000_000_000; x = -106751 * k
>>> (x * math.ceil(2**64 / k)) >> 64
-106752
>>> (x * math.floor(2**64 / k)) >> 64
-106751
>>> k = 86400_000_000_000; x = -3 * k
>>> (x * math.ceil(2**64 / k)) >> 64
-4
>>> (x * math.floor(2**64 / k)) >> 64
-3
{code}


> [C++] Add vectorized "IntegersMultipleOf" to arrow/util/int_util.h
> --
>
> Key: ARROW-9253
> URL: https://issues.apache.org/jira/browse/ARROW-9253
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 2.0.0
>
>
> There are various places where we check whether the integers in an array are 
> all multiples of another number (e.g. a multiple of 8640 milliseconds per 
> day). It would be better to factor this data check out into a reusable 
> function similar to the {{CheckIntegersInRange}} function



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-9253) [C++] Add vectorized "IntegersMultipleOf" to arrow/util/int_util.h

2020-09-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191768#comment-17191768
 ] 

Antoine Pitrou edited comment on ARROW-9253 at 9/7/20, 3:58 PM:


Though it seems precision issues may be tricky.

{code:python}
>>> k = 86400_000_000_000; x = 123 * k
>>> (x * math.ceil(2**64 / k)) >> 64
123
>>> k = 86400_000_000_000; x = 106751 * k
>>> (x * math.ceil(2**64 / k)) >> 64
106751

>>> k = 86400_000_000_000; x = -106751 * k
>>> (x * math.ceil(2**64 / k)) >> 64
-106752
>>> (x * math.floor(2**64 / k)) >> 64
-106751
>>> k = 86400_000_000_000; x = -3 * k
>>> (x * math.ceil(2**64 / k)) >> 64
-4
>>> (x * math.floor(2**64 / k)) >> 64
-3
{code}



was (Author: pitrou):
Though it seems precision issues will be tricky to solve.

> [C++] Add vectorized "IntegersMultipleOf" to arrow/util/int_util.h
> --
>
> Key: ARROW-9253
> URL: https://issues.apache.org/jira/browse/ARROW-9253
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 2.0.0
>
>
> There are various places where we check whether the integers in an array are 
> all multiples of another number (e.g. a multiple of 8640 milliseconds per 
> day). It would be better to factor this data check out into a reusable 
> function similar to the {{CheckIntegersInRange}} function



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-9929) [Developer] Autotune cmake-format

2020-09-07 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-9929.

Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8123
[https://github.com/apache/arrow/pull/8123]

> [Developer] Autotune cmake-format
> -
>
> Key: ARROW-9929
> URL: https://issues.apache.org/jira/browse/ARROW-9929
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-9253) [C++] Add vectorized "IntegersMultipleOf" to arrow/util/int_util.h

2020-09-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191768#comment-17191768
 ] 

Antoine Pitrou commented on ARROW-9253:
---

Though it seems precision issues will be tricky to solve.

> [C++] Add vectorized "IntegersMultipleOf" to arrow/util/int_util.h
> --
>
> Key: ARROW-9253
> URL: https://issues.apache.org/jira/browse/ARROW-9253
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 2.0.0
>
>
> There are various places where we check whether the integers in an array are 
> all multiples of another number (e.g. a multiple of 8640 milliseconds per 
> day). It would be better to factor this data check out into a reusable 
> function similar to the {{CheckIntegersInRange}} function



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-9253) [C++] Add vectorized "IntegersMultipleOf" to arrow/util/int_util.h

2020-09-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191767#comment-17191767
 ] 

Antoine Pitrou commented on ARROW-9253:
---

We could use high-precision multiplication by reciprocal. Boost's int128_t 
seems quite fast:
https://www.boost.org/doc/libs/1_74_0/libs/multiprecision/doc/html/boost_multiprecision/perf/integer_performance.html

> [C++] Add vectorized "IntegersMultipleOf" to arrow/util/int_util.h
> --
>
> Key: ARROW-9253
> URL: https://issues.apache.org/jira/browse/ARROW-9253
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 2.0.0
>
>
> There are various places where we check whether the integers in an array are 
> all multiples of another number (e.g. a multiple of 8640 milliseconds per 
> day). It would be better to factor this data check out into a reusable 
> function similar to the {{CheckIntegersInRange}} function



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-9932) R package fails to install on Ubuntu 14

2020-09-07 Thread Ofek Shilon (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ofek Shilon closed ARROW-9932.
--
Resolution: Not A Problem

It currently seems this was a clash between some functions in ./r/tools/linuxlibs.R 
and an in-house implementation of 'untar'. Sorry for the hassle.

> R package fails to install on Ubuntu 14
> ---
>
> Key: ARROW-9932
> URL: https://issues.apache.org/jira/browse/ARROW-9932
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 1.0.1
> Environment: R version 3.4.0 (2015-04-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 14.04.5 LTS
>Reporter: Ofek Shilon
>Priority: Major
>
> 1. From R (3.4) prompt, we run
> {{> install.packages("arrow")}}
> and it seems to succeed.
> 2. Next we run:
> {{> arrow::install_arrow()}}
> This is the full output:
> {{Installing package into '/opt/R-3.4.0.mkl/library'}}
>  {{(as 'lib' is unspecified)}}
>  {{trying URL 'https://cloud.r-project.org/src/contrib/arrow_1.0.1.tar.gz'}}
>  {{Content type 'application/x-gzip' length 274865 bytes (268 KB)}}
>  {{==}}
>  {{downloaded 268 KB}}
> {{installing *source* package 'arrow' ...}}
>  {{** package 'arrow' successfully unpacked and MD5 sums checked}}
>  {{*** No C++ binaries found for ubuntu-14.04}}
>  {{*** Successfully retrieved C++ source}}
>  {{*** Building C++ libraries}}
>  {{ cmake}}
>  {color:#ff}*{{Error in dQuote(env_var_list, FALSE) : unused argument 
> (FALSE)}}*{color}
>  {color:#ff} *{{Calls: build_libarrow -> paste}}*{color}
>  {color:#ff} *{{Execution halted}}*{color}
>  {{- NOTE ---}}
>  {{After installation, please run arrow::install_arrow()}}
>  {{for help installing required runtime libraries}}
>  {{-}}
>  {{** libs}}
>  {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
> -I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
> -march=x86-64 -O3 -c array.cpp -o array.o}}
>  {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
> -I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
> -march=x86-64 -O3 -c array_from_vector.cpp -o array_from_vector.o}}
>  {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
> -I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
> -march=x86-64 -O3 -c array_to_vector.cpp -o array_to_vector.o}}
>  {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
> -I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
> -march=x86-64 -O3 -c arraydata.cpp -o arraydata.o}}
>  {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
> -I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
> -march=x86-64 -O3 -c arrowExports.cpp -o arrowExports.o}}
>  {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
> -I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
> -march=x86-64 -O3 -c buffer.cpp -o buffer.o}}
>  {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
> -I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
> -march=x86-64 -O3 -c chunkedarray.cpp -o chunkedarray.o}}
>  {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
> -I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
> -march=x86-64 -O3 -c compression.cpp -o compression.o}}
>  {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
> -I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
> -march=x86-64 -O3 -c compute.cpp -o compute.o}}
>  {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
> -I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
> -march=x86-64 -O3 -c csv.cpp -o csv.o}}{{g++ -std=gnu++0x 
> -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
> -I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
> -march=x86-64 -O3 -c dataset.cpp -o dataset.o}}
>  {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
> -I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
> -march=x86-64 -O3 -c datatype.cpp -o datatype.o}}
>  {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
> -I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
> -march=x86-64 -O3 -c expression.cpp -o expression.o}}
>  {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
> -I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
> -march=x86-64 -O3 -c feather.cpp -o feather.o}}
>  {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
> -I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
> -march=x86-64 -O3 -c field.cpp -o field.o}}
>  {{g++ 

[jira] [Assigned] (ARROW-9933) [Developer] Add drone as a CI provider for crossbow

2020-09-07 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9933:


Assignee: Apache Arrow JIRA Bot  (was: Uwe Korn)

> [Developer] Add drone as a CI provider for crossbow
> ---
>
> Key: ARROW-9933
> URL: https://issues.apache.org/jira/browse/ARROW-9933
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Uwe Korn
>Assignee: Apache Arrow JIRA Bot
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-9933) [Developer] Add drone as a CI provider for crossbow

2020-09-07 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9933:


Assignee: Uwe Korn  (was: Apache Arrow JIRA Bot)

> [Developer] Add drone as a CI provider for crossbow
> ---
>
> Key: ARROW-9933
> URL: https://issues.apache.org/jira/browse/ARROW-9933
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9933) [Developer] Add drone as a CI provider for crossbow

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9933:
--
Labels: pull-request-available  (was: )

> [Developer] Add drone as a CI provider for crossbow
> ---
>
> Key: ARROW-9933
> URL: https://issues.apache.org/jira/browse/ARROW-9933
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-9253) [C++] Add vectorized "IntegersMultipleOf" to arrow/util/int_util.h

2020-09-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191753#comment-17191753
 ] 

Antoine Pitrou commented on ARROW-9253:
---

Is this actually performance-critical somewhere?

> [C++] Add vectorized "IntegersMultipleOf" to arrow/util/int_util.h
> --
>
> Key: ARROW-9253
> URL: https://issues.apache.org/jira/browse/ARROW-9253
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 2.0.0
>
>
> There are various places where we check whether the integers in an array are 
> all multiples of another number (e.g. a multiple of 86400000 milliseconds per 
> day). It would be better to factor this data check out into a reusable 
> function similar to the {{CheckIntegersInRange}} function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9933) [Developer] Add drone as a CI provider for crossbow

2020-09-07 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9933:
---

 Summary: [Developer] Add drone as a CI provider for crossbow
 Key: ARROW-9933
 URL: https://issues.apache.org/jira/browse/ARROW-9933
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-9913) [C++] Outputs of Decimal128::FromString depend on presence of one another

2020-09-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-9913.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8109
[https://github.com/apache/arrow/pull/8109]

> [C++] Outputs of Decimal128::FromString depend on presence of one another
> -
>
> Key: ARROW-9913
> URL: https://issues.apache.org/jira/browse/ARROW-9913
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Mingyu Zhong
>Assignee: Mingyu Zhong
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/arrow/blame/bfac60dd73bffa5f7bcefc890486268036182278/cpp/src/arrow/util/decimal.cc#L365-L373]
>  in Decimal128::FromString makes *out depend on whether scale is null, and 
> makes *scale depend on whether out is null. For example, given an input "1e2",
>  # if out is not null and scale is null, then *out is 1
>  # if out is null and scale is not null, then *scale is -2
>  # if neither out nor scale is null, then *out is 100 and *scale is 0
> It is very counter-intuitive that when an additional output-only pointer is 
> given for receiving extra info, it alters the value of another output.
> Similarly, *precision is also affected by presence of out and scale.
> The block of adjustment was added in 
> [https://github.com/apache/arrow/commit/bfac60dd73bffa5f7bcefc890486268036182278]
>  for preventing negative scale output 
> (https://issues.apache.org/jira/browse/ARROW-2177). While the motivation 
> looks reasonable, the change is not sufficient (see case #2 above).
> I think we should make the outputs independent of one another. For the input 
> "1e2", *out should be 100 if out is not null, and *scale should be 0 if scale 
> is not null.
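
As a rough illustration of the proposed, output-independent behaviour, the
rescaling itself can be expressed without reference to which pointers the
caller passed. The sketch below is not the code from the pull request; it uses
a plain int64_t struct as a stand-in for the 128-bit decimal value.

{code:cpp}
// Sketch: normalize a parsed (value, scale) pair with a negative scale
// (e.g. "1e2" parses to value 1, scale -2) into a non-negative scale,
// independently of which output pointers were supplied.
#include <cassert>
#include <cstdint>

struct Parsed {
  int64_t value;  // 64-bit stand-in for the 128-bit decimal value
  int32_t scale;
};

Parsed NormalizeNegativeScale(Parsed p) {
  while (p.scale < 0) {
    p.value *= 10;  // overflow handling omitted in this sketch
    ++p.scale;
  }
  return p;
}

int main() {
  // "1e2" -> {1, -2} -> {100, 0}, matching the expectation stated above.
  const Parsed p = NormalizeNegativeScale({1, -2});
  assert(p.value == 100 && p.scale == 0);
  return 0;
}
{code}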



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-9904) [C++] Unroll the loop manually for CountSetBits

2020-09-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-9904.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8103
[https://github.com/apache/arrow/pull/8103]

> [C++] Unroll the loop manually for CountSetBits
> ---
>
> Key: ARROW-9904
> URL: https://issues.apache.org/jira/browse/ARROW-9904
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Frank Du
>Assignee: Frank Du
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The tight loop below can get better performance if unrolled manually, which 
> hints the compiler to generate better parallel instructions.
> for (auto iter = u64_data; iter < end; ++iter) {
>   count += BitUtil::PopCount(*iter);
> }
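
For illustration, a minimal sketch of what a manual unroll of that loop could
look like. The 4-way unroll factor is an assumption, and __builtin_popcountll
(GCC/Clang) stands in for BitUtil::PopCount to keep the snippet self-contained;
the actual Arrow change may differ.

{code:cpp}
// Sketch of a manually unrolled popcount loop: four independent popcounts per
// iteration expose more instruction-level parallelism to the compiler/CPU.
#include <cstdint>

int64_t CountSetBits64(const uint64_t* u64_data, const uint64_t* end) {
  int64_t count = 0;
  const uint64_t* iter = u64_data;
  // Main unrolled loop: process four 64-bit words per iteration.
  for (; end - iter >= 4; iter += 4) {
    count += __builtin_popcountll(iter[0]);
    count += __builtin_popcountll(iter[1]);
    count += __builtin_popcountll(iter[2]);
    count += __builtin_popcountll(iter[3]);
  }
  // Tail loop for the remaining (at most three) words.
  for (; iter != end; ++iter) {
    count += __builtin_popcountll(*iter);
  }
  return count;
}
{code}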



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8359) [C++/Python] Enable aarch64/ppc64le build in conda recipes

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8359:
--
Labels: pull-request-available  (was: )

> [C++/Python] Enable aarch64/ppc64le build in conda recipes
> --
>
> Key: ARROW-8359
> URL: https://issues.apache.org/jira/browse/ARROW-8359
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging, Python
>Reporter: Uwe Korn
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> These two new arches were added in the conda recipes, we should also build 
> them as nightlies.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9932) R package fails to install on Ubuntu 14

2020-09-07 Thread Ofek Shilon (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ofek Shilon updated ARROW-9932:
---
Description: 
1. From R (3.4) prompt, we run

{{> install.packages("arrow")}}

and it seems to succeed.

2. Next we run:

{{> arrow::install_arrow()}}

This is the full output:

{{Installing package into '/opt/R-3.4.0.mkl/library'}}
 {{(as 'lib' is unspecified)}}
 {{trying URL 'https://cloud.r-project.org/src/contrib/arrow_1.0.1.tar.gz'}}
 {{Content type 'application/x-gzip' length 274865 bytes (268 KB)}}
 {{==}}
 {{downloaded 268 KB}}

{{installing *source* package 'arrow' ...}}
 {{** package 'arrow' successfully unpacked and MD5 sums checked}}
 {{*** No C++ binaries found for ubuntu-14.04}}
 {{*** Successfully retrieved C++ source}}
 {{*** Building C++ libraries}}
 {{ cmake}}
 {color:#ff}*{{Error in dQuote(env_var_list, FALSE) : unused argument 
(FALSE)}}*{color}
 {color:#ff} *{{Calls: build_libarrow -> paste}}*{color}
 {color:#ff} *{{Execution halted}}*{color}
 {{- NOTE ---}}
 {{After installation, please run arrow::install_arrow()}}
 {{for help installing required runtime libraries}}
 {{-}}
 {{** libs}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c array.cpp -o array.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c array_from_vector.cpp -o array_from_vector.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c array_to_vector.cpp -o array_to_vector.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c arraydata.cpp -o arraydata.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c arrowExports.cpp -o arrowExports.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c buffer.cpp -o buffer.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c chunkedarray.cpp -o chunkedarray.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c compression.cpp -o compression.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c compute.cpp -o compute.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c csv.cpp -o csv.o}}{{g++ -std=gnu++0x 
-I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c dataset.cpp -o dataset.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c datatype.cpp -o datatype.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c expression.cpp -o expression.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c feather.cpp -o feather.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c field.cpp -o field.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c filesystem.cpp -o filesystem.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c imports.cpp -o imports.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c io.cpp -o io.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c json.cpp -o json.o}}{{g++ -std=gnu++0x 
-I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 

[jira] [Updated] (ARROW-9932) R package fails to install on Ubuntu 14

2020-09-07 Thread Ofek Shilon (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ofek Shilon updated ARROW-9932:
---
Description: 
1. From R (3.4) prompt, we run

{{> install.packages("arrow")}}

and it seems to succeed.

2. Next we run:

{{> arrow::install_arrow()}}

This is the full output:

{{Installing package into '/opt/R-3.4.0.mkl/library'}}
 {{(as 'lib' is unspecified)}}
 {{trying URL 'https://cloud.r-project.org/src/contrib/arrow_1.0.1.tar.gz'}}
 {{Content type 'application/x-gzip' length 274865 bytes (268 KB)}}
 {{==}}
 {{downloaded 268 KB}}

{{installing *source* package 'arrow' ...}}
 {{** package 'arrow' successfully unpacked and MD5 sums checked}}
 {{*** No C++ binaries found for ubuntu-14.04}}
 {{*** Successfully retrieved C++ source}}
 {{*** Building C++ libraries}}
 {{ cmake}}
 {color:#FF}*{{Error in dQuote(env_var_list, FALSE) : unused argument 
(FALSE)}}*{color}
{color:#FF} *{{Calls: build_libarrow -> paste}}*{color}
{color:#FF} *{{Execution halted}}*{color}
 {{- NOTE ---}}
 {{After installation, please run arrow::install_arrow()}}
 {{for help installing required runtime libraries}}
 {{-}}
 {{** libs}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c array.cpp -o array.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c array_from_vector.cpp -o array_from_vector.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c array_to_vector.cpp -o array_to_vector.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c arraydata.cpp -o arraydata.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c arrowExports.cpp -o arrowExports.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c buffer.cpp -o buffer.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c chunkedarray.cpp -o chunkedarray.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c compression.cpp -o compression.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c compute.cpp -o compute.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c csv.cpp -o csv.o}}{{g++ -std=gnu++0x 
-I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c dataset.cpp -o dataset.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c datatype.cpp -o datatype.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c expression.cpp -o expression.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c feather.cpp -o feather.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c field.cpp -o field.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c filesystem.cpp -o filesystem.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c imports.cpp -o imports.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c io.cpp -o io.o}}
 {{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c json.cpp -o json.o}}{{g++ -std=gnu++0x 
-I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 

[jira] [Comment Edited] (ARROW-8435) [Python] A TypeError is raised while token expires during writing to S3

2020-09-07 Thread PMP Certification in Hyderabad 360DigiTMG (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191731#comment-17191731
 ] 

PMP Certification in Hyderabad 360DigiTMG edited comment on ARROW-8435 at 
9/7/20, 2:30 PM:
---

Take a look at this about the issue; you can get an idea: 

Visit: [typeerror nonetype object is not 
subscriptable|https://360digitmg.com/python-typeerror-nonetype-object-is-not-subsriptable]


was (Author: pmpcertification):
Look at this about issue, you can get a idea 

Visit: 
https://360digitmg.com/python-typeerror-nonetype-object-is-not-subsriptable

> [Python] A TypeError is raised while token expires during writing to S3
> ---
>
> Key: ARROW-8435
> URL: https://issues.apache.org/jira/browse/ARROW-8435
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.15.1
>Reporter: Shawn Li
>Priority: Critical
>
> This issue occurs when a STS token expires *in the middle of* writing to S3. 
> An OSError: Write failed: TypeError("'NoneType' object is not 
> subscriptable",) is raised instead of a PermissionError.
>  
> OSError: Write failed: TypeError("'NoneType' object is not subscriptable",)
> Traceback (most recent call last):
>  File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1450, 
> in
>  write_to_dataset write_table(subtable, f, **kwargs)
>  File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1344, 
> in
>  write_table writer.write_table(table, row_group_size=row_group_size)
>  File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 474, 
> in
>  write_table self.writer.write_table(table, row_group_size=row_group_size)
>  File "pyarrow/_parquet.pyx", line 1375, in 
> pyarrow._parquet.ParquetWriter.write_table File "pyarrow/error.pxi", line 80, 
> in
>  pyarrow.lib.check_statuspyarrow.lib.ArrowIOError: Arrow error: IOError: The 
> provided token has expired.. Detail: Python exception: PermissionError
>  During handling of the above exception, another exception occurred:
>  Traceback (most recent call last):
>  File "/usr/local/lib/python3.6/site-packages/s3fs/core.py", line 1096, in 
> _upload_chunk PartNumber=part, UploadId=self.mpu['UploadId'],TypeError: 
> 'NoneType' object is not subscriptable
> environment is:
>  s3fs==0.4.0
>  boto3==1.10.27
>  botocore==1.13.27
>  pyarrow==0.15.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8435) [Python] A TypeError is raised while token expires during writing to S3

2020-09-07 Thread PMP Certification in Hyderabad 360DigiTMG (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191731#comment-17191731
 ] 

PMP Certification in Hyderabad 360DigiTMG commented on ARROW-8435:
--

Take a look at this about the issue; you can get an idea: 

Visit: 
https://360digitmg.com/python-typeerror-nonetype-object-is-not-subsriptable

> [Python] A TypeError is raised while token expires during writing to S3
> ---
>
> Key: ARROW-8435
> URL: https://issues.apache.org/jira/browse/ARROW-8435
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.15.1
>Reporter: Shawn Li
>Priority: Critical
>
> This issue occurs when a STS token expires *in the middle of* writing to S3. 
> An OSError: Write failed: TypeError("'NoneType' object is not 
> subscriptable",) is raised instead of a PermissionError.
>  
> OSError: Write failed: TypeError("'NoneType' object is not subscriptable",)
> Traceback (most recent call last):
>  File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1450, 
> in
>  write_to_dataset write_table(subtable, f, **kwargs)
>  File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1344, 
> in
>  write_table writer.write_table(table, row_group_size=row_group_size)
>  File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 474, 
> in
>  write_table self.writer.write_table(table, row_group_size=row_group_size)
>  File "pyarrow/_parquet.pyx", line 1375, in 
> pyarrow._parquet.ParquetWriter.write_table File "pyarrow/error.pxi", line 80, 
> in
>  pyarrow.lib.check_statuspyarrow.lib.ArrowIOError: Arrow error: IOError: The 
> provided token has expired.. Detail: Python exception: PermissionError
>  During handling of the above exception, another exception occurred:
>  Traceback (most recent call last):
>  File "/usr/local/lib/python3.6/site-packages/s3fs/core.py", line 1096, in 
> _upload_chunk PartNumber=part, UploadId=self.mpu['UploadId'],TypeError: 
> 'NoneType' object is not subscriptable
> environment is:
>  s3fs==0.4.0
>  boto3==1.10.27
>  botocore==1.13.27
>  pyarrow==0.15.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9932) R package fails to install on Ubuntu 14

2020-09-07 Thread Ofek Shilon (Jira)
Ofek Shilon created ARROW-9932:
--

 Summary: R package fails to install on Ubuntu 14
 Key: ARROW-9932
 URL: https://issues.apache.org/jira/browse/ARROW-9932
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Affects Versions: 1.0.1
 Environment: R version 3.4.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS
Reporter: Ofek Shilon


1. From R (3.4) prompt, we run

{{> install.packages("arrow")}}

and it seems to succeed.

2. Next we run:

{{> arrow::install_arrow()}}

This is the full output:

{{Installing package into '/opt/R-3.4.0.mkl/library'}}
{{(as 'lib' is unspecified)}}
{{trying URL 'https://cloud.r-project.org/src/contrib/arrow_1.0.1.tar.gz'}}
{{Content type 'application/x-gzip' length 274865 bytes (268 KB)}}
{{==}}
{{downloaded 268 KB}}
{{installing *source* package 'arrow' ...}}
{{** package 'arrow' successfully unpacked and MD5 sums checked}}
{{*** No C++ binaries found for ubuntu-14.04}}
{{*** Successfully retrieved C++ source}}
{{*** Building C++ libraries}}
{{ cmake}}
{{Error in dQuote(env_var_list, FALSE) : unused argument (FALSE)}}
{{Calls: build_libarrow -> paste}}
{{Execution halted}}
{{- NOTE ---}}
{{After installation, please run arrow::install_arrow()}}
{{for help installing required runtime libraries}}
{{-}}
{{** libs}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c array.cpp -o array.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c array_from_vector.cpp -o array_from_vector.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c array_to_vector.cpp -o array_to_vector.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c arraydata.cpp -o arraydata.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c arrowExports.cpp -o arrowExports.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c buffer.cpp -o buffer.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c chunkedarray.cpp -o chunkedarray.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c compression.cpp -o compression.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c compute.cpp -o compute.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c csv.cpp -o csv.o}}{{g++ -std=gnu++0x 
-I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c dataset.cpp -o dataset.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c datatype.cpp -o datatype.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c expression.cpp -o expression.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c feather.cpp -o feather.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c field.cpp -o field.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c filesystem.cpp -o filesystem.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c imports.cpp -o imports.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 
-I"/opt/R-3.4.0.mkl/library/Rcpp/include" -I/usr/local/include -fpic 
-march=x86-64 -O3 -c io.cpp -o io.o}}
{{g++ -std=gnu++0x -I/opt/R-3.4.0.mkl/lib64/R/include -DNDEBUG 

[jira] [Assigned] (ARROW-9931) [C++] Fix undefined behaviour on invalid IPC (OSS-Fuzz)

2020-09-07 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9931:


Assignee: Apache Arrow JIRA Bot  (was: Antoine Pitrou)

> [C++] Fix undefined behaviour on invalid IPC (OSS-Fuzz)
> ---
>
> Key: ARROW-9931
> URL: https://issues.apache.org/jira/browse/ARROW-9931
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Apache Arrow JIRA Bot
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-9931) [C++] Fix undefined behaviour on invalid IPC (OSS-Fuzz)

2020-09-07 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9931:


Assignee: Antoine Pitrou  (was: Apache Arrow JIRA Bot)

> [C++] Fix undefined behaviour on invalid IPC (OSS-Fuzz)
> ---
>
> Key: ARROW-9931
> URL: https://issues.apache.org/jira/browse/ARROW-9931
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9931) [C++] Fix undefined behaviour on invalid IPC (OSS-Fuzz)

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9931:
--
Labels: pull-request-available  (was: )

> [C++] Fix undefined behaviour on invalid IPC (OSS-Fuzz)
> ---
>
> Key: ARROW-9931
> URL: https://issues.apache.org/jira/browse/ARROW-9931
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9931) [C++] Fix undefined behaviour on invalid IPC (OSS-Fuzz)

2020-09-07 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9931:
-

 Summary: [C++] Fix undefined behaviour on invalid IPC (OSS-Fuzz)
 Key: ARROW-9931
 URL: https://issues.apache.org/jira/browse/ARROW-9931
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9930) [C++] Fix undefined behaviour on invalid IPC (OSS-Fuzz)

2020-09-07 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9930:
-

 Summary: [C++] Fix undefined behaviour on invalid IPC (OSS-Fuzz)
 Key: ARROW-9930
 URL: https://issues.apache.org/jira/browse/ARROW-9930
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-9901) [C++] Add hand-crafted Parquet to Arrow reconstruction test for nested reading

2020-09-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-9901.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8100
[https://github.com/apache/arrow/pull/8100]

> [C++] Add hand-crafted Parquet to Arrow reconstruction test for nested reading
> --
>
> Key: ARROW-9901
> URL: https://issues.apache.org/jira/browse/ARROW-9901
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We should write tests where definition and repetition levels are explicitly 
> written out for a particular Parquet schema, then read as an Arrow column.
> Sketch here:
> https://gist.github.com/pitrou/282dd790cac0eb2c1b59e8c9ab1941d8



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-9927) [R] Add dplyr group_by, summarise and mutate support in function open_dataset R arrow package

2020-09-07 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191709#comment-17191709
 ] 

Wes McKinney commented on ARROW-9927:
-

In short, easier said than done. However, it would be good to have a tracking 
JIRA for dplyr feature coverage. We have issues covering much of the essential 
C++ query engine work, but no clear timeline for when individuals will be able 
to complete the work.

> [R] Add dplyr group_by, summarise and mutate support in function open_dataset 
> R arrow package  
> ---
>
> Key: ARROW-9927
> URL: https://issues.apache.org/jira/browse/ARROW-9927
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 1.0.1
>Reporter: Pal
>Priority: Major
>
> Hi, 
>  
> The open_dataset() function in the R arrow package already includes support 
> for the dplyr filter, select and rename functions. However, it would be a 
> huge improvement if it could also include other functions such as group_by, 
> summarise and mutate before calling collect(). Is there any idea or project 
> going on to do so? Would it be possible to include those features 
> (compatible also with dplyr version < 1)?
> Many thanks for this excellent job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-9387) [R] Use new C++ table select method

2020-09-07 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9387:


Assignee: Neal Richardson  (was: Apache Arrow JIRA Bot)

> [R] Use new C++ table select method
> ---
>
> Key: ARROW-9387
> URL: https://issues.apache.org/jira/browse/ARROW-9387
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ARROW-8314 adds it so we can use it instead of the one we wrote in the R 
> package.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-9387) [R] Use new C++ table select method

2020-09-07 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9387:


Assignee: Apache Arrow JIRA Bot  (was: Neal Richardson)

> [R] Use new C++ table select method
> ---
>
> Key: ARROW-9387
> URL: https://issues.apache.org/jira/browse/ARROW-9387
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Apache Arrow JIRA Bot
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ARROW-8314 adds it so we can use it instead of the one we wrote in the R 
> package.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9387) [R] Use new C++ table select method

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9387:
--
Labels: pull-request-available  (was: )

> [R] Use new C++ table select method
> ---
>
> Key: ARROW-9387
> URL: https://issues.apache.org/jira/browse/ARROW-9387
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ARROW-8314 adds it so we can use it instead of the one we wrote in the R 
> package.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-9928) [C++] Speed up integer parsing slightly

2020-09-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-9928.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8104
[https://github.com/apache/arrow/pull/8104]

> [C++] Speed up integer parsing slightly
> ---
>
> Key: ARROW-9928
> URL: https://issues.apache.org/jira/browse/ARROW-9928
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> By exiting early out of the parsing routine when the input is exhausted, we 
> can save a little bit of processing time.
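
As a toy illustration of the general idea only (this is not the actual Arrow
parser; the function name and signature are made up for the example):

{code:cpp}
// Sketch: return from the digit loop as soon as the input is exhausted,
// rather than carrying a separate "done" check after the loop.
// (In this toy version an empty input parses as 0.)
#include <cstddef>
#include <cstdint>

bool ParseDigits(const char* s, size_t length, uint64_t* out) {
  uint64_t value = 0;
  while (true) {
    if (length == 0) {  // input exhausted: exit early with the parsed value
      *out = value;
      return true;
    }
    const uint8_t digit = static_cast<uint8_t>(*s) - '0';
    if (digit > 9) return false;  // invalid character
    value = value * 10 + digit;   // overflow handling omitted in this sketch
    ++s;
    --length;
  }
}
{code}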



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9908) [Rust] Support temporal data types in JSON reader

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9908:
--
Labels: pull-request-available  (was: )

> [Rust] Support temporal data types in JSON reader
> -
>
> Key: ARROW-9908
> URL: https://issues.apache.org/jira/browse/ARROW-9908
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Christoph Schulze
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the JSON reader does not support any temporal data types. Columns 
> with *numerical* data should be interpretable as a temporal type when defined 
> accordingly in the schema. At the moment this throws an error with a 
> misleading message ("struct types are not yet supported").
> related issue:
> https://issues.apache.org/jira/browse/ARROW-4803 focuses on parsing temporal 
> data based on strings inputs. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-9915) [Java] getObject API for temporal types is inconsistent and in some cases incorrect

2020-09-07 Thread Matt Jadczak (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191646#comment-17191646
 ] 

Matt Jadczak commented on ARROW-9915:
-

It looks like the mailing list discussion was about a specific shortcoming from 
my list; however, given the extent of the inconsistencies, it seems to me that 
more discussion may be warranted. I'm not super-familiar with how the 
project tends to work - is the best place for such discussion here or on the 
mailing list?

IMO if the goal is to leave these implementations alone for back-compat and 
encourage users to write their own "hydration" function based on the raw `get` 
functions, we should go as far as deprecating `getObject` entirely. When used 
in a dynamic context (i.e. when you just have some vector returning some 
`Object`, rather than a statically typed vector returning a statically typed 
object) I think it can be extremely confusing when two vector types like 
TimeNanoVector and TimeMilliVector return different types of objects, neither 
of them what you would expect (in fact, that is exactly how we stumbled upon 
this issue internally).

> [Java] getObject API for temporal types is inconsistent and in some cases 
> incorrect
> ---
>
> Key: ARROW-9915
> URL: https://issues.apache.org/jira/browse/ARROW-9915
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 0.13.0, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.16.0, 0.17.0, 
> 0.17.1, 1.0.0
>Reporter: Matt Jadczak
>Priority: Major
>
> It seems that the work which has been tracked in ARROW-2015 and merged in 
> [https://github.com/apache/arrow/pull/2966] to change the return types of the 
> various Time and Date vector types when using the getObject API missed some 
> of the vector types which are temporal and so should return a temporal type, 
> and provided an incorrect implementation for others (some of this was pointed 
> out in the initial PR review, but it seems that it slipped through the cracks 
> and was not addressed before merging).
> Here is a table of the various temporal vector types, what they currently 
> return from getObject, and what they should return, in my opinion (I have 
> included ones in which the implementation is correct for completeness, and 
> coloured them green).
>  
>  
> ||Vector class||Current return type||Proposed return type||Comments||
> |DateDayVector|Integer|LocalDate|Currently returns the raw value of days 
> since epoch, should return the actual date|
> |DateMilliVector|LocalDateTime|LocalDate|This type is supposed to encode a 
> date, not a datetime, so even though epoch millis are used, the result should 
> be a LocalDate|
> |{color:#00875a}DurationVector{color}|{color:#00875a}Duration{color}|{color:#00875a}Duration{color}|{color:#00875a}Correct.{color}|
> |IntervalDayVector|Duration|Period|As per 
> [https://github.com/apache/arrow/blob/master/format/Schema.fbs#L251] , 
> Interval should be a calendar-based datatype, not a time-based one. This is 
> represented in Java by a Period type. However, I note that the Java Period 
> class does not support milliseconds, unlike the Arrow type, which might be 
> why Duration is being returned. Some discussion may be needed on the best way 
> to deal with this.|
> |{color:#00875a}IntervalYearVector{color}|{color:#00875a}Period{color}|{color:#00875a}Period{color}|{color:#00875a}Correct.{color}|
> |TimeMicroVector|Long|LocalTime|Currently returns the raw number of micros, 
> should return the actual time|
> |TimeMilliVector|LocalDateTime|LocalTime|Currently returns a datetime on 
> 1970-01-01 with the correct time component, should just return the time|
> |TimeNanoVector|Long|LocalTime|Currently returns the raw number of nanos, 
> should return the actual time|
> |TimeSecVector|Integer|LocalTime|Currently returns the raw number of seconds, 
> should return the actual time|
> |{color:#00875a}TimeStampMicroVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
> |{color:#00875a}TimeStampMilliVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
> |{color:#00875a}TimeStampNanoVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
> |{color:#00875a}TimeStampSecVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
> |TimeStampMicroTZVector|Long|ZonedDateTime|Currently returns the underlying 
> micros, and TZ has to be obtained separately. Should return the actual 
> datetime with timezone|
> |TimeStampMilliTZVector|Long|ZonedDateTime|Currently returns the underlying 
> millis, and TZ has to 

[jira] [Assigned] (ARROW-9929) [Developer] Autotune cmake-format

2020-09-07 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9929:


Assignee: Apache Arrow JIRA Bot  (was: Uwe Korn)

> [Developer] Autotune cmake-format
> -
>
> Key: ARROW-9929
> URL: https://issues.apache.org/jira/browse/ARROW-9929
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Uwe Korn
>Assignee: Apache Arrow JIRA Bot
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-9929) [Developer] Autotune cmake-format

2020-09-07 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9929:


Assignee: Uwe Korn  (was: Apache Arrow JIRA Bot)

> [Developer] Autotune cmake-format
> -
>
> Key: ARROW-9929
> URL: https://issues.apache.org/jira/browse/ARROW-9929
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-9557) [R] Iterating over parquet columns is slow in R

2020-09-07 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9557:


Assignee: Apache Arrow JIRA Bot  (was: Romain Francois)

> [R] Iterating over parquet columns is slow in R
> ---
>
> Key: ARROW-9557
> URL: https://issues.apache.org/jira/browse/ARROW-9557
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 1.0.0
>Reporter: Karl Dunkle Werner
>Assignee: Apache Arrow JIRA Bot
>Priority: Minor
>  Labels: performance, pull-request-available
> Attachments: profile_screenshot.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've found that reading in a parquet file one column at a time is slow in R – 
> much slower than reading the whole file all at once in R, or reading one column at 
> a time in Python.
> An example is below, though it's certainly possible I've done my benchmarking 
> incorrectly.
>  
> Python setup and benchmarking:
> {code:python}
> import numpy as np
> import pyarrow
> import pyarrow.parquet as pq
> from numpy.random import default_rng
> from time import time
> # Create a large, random array to save. ~1.5 GB.
> rng = default_rng(seed = 1)
> n_col = 4000
> n_row = 5
> mat = rng.standard_normal((n_col, n_row))
> col_names = [str(nm) for nm in range(n_col)]
> tab = pyarrow.Table.from_arrays(mat, names=col_names)
> pq.write_table(tab, "test_tab.parquet", use_dictionary=False)
> # How long does it take to read the whole thing in python?
> time_start = time()
> _ = pq.read_table("test_tab.parquet") # edit: corrected filename
> elapsed = time() - time_start
> print(elapsed) # under 1 second on my computer
> time_start = time()
> f = pq.ParquetFile("test_tab.parquet")
> for one_col in col_names:
>     _ = f.read(one_col).column(0)
> elapsed = time() - time_start
> print(elapsed) # about 2 seconds
> {code}
> R benchmarking, using the same {{test_tab.parquet}} file
> {code:r}
> library(arrow)
> read_by_column <- function(f) {
>   table = ParquetFileReader$create(f)
>   cols <- as.character(0:3999)
>   purrr::walk(cols, ~table$ReadTable(.)$column(0))
> }
> bench::mark(
>   read_parquet("test_tab.parquet", as_data_frame=FALSE), #   0.6 s
>   read_parquet("test_tab.parquet", as_data_frame=TRUE),  #   1 s
>   read_by_column("test_tab.parquet"),                     # 100 s
>   check=FALSE
> )
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9929) [Developer] Autotune cmake-format

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9929:
--
Labels: pull-request-available  (was: )

> [Developer] Autotune cmake-format
> -
>
> Key: ARROW-9929
> URL: https://issues.apache.org/jira/browse/ARROW-9929
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-9557) [R] Iterating over parquet columns is slow in R

2020-09-07 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9557:


Assignee: Romain Francois  (was: Apache Arrow JIRA Bot)

> [R] Iterating over parquet columns is slow in R
> ---
>
> Key: ARROW-9557
> URL: https://issues.apache.org/jira/browse/ARROW-9557
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 1.0.0
>Reporter: Karl Dunkle Werner
>Assignee: Romain Francois
>Priority: Minor
>  Labels: performance, pull-request-available
> Attachments: profile_screenshot.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've found that reading in a parquet file one column at a time is slow in R – 
> much slower than reading the whole file all at once in R, or reading one column at 
> a time in Python.
> An example is below, though it's certainly possible I've done my benchmarking 
> incorrectly.
>  
> Python setup and benchmarking:
> {code:python}
> import numpy as np
> import pyarrow
> import pyarrow.parquet as pq
> from numpy.random import default_rng
> from time import time
> # Create a large, random array to save. ~1.5 GB.
> rng = default_rng(seed = 1)
> n_col = 4000
> n_row = 5
> mat = rng.standard_normal((n_col, n_row))
> col_names = [str(nm) for nm in range(n_col)]
> tab = pyarrow.Table.from_arrays(mat, names=col_names)
> pq.write_table(tab, "test_tab.parquet", use_dictionary=False)
> # How long does it take to read the whole thing in python?
> time_start = time()
> _ = pq.read_table("test_tab.parquet") # edit: corrected filename
> elapsed = time() - time_start
> print(elapsed) # under 1 second on my computer
> time_start = time()
> f = pq.ParquetFile("test_tab.parquet")
> for one_col in col_names:
>     _ = f.read(one_col).column(0)
> elapsed = time() - time_start
> print(elapsed) # about 2 seconds
> {code}
> R benchmarking, using the same {{test_tab.parquet}} file
> {code:r}
> library(arrow)
> read_by_column <- function(f) {
>   table = ParquetFileReader$create(f)
>   cols <- as.character(0:3999)
>   purrr::walk(cols, ~table$ReadTable(.)$column(0))
> }
> bench::mark(
>   read_parquet("test_tab.parquet", as_data_frame=FALSE), #   0.6 s
>   read_parquet("test_tab.parquet", as_data_frame=TRUE),  #   1 s
>   read_by_column("test_tab.parquet"),                     # 100 s
>   check=FALSE
> )
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9557) [R] Iterating over parquet columns is slow in R

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9557:
--
Labels: performance pull-request-available  (was: performance)

> [R] Iterating over parquet columns is slow in R
> ---
>
> Key: ARROW-9557
> URL: https://issues.apache.org/jira/browse/ARROW-9557
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 1.0.0
>Reporter: Karl Dunkle Werner
>Assignee: Romain Francois
>Priority: Minor
>  Labels: performance, pull-request-available
> Attachments: profile_screenshot.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've found that reading in a parquet file one column at a time is slow in R – 
> much slower than reading the whole file all at once in R, or reading one column at 
> a time in Python.
> An example is below, though it's certainly possible I've done my benchmarking 
> incorrectly.
>  
> Python setup and benchmarking:
> {code:python}
> import numpy as np
> import pyarrow
> import pyarrow.parquet as pq
> from numpy.random import default_rng
> from time import time
> # Create a large, random array to save. ~1.5 GB.
> rng = default_rng(seed = 1)
> n_col = 4000
> n_row = 5
> mat = rng.standard_normal((n_col, n_row))
> col_names = [str(nm) for nm in range(n_col)]
> tab = pyarrow.Table.from_arrays(mat, names=col_names)
> pq.write_table(tab, "test_tab.parquet", use_dictionary=False)
> # How long does it take to read the whole thing in python?
> time_start = time()
> _ = pq.read_table("test_tab.parquet") # edit: corrected filename
> elapsed = time() - time_start
> print(elapsed) # under 1 second on my computer
> time_start = time()
> f = pq.ParquetFile("test_tab.parquet")
> for one_col in col_names:
>     _ = f.read(one_col).column(0)
> elapsed = time() - time_start
> print(elapsed) # about 2 seconds
> {code}
> R benchmarking, using the same {{test_tab.parquet}} file
> {code:r}
> library(arrow)
> read_by_column <- function(f) {
>   table = ParquetFileReader$create(f)
>   cols <- as.character(0:3999)
>   purrr::walk(cols, ~table$ReadTable(.)$column(0))
> }
> bench::mark(
>   read_parquet("test_tab.parquet", as_data_frame=FALSE), #   0.6 s
>   read_parquet("test_tab.parquet", as_data_frame=TRUE),  #   1 s
>   read_by_column("test_tab.parquet"),                     # 100 s
>   check=FALSE
> )
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8489) [Developer] Autotune more things

2020-09-07 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191635#comment-17191635
 ] 

Uwe Korn commented on ARROW-8489:
-

Taking care of cmake-format in https://issues.apache.org/jira/browse/ARROW-9929

> [Developer] Autotune more things
> 
>
> Key: ARROW-8489
> URL: https://issues.apache.org/jira/browse/ARROW-8489
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools, Python
>Reporter: Neal Richardson
>Priority: Major
>
> ARROW-7801 added the "autotune" comment bot to fix linting errors and rebuild 
> some generated files. cmake-format was left off because of Python problems 
> (see description on https://github.com/apache/arrow/pull/6932). And there are 
> probably other things we want to add (autopep8 for Python, and similar for 
> other languages?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9929) [Developer] Autotune cmake-format

2020-09-07 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9929:
---

 Summary: [Developer] Autotune cmake-format
 Key: ARROW-9929
 URL: https://issues.apache.org/jira/browse/ARROW-9929
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9928) [C++] Speed up integer parsing slightly

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9928:
--
Labels: pull-request-available  (was: )

> [C++] Speed up integer parsing slightly
> ---
>
> Key: ARROW-9928
> URL: https://issues.apache.org/jira/browse/ARROW-9928
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> By exiting the parsing routine early when the input is exhausted, we 
> can save a little bit of processing time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-9928) [C++] Speed up integer parsing slightly

2020-09-07 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9928:


Assignee: Apache Arrow JIRA Bot  (was: Antoine Pitrou)

> [C++] Speed up integer parsing slightly
> ---
>
> Key: ARROW-9928
> URL: https://issues.apache.org/jira/browse/ARROW-9928
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Apache Arrow JIRA Bot
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> By exiting the parsing routine early when the input is exhausted, we 
> can save a little bit of processing time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-9928) [C++] Speed up integer parsing slightly

2020-09-07 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9928:


Assignee: Antoine Pitrou  (was: Apache Arrow JIRA Bot)

> [C++] Speed up integer parsing slightly
> ---
>
> Key: ARROW-9928
> URL: https://issues.apache.org/jira/browse/ARROW-9928
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> By exiting the parsing routine early when the input is exhausted, we 
> can save a little bit of processing time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9928) [C++] Speed up integer parsing slightly

2020-09-07 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9928:
-

 Summary: [C++] Speed up integer parsing slightly
 Key: ARROW-9928
 URL: https://issues.apache.org/jira/browse/ARROW-9928
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou


By exiting the parsing routine early when the input is exhausted, we can 
save a little bit of processing time.
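
For readers skimming the digest, here is a minimal sketch of the early-exit idea 
(illustration only: this is not the code from the Arrow patch; the ParseUnsigned 
name and signature are made up, and overflow handling is omitted):

{code:cpp}
#include <cstddef>
#include <cstdint>

// Parse an unsigned decimal integer from a character range.
// The end-of-input check doubles as the loop exit, so exhausted input
// returns immediately instead of going through another character test.
bool ParseUnsigned(const char* s, std::size_t length, uint64_t* out) {
  const char* end = s + length;
  if (s == end || *s < '0' || *s > '9') {
    return false;  // empty input or leading non-digit
  }
  uint64_t value = 0;
  while (true) {
    value = value * 10 + static_cast<uint64_t>(*s - '0');
    ++s;
    if (s == end) {
      break;  // early exit: input exhausted, no further digit check needed
    }
    if (*s < '0' || *s > '9') {
      return false;  // reject trailing non-digit characters
    }
  }
  *out = value;
  return true;
}
{code}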



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-9928) [C++] Speed up integer parsing slightly

2020-09-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-9928:
-

Assignee: Antoine Pitrou

> [C++] Speed up integer parsing slightly
> ---
>
> Key: ARROW-9928
> URL: https://issues.apache.org/jira/browse/ARROW-9928
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Trivial
>
> By exiting the parsing routine early when the input is exhausted, we 
> can save a little bit of processing time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9927) [R] Add dplyr group_by, summarise and mutate support in function open_dataset R arrow package

2020-09-07 Thread Pal (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pal updated ARROW-9927:
---
Affects Version/s: (was: 1.0.0)
   1.0.1

> [R] Add dplyr group_by, summarise and mutate support in function open_dataset 
> R arrow package  
> ---
>
> Key: ARROW-9927
> URL: https://issues.apache.org/jira/browse/ARROW-9927
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 1.0.1
>Reporter: Pal
>Priority: Major
>
> Hi, 
>  
> The open_dataset() function in the R arrow package already includes 
> support for the dplyr filter, select and rename functions. However, it would be a 
> huge improvement if it could also include other functions such as group_by, 
> summarise and mutate before calling collect(). Is there any plan or project 
> under way to do so? Would it be possible to include those features 
> (also compatible with dplyr version < 1)?
> Many thanks for this excellent job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9927) [R] Add dplyr group_by, summarise and mutate support in function open_dataset R arrow package

2020-09-07 Thread Pal (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pal updated ARROW-9927:
---
Description: 
Hi, 

 

The open_dataset() function in the R arrow package already includes support 
for the dplyr filter, select and rename functions. However, it would be a huge 
improvement if it could also include other functions such as group_by, 
summarise and mutate before calling collect(). Is there any plan or project 
under way to do so? Would it be possible to include those features (also 
compatible with dplyr version < 1)?

Many thanks for this excellent job.

  was:
Hi, 

 

The open_dataset() function in the R arrow package already includes support 
for the dplyr filter, select and rename functions. However, it would be a huge 
improvement if it could also include other functions such as group_by, 
summarise and mutate before calling collect(). Is there any plan or project 
under way to do so? Would it be possible to include those features?

Many thanks for this excellent job.


> [R] Add dplyr group_by, summarise and mutate support in function open_dataset 
> R arrow package  
> ---
>
> Key: ARROW-9927
> URL: https://issues.apache.org/jira/browse/ARROW-9927
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 1.0.0
>Reporter: Pal
>Priority: Major
>
> Hi, 
>  
> The open_dataset() function in the R arrow package already includes 
> support for the dplyr filter, select and rename functions. However, it would be a 
> huge improvement if it could also include other functions such as group_by, 
> summarise and mutate before calling collect(). Is there any plan or project 
> under way to do so? Would it be possible to include those features 
> (also compatible with dplyr version < 1)?
> Many thanks for this excellent job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9927) [R] Add dplyr group_by, summarise and mutate support in function open_dataset R arrow package

2020-09-07 Thread Pal (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pal updated ARROW-9927:
---
Summary: [R] Add dplyr group_by, summarise and mutate support in function 
open_dataset R arrow package  (was: Add dplyr group_by, summarise and mutate 
support in function open_dataset R arrow package  )

> [R] Add dplyr group_by, summarise and mutate support in function open_dataset 
> R arrow package  
> ---
>
> Key: ARROW-9927
> URL: https://issues.apache.org/jira/browse/ARROW-9927
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 1.0.0
>Reporter: Pal
>Priority: Major
>
> Hi, 
>  
> The open_dataset() function in the R arrow package already includes 
> support for the dplyr filter, select and rename functions. However, it would be a 
> huge improvement if it could also include other functions such as group_by, 
> summarise and mutate before calling collect(). Is there any plan or project 
> under way to do so? Would it be possible to include those features?
> Many thanks for this excellent job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9927) Add dplyr group_by, summarise and mutate support in function open_dataset R arrow package

2020-09-07 Thread Pal (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pal updated ARROW-9927:
---
  Component/s: R
Affects Version/s: 1.0.0
  Description: 
Hi, 

 

The open_dataset() function in the R arrow package already includes support 
for the dplyr filter, select and rename functions. However, it would be a huge 
improvement if it could also include other functions such as group_by, 
summarise and mutate before calling collect(). Is there any plan or project 
under way to do so? Would it be possible to include those features?

Many thanks for this excellent job.

> Add dplyr group_by, summarise and mutate support in function open_dataset R 
> arrow package  
> ---
>
> Key: ARROW-9927
> URL: https://issues.apache.org/jira/browse/ARROW-9927
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 1.0.0
>Reporter: Pal
>Priority: Major
>
> Hi, 
>  
> The open_dataset() function in the R arrow package already includes 
> support for the dplyr filter, select and rename functions. However, it would be a 
> huge improvement if it could also include other functions such as group_by, 
> summarise and mutate before calling collect(). Is there any plan or project 
> under way to do so? Would it be possible to include those features?
> Many thanks for this excellent job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9927) Add dplyr group_by, summarise and mutate support in function open_dataset R arrow package

2020-09-07 Thread Pal (Jira)
Pal created ARROW-9927:
--

 Summary: Add dplyr group_by, summarise and mutate support in 
function open_dataset R arrow package  
 Key: ARROW-9927
 URL: https://issues.apache.org/jira/browse/ARROW-9927
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Pal






--
This message was sent by Atlassian Jira
(v8.3.4#803005)