date:20200630

[jira] [Updated] (ARROW-8973) [Java] Support batch value appending for large varchar/varbinary vectors

2020-06-30 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8973:
--
Labels: pull-request-available  (was: )

> [Java] Support batch value appending for large varchar/varbinary vectors
> 
>
> Key: ARROW-8973
> URL: https://issues.apache.org/jira/browse/ARROW-8973
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Support appending values in batch for LargeVarCharVector/LargeVarBinaryVector.
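To illustrate what batch appending means for a "large" variable-width vector, here is a rough Python model. It assumes the usual Arrow layout: all values live in one contiguous data buffer, and 64-bit offsets mark where each value starts and ends. `LargeBinaryModel` and `append_batch` are hypothetical names, not the Java `LargeVarCharVector`/`LargeVarBinaryVector` API. The point is that a batch can be appended with one bulk copy of the payload and one bulk extension of the offsets, instead of one call per value.

```python
import array
from typing import Iterable, List


class LargeBinaryModel:
    """Toy model of a variable-width vector with 64-bit offsets.

    Hypothetical illustration only; this is not Arrow's Java API.
    """

    def __init__(self) -> None:
        # offsets[i] .. offsets[i + 1] delimits value i inside `data`.
        # Typecode "q" is a signed long long (64-bit on common platforms),
        # mirroring the 64-bit offsets of the Large* types.
        self.offsets = array.array("q", [0])
        self.data = bytearray()

    def append_batch(self, values: Iterable[bytes]) -> None:
        values = list(values)
        end = self.offsets[-1]
        new_offsets: List[int] = []
        for v in values:
            end += len(v)
            new_offsets.append(end)
        # One bulk copy of the payload and one bulk extension of the offsets.
        self.data += b"".join(values)
        self.offsets.extend(new_offsets)

    def get(self, index: int) -> bytes:
        return bytes(self.data[self.offsets[index]:self.offsets[index + 1]])

    def __len__(self) -> int:
        return len(self.offsets) - 1
```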



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (ARROW-9273) [C++] Add crossbow job to capture build setup

2020-06-30 Thread Liya Fan (Jira)

Liya Fan created ARROW-9273:
---

 Summary: [C++] Add crossbow job to capture build setup
 Key: ARROW-9273
 URL: https://issues.apache.org/jira/browse/ARROW-9273
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Reporter: Liya Fan


As discussed in 
https://github.com/apache/arrow/pull/7287#issuecomment-645432605, the CI jobs 
cannot catch some build problems, so we want a crossbow job that captures such 
problems.




[jira] [Updated] (ARROW-9191) [Rust] Do not panic when int96 milliseconds are negative

2020-06-30 Thread Neville Dipale (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-9191:
--
Affects Version/s: 0.17.0

> [Rust] Do not panic when int96 milliseconds are negative
> 
>
> Key: ARROW-9191
> URL: https://issues.apache.org/jira/browse/ARROW-9191
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.17.0
>Reporter: Max Burke
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> See GitHub PR: [https://github.com/apache/arrow/pull/7500]
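The issue text only points to the PR, so here is some background. Parquet INT96 timestamps, in the legacy Impala layout, store 8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian. Converting to milliseconds since the Unix epoch is plain signed arithmetic, so any date before 1970-01-01 legitimately produces a negative value and should not be treated as an error. The following minimal Python sketch shows that conversion. `int96_to_millis` is a hypothetical helper, not the Rust crate's function.

```python
import struct

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01
MILLIS_PER_DAY = 86_400_000
NANOS_PER_MILLI = 1_000_000


def int96_to_millis(raw: bytes) -> int:
    """Convert a 12-byte INT96 timestamp to signed milliseconds since the epoch.

    Assumed layout: little-endian int64 nanoseconds-of-day, then
    little-endian int32 Julian day.
    """
    if len(raw) != 12:
        raise ValueError(f"INT96 value must be 12 bytes, got {len(raw)}")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    # Pre-epoch dates give a negative day offset; keep the arithmetic signed
    # rather than assuming the result is non-negative.
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return days * MILLIS_PER_DAY + nanos_of_day // NANOS_PER_MILLI
```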




[jira] [Updated] (ARROW-9191) [Rust] Do not panic when int96 milliseconds are negative

2020-06-30 Thread Neville Dipale (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-9191:
--
Component/s: Rust

> [Rust] Do not panic when int96 milliseconds are negative
> 
>
> Key: ARROW-9191
> URL: https://issues.apache.org/jira/browse/ARROW-9191
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Max Burke
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h




[jira] [Assigned] (ARROW-9191) [Rust] Do not panic when int96 milliseconds are negative

2020-06-30 Thread Neville Dipale (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale reassigned ARROW-9191:
-

Assignee: Max Burke

> [Rust] Do not panic when int96 milliseconds are negative
> 
>
> Key: ARROW-9191
> URL: https://issues.apache.org/jira/browse/ARROW-9191
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Max Burke
>Assignee: Max Burke
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h




[jira] [Resolved] (ARROW-9191) [Rust] Do not panic when int96 milliseconds are negative

2020-06-30 Thread Neville Dipale (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale resolved ARROW-9191.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7500
[https://github.com/apache/arrow/pull/7500]

> [Rust] Do not panic when int96 milliseconds are negative
> 
>
> Key: ARROW-9191
> URL: https://issues.apache.org/jira/browse/ARROW-9191
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Max Burke
>Assignee: Max Burke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h




[jira] [Resolved] (ARROW-9236) [Rust] CSV WriterBuilder never writes header

2020-06-30 Thread Neville Dipale (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale resolved ARROW-9236.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7554
[https://github.com/apache/arrow/pull/7554]

> [Rust] CSV WriterBuilder never writes header
> 
>
> Key: ARROW-9236
> URL: https://issues.apache.org/jira/browse/ARROW-9236
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Ritchie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The `WriterBuilder` in 
> [https://github.com/apache/arrow/blob/fee9f1df7b21f372bd30d59ec276ffd289645170/rust/arrow/src/csv/writer.rs#L360]
>  starts with the `beginning` field set to `false`. This makes the 
> `has_headers` field useless, because the header-writing step is skipped.
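The bug described above can be reproduced with a toy Python model of a writer whose header depends on a "nothing written yet" flag. `ToyCsvWriter` is hypothetical and only mirrors the two fields named in the issue (`beginning`, `has_headers`). It is not the Rust `arrow::csv` API. Starting with `beginning=False` drops the header regardless of `has_headers`, while starting with `True` writes it exactly once.

```python
import csv
import io
from typing import Iterable, Sequence


class ToyCsvWriter:
    """Toy model of the reported bug; not the Rust arrow csv::Writer API."""

    def __init__(self, out, has_headers: bool = True, beginning: bool = True) -> None:
        self._writer = csv.writer(out, lineterminator="\n")
        self.has_headers = has_headers
        # `beginning` means "nothing written yet": the header is only emitted
        # while it is True. Initialising it to False (as the builder did)
        # silently disables has_headers.
        self.beginning = beginning

    def write(self, names: Sequence[str], rows: Iterable[Sequence[object]]) -> None:
        if self.beginning:
            if self.has_headers:
                self._writer.writerow(names)
            self.beginning = False  # later batches never repeat the header
        self._writer.writerows(rows)


if __name__ == "__main__":
    for start in (False, True):
        buf = io.StringIO()
        ToyCsvWriter(buf, beginning=start).write(["x", "y"], [[1, "a"]])
        print(f"beginning={start}: {buf.getvalue()!r}")
```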




[jira] [Updated] (ARROW-9236) [Rust] CSV WriterBuilder never writes header

2020-06-30 Thread Neville Dipale (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-9236:
--
Affects Version/s: 0.17.0

> [Rust] CSV WriterBuilder never writes header
> 
>
> Key: ARROW-9236
> URL: https://issues.apache.org/jira/browse/ARROW-9236
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Ritchie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h




[jira] [Commented] (ARROW-9236) [Rust] CSV WriterBuilder never writes header

2020-06-30 Thread Neville Dipale (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148451#comment-17148451
 ] 

Neville Dipale commented on ARROW-9236:
---

I'm unable to assign this Jira to Ritchie

> [Rust] CSV WriterBuilder never writes header
> 
>
> Key: ARROW-9236
> URL: https://issues.apache.org/jira/browse/ARROW-9236
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Ritchie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h




[jira] [Commented] (ARROW-9269) [Rust] Cargo.toml flag to disable SIMD for targeting stable Rust

2020-06-30 Thread Neville Dipale (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148453#comment-17148453
 ] 

Neville Dipale commented on ARROW-9269:
---

Hi [~aeshirey], both arrow and parquet still use specialization, so we still 
require nightly for that feature. My understanding is that even with 
packed_simd disabled, nightly is still required.

> [Rust] Cargo.toml flag to disable SIMD for targeting stable Rust
> 
>
> Key: ARROW-9269
> URL: https://issues.apache.org/jira/browse/ARROW-9269
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Affects Versions: 0.16.0, 0.17.0
> Environment: WSL2 w/Ubuntu 18.04, rustc 1.44.1 stable
>Reporter: Adam Shirey
>Priority: Minor
>
> The Parquet Rust crate requires nightly Rust, apparently due to the use of 
> packed_simd, which [is using some nightly 
> features.|https://github.com/rust-lang/packed_simd/blob/54b19fb5905b9a6dcdcad7019e72c9067f0af2a4/src/lib.rs#L202].
> After some digging around, I found that [PR 
> 5269|https://github.com/apache/arrow/pull/5269/commits] / ARROW-6303 
> specifically calls for making SIMD optional, and it looks like it should be 
> possible to use this crate on stable. According to [the changelog for the 
> Cargo.toml|https://github.com/apache/arrow/pull/5269/commits/2d617e7358e2d18879a58037bfd41692392e0ce9],
>  and based on the README for [Native Rust implementation of Apache 
> Arrow|https://github.com/apache/arrow/blob/6299c25ffd579ba4773d7b7b4cbab9b86e0847f4/rust/arrow/README.md#simd-single-instruction-multiple-data],
>  it looks like one can disable SIMD for Arrow.
> However, this doesn't apply to the Parquet crate. (I must admit that I'm not 
> terribly familiar with the distinction between Arrow and Parquet - memory 
> format vs disk format?) I tried disabling the feature in my Cargo.toml:
> {{[dependencies.parquet]}}
>  {{version = "0.17"}}
>  {{default-features = false}}
> And I also tried {{cargo build --no-default-features}}, but the Parquet crate 
> seems to still require nightly. Maybe a similar change needs to be applied to 
> the Parquet code?
> Is it possible to use the Parquet crate on stable Rust? If so, how? And by 
> disabling SIMD, what are the (approximate) performance implications?




[jira] [Created] (ARROW-9274) [Rust] Builds failing due to IPC test failures

2020-06-30 Thread Neville Dipale (Jira)

Neville Dipale created ARROW-9274:
-

 Summary: [Rust] Builds failing due to IPC test failures
 Key: ARROW-9274
 URL: https://issues.apache.org/jira/browse/ARROW-9274
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Neville Dipale


I just saw this after merging 2 PRs, I'm investigating what the cause of the 
failures is




[jira] [Assigned] (ARROW-9274) [Rust] Builds failing due to IPC test failures

2020-06-30 Thread Neville Dipale (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale reassigned ARROW-9274:
-

Assignee: Neville Dipale

> [Rust] Builds failing due to IPC test failures
> --
>
> Key: ARROW-9274
> URL: https://issues.apache.org/jira/browse/ARROW-9274
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Critical




[jira] [Resolved] (ARROW-9261) [Python][Packaging] S3FileSystem curl errors in manylinux wheels

2020-06-30 Thread Antoine Pitrou (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-9261.
---
Resolution: Fixed

Issue resolved by pull request 7580
[https://github.com/apache/arrow/pull/7580]

> [Python][Packaging] S3FileSystem curl errors in manylinux wheels
> 
>
> Key: ARROW-9261
> URL: https://issues.apache.org/jira/browse/ARROW-9261
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging, Python
>Reporter: Roee Shlomo
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/ARROW-9109 introduced S3 support in 
> manylinux wheels. However, when trying to use S3FileSystem it fails with
>  
> {code:java}
> Traceback (most recent call last):
>  File "", line 1, in 
>  File "pyarrow/_fs.pyx", line 597, in pyarrow._fs.FileSystem.open_input_stream
>  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
>  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> OSError: When reading information for key 'x' in bucket 'x': AWS Error [code 99]: curlCode: 77, Problem with the SSL CA cert (path? access rights?) with address{code}
> It seems it can't find the SSL CA certificate bundle installed on the runtime 
> machine (tested on Ubuntu 16.04 and Ubuntu 18.04). It always searches 
> /etc/pki/tls/certs/ca-bundle.crt, probably because the wheels are built on 
> CentOS, whereas on Ubuntu the path is /etc/ssl/certs/ca-certificates.crt, and 
> it differs again on other distributions.
> Reproduce with:
> {code:bash}
> virtualenv -p python3.8 arrowenv
> source arrowenv/bin/activate
> pip install --extra-index-url https://repo.fury.io/arrow-nightlies/ --pre pyarrow
> python -c "from pyarrow.fs import S3FileSystem; fs = S3FileSystem(); fs.open_input_stream('mybucket/myfile')"{code}
>  
>  
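Below is a generic Python sketch of probing the distribution-specific CA bundle locations named above. It is not the fix merged in PR 7580. `find_ca_bundle` and `CANDIDATE_CA_BUNDLES` are hypothetical names, and how the resulting path would be handed to the S3 client is deliberately left out, since that depends on options not described in this issue. The `exists` parameter is injectable so that the lookup order can be tested without touching the real filesystem.

```python
import os
import ssl
from typing import Callable, Iterable, Optional

# The two locations named in this issue: CentOS/RHEL (where the wheels are
# built) and Debian/Ubuntu.
CANDIDATE_CA_BUNDLES = (
    "/etc/pki/tls/certs/ca-bundle.crt",
    "/etc/ssl/certs/ca-certificates.crt",
)


def find_ca_bundle(candidates: Iterable[str] = CANDIDATE_CA_BUNDLES,
                   exists: Callable[[str], bool] = os.path.isfile) -> Optional[str]:
    """Return the first CA bundle found on this machine, or None."""
    # An explicit override wins; SSL_CERT_FILE is the variable OpenSSL reads.
    override = os.environ.get("SSL_CERT_FILE")
    if override and exists(override):
        return override
    for path in candidates:
        if exists(path):
            return path
    # Last resort: the default file compiled into the local OpenSSL build.
    default = ssl.get_default_verify_paths().cafile
    return default if default and exists(default) else None


if __name__ == "__main__":
    print(find_ca_bundle())
```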




[jira] [Updated] (ARROW-9274) [Rust] [Integration Testing] Read i64 from json files as strings

2020-06-30 Thread Neville Dipale (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-9274:
--
Summary: [Rust] [Integration Testing] Read i64 from json files as strings  
(was: [Rust] Builds failing due to IPC test failures)

> [Rust] [Integration Testing] Read i64 from json files as strings
> 
>
> Key: ARROW-9274
> URL: https://issues.apache.org/jira/browse/ARROW-9274
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Critical




[jira] [Updated] (ARROW-9274) [Rust] [Integration Testing] Read i64 from json files as strings

2020-06-30 Thread Neville Dipale (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-9274:
--
Description: The integration files were recently changed to use strings for 
64-bit numbers, and this caused failures for the relevant Rust Arrow IPC test 
cases. This Jira is for the work to fix that.  (was: I just saw this after 
merging 2 PRs, I'm investigating what the cause of the failures is)

> [Rust] [Integration Testing] Read i64 from json files as strings
> 
>
> Key: ARROW-9274
> URL: https://issues.apache.org/jira/browse/ARROW-9274
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Critical
>
> The integration files were recently changed to use strings for 64-bit 
> numbers, and this caused failures for the relevant Rust Arrow IPC test cases. 
> This Jira is for the work to fix that.
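A plausible reason for storing 64-bit numbers as strings is that many JSON parsers read numbers as IEEE-754 doubles, which represent integers exactly only up to 2^53. For example, 9007199254740993 (2^53 + 1) would silently become 9007199254740992. Here is a Python sketch of the reading side, assuming the int64 values arrive as a JSON array of decimal strings. `parse_int64_values` is hypothetical. Accepting plain JSON numbers as well is an extra assumption for backwards compatibility, not necessarily what the Rust fix does.

```python
import json
from typing import List, Sequence, Union

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def parse_int64_values(values: Sequence[Union[str, int]]) -> List[int]:
    """Parse int64 values that a JSON file may store as decimal strings."""
    out: List[int] = []
    for v in values:
        if isinstance(v, bool):  # bool is a subclass of int in Python
            raise TypeError(f"boolean is not a valid int64 value: {v!r}")
        if isinstance(v, str):
            n = int(v)  # exact: never goes through a float
        elif isinstance(v, int):
            n = v  # assumption: tolerate plain JSON numbers from older files
        else:
            raise TypeError(f"unexpected JSON value for int64: {v!r}")
        if not INT64_MIN <= n <= INT64_MAX:
            raise OverflowError(f"{n} does not fit in int64")
        out.append(n)
    return out


if __name__ == "__main__":
    print(parse_int64_values(json.loads('["-1", "9007199254740993"]')))
```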




[jira] [Updated] (ARROW-9274) [Rust] [Integration Testing] Read i64 from json files as strings

2020-06-30 Thread Neville Dipale (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-9274:
--
Affects Version/s: 0.17.0

> [Rust] [Integration Testing] Read i64 from json files as strings
> 
>
> Key: ARROW-9274
> URL: https://issues.apache.org/jira/browse/ARROW-9274
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Critical




[jira] [Updated] (ARROW-9274) [Rust] [Integration Testing] Read i64 from json files as strings

2020-06-30 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9274:
--
Labels: pull-request-available  (was: )

> [Rust] [Integration Testing] Read i64 from json files as strings
> 
>
> Key: ARROW-9274
> URL: https://issues.apache.org/jira/browse/ARROW-9274
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h




[jira] [Assigned] (ARROW-9274) [Rust] [Integration Testing] Read i64 from json files as strings

2020-06-30 Thread Wes McKinney (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-9274:
---

Assignee: Neville Dipale  (was: Wes McKinney)

> [Rust] [Integration Testing] Read i64 from json files as strings
> 
>
> Key: ARROW-9274
> URL: https://issues.apache.org/jira/browse/ARROW-9274
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h




[jira] [Assigned] (ARROW-9274) [Rust] [Integration Testing] Read i64 from json files as strings

2020-06-30 Thread Wes McKinney (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-9274:
---

Assignee: Wes McKinney  (was: Neville Dipale)

> [Rust] [Integration Testing] Read i64 from json files as strings
> 
>
> Key: ARROW-9274
> URL: https://issues.apache.org/jira/browse/ARROW-9274
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Neville Dipale
>Assignee: Wes McKinney
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h




[jira] [Created] (ARROW-9275) [Rust] – Async Sans IO: R/W into/to Arrow Arrays

2020-06-30 Thread Mahmut Bulut (Jira)

Mahmut Bulut created ARROW-9275:
---

 Summary: [Rust] – Async Sans IO: R/W into/to Arrow Arrays
 Key: ARROW-9275
 URL: https://issues.apache.org/jira/browse/ARROW-9275
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


This issue can be considered an epic that spans other Arrow projects.

*Drill down*

Currently, traits like `ParquetReader` only expose a synchronous interface, 
which uses a `BufReader` with a constant 8KB buffer. Over the network, this 
becomes a problem, one that could easily be solved with differential buffers. 
In addition to this shortcoming, an executor engine is needed to schedule calls 
from async trait methods to sync trait methods; it would have to sit somewhere 
in between to make requests to external IO asynchronous. On-disk IO is 
acceptable with the current approach, since no reliable evented IO exists for 
on-disk IO on major platforms.

With all this considered, executor-agnostic abstractions that expose 
asynchronous IO need to be provided.

 

*Design Suggestions & Considerations*

The design should apply and consider:
 * Sans IO (for more information about the Sans IO approach, see 
[https://sans-io.readthedocs.io/]).
 * Not including any executor-specific data at all.
 * Tests should work with any executor with little to no modification.
 * Buffers are adjusted accordingly, using differential buffers to optimize 
network trips.
 * Sync IO shouldn't be touched, at all costs. If we try to unify the sync IO 
traits or write overlapping implementations, that will make our life harder in 
the future. Sans IO should be compartmentalized.

 

*Notes*

If the Sans IO approach is not taken, the project will:
 * use an extreme number of dependencies.
 * not be compatible with other Rust code at all.
 * break currently working code that uses array ingestion.
 * make integration tests harder.
 * find it really hard to adapt to completion-based APIs once they stabilize in 
the future (in user projects).
 * Note that this suggestion is not about the in-flight format or any in-flight 
related information at the moment. It is purely about making on-disk and remote 
IO (provider backends like AWS, etc.) async.

 

*Open points*

A couple of open points:
 * Identifying traits that are going to be asyncized.
 * Designing internal routines.
 * package name to expose.
 * Gather traits into the designated packages in all file formats.
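The Sans IO idea referenced above can be sketched in a few lines of Python (the language the sans-io.readthedocs.io material itself uses). A protocol object only transforms bytes into records and never reads or awaits anything. Thin drivers then supply the bytes, one blocking and one asyncio, with different chunk sizes to hint at the adjustable-buffer point. The length-prefixed framing, `RecordDecoder`, `read_all_sync` and `read_all_async` are all hypothetical illustrations, not an Arrow or Parquet format or API.

```python
import asyncio
import io
import struct
from typing import BinaryIO, List, Optional


class RecordDecoder:
    """Hypothetical sans-IO decoder for length-prefixed records.

    It never performs IO: callers push bytes in with feed() and pull complete
    records out with next_record(), so the same object can be driven by
    blocking reads, asyncio, or any other executor.
    """

    # Assumed framing for illustration: 4-byte little-endian length, then payload.
    _HEADER = struct.Struct("<I")

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> None:
        self._buf += data

    def next_record(self) -> Optional[bytes]:
        if len(self._buf) < self._HEADER.size:
            return None  # header not complete yet
        (length,) = self._HEADER.unpack_from(self._buf, 0)
        end = self._HEADER.size + length
        if len(self._buf) < end:
            return None  # payload not complete yet
        record = bytes(self._buf[self._HEADER.size:end])
        del self._buf[:end]
        return record


def read_all_sync(stream: BinaryIO, chunk_size: int = 8192) -> List[bytes]:
    """Blocking driver: the only code here that touches synchronous IO."""
    decoder = RecordDecoder()
    records: List[bytes] = []
    while chunk := stream.read(chunk_size):
        decoder.feed(chunk)
        while (record := decoder.next_record()) is not None:
            records.append(record)
    return records


async def read_all_async(reader: asyncio.StreamReader,
                         chunk_size: int = 65536) -> List[bytes]:
    """asyncio driver reusing the same decoder with a larger read size."""
    decoder = RecordDecoder()
    records: List[bytes] = []
    while chunk := await reader.read(chunk_size):
        decoder.feed(chunk)
        while (record := decoder.next_record()) is not None:
            records.append(record)
    return records


if __name__ == "__main__":
    frames = b"".join(struct.pack("<I", len(p)) + p for p in (b"abc", b"hello"))
    print(read_all_sync(io.BytesIO(frames), chunk_size=3))
```

Because the decoder holds no executor or socket, the tests for it are plain function calls, which matches the "tests should work with any executor" point.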




[jira] [Commented] (ARROW-9275) [Rust] – Async Sans IO: R/W into/to Arrow Arrays

2020-06-30 Thread Mahmut Bulut (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148510#comment-17148510
 ] 

Mahmut Bulut commented on ARROW-9275:
-

[~nevi_me], [~andygrove], [~paddyhoran] I need input for this from you if 
possible.

> [Rust] – Async Sans IO: R/W into/to Arrow Arrays
> 
>
> Key: ARROW-9275
> URL: https://issues.apache.org/jira/browse/ARROW-9275
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Major




[jira] [Resolved] (ARROW-9133) [C++] Add utf8_upper and utf_lower

2020-06-30 Thread Antoine Pitrou (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-9133.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7449
[https://github.com/apache/arrow/pull/7449]

> [C++] Add utf8_upper and utf_lower
> --
>
> Key: ARROW-9133
> URL: https://issues.apache.org/jira/browse/ARROW-9133
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Maarten Breddels
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>
> This is the equivalent of https://issues.apache.org/jira/browse/ARROW-9100 
> for utf8. This will be a good test for comparing unilib and utf8proc, both 
> performance-wise and API-wise.
> Also, since Unicode strings can grow and shrink, this is also a good start to 
> think about a strategy for memory allocation.
> How much can a 'string' (or byte sequence) length actually grow? 
> Item 5.18 mentions that a string can expand by a factor of 3, by which it 
> seems to mean 3 codepoints. This can be validated by checking with Python:
> {code:python}
> for i in range(0x100, 0x110000):
>     codepoint = chr(i)
>     try:
>         bytes_before = codepoint.encode()
>     except UnicodeEncodeError:
>         continue  # lone surrogates cannot be encoded as UTF-8
>     bytes_after = codepoint.upper().encode()
>     if len(bytes_before) != len(bytes_after):
>         print(i, hex(i), codepoint, codepoint.upper(), len(bytes_before),
>               len(bytes_after))
> 
> 912 0x390 ΐ Ϊ́ 2 6
> ...{code}
> showing that a two-byte codepoint can expand to 3 (2-byte) codepoints (2 
> bytes => 6 bytes). The character Ϊ́ has no single precomposed capital 
> character, so it is composed of a single base character and two combining 
> characters. Other situations are explained in 
> [https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt].
> CPython uses this factor-of-3 bound 
> [https://github.com/python/cpython/blob/25f38d7044a3a47465edd851c4e04f337b2c4b9b/Objects/unicodeobject.c#L10058],
>  which is an easy way to avoid having to grow the buffer dynamically.
> However, growing 3x in size seems at odds with the APIs of both utf8proc:
> [https://github.com/JuliaStrings/utf8proc/blob/08fa0698639f15d07b12c0065a4494f2d504/utf8proc.c#L375]
> and unilib:
> [https://github.com/ufal/unilib/blob/d8276e70b7c11c677897f71030de7258cbb1f99e/unilib/unicode.h#L79]
> which can only return a single 32-bit value (thus 1 codepoint, encoding 1 
> character). Both libraries seem to ignore the special cases of case mapping 
> (neither library uses/downloads SpecialCasing.txt).
> This means that if Arrow wants to support the same features as Python 
> regarding upper casing and lower casing (which means really implementing the 
> Unicode case-mapping rules), neither library is sufficient.
> There are more edge cases/irregularities, but I propose I start with a 
> version of utf8_lower and utf8_upper that ignores the special cases. 
>  
> PS:
> Another interesting finding is that although upper casing can increase a 
> buffer length by a factor of 3, lower casing a UTF-8 string will only increase 
> the byte length by a factor of 3/2 at most.
> {code:python}
> for i in range(0x100, 0x110000):
>     codepoint = chr(i)
>     try:
>         bytes_before = codepoint.encode()
>     except UnicodeEncodeError:
>         continue  # lone surrogates cannot be encoded as UTF-8
>     bytes_after = codepoint.lower().encode()
>     if len(bytes_before) != len(bytes_after):
>         print(i, hex(i), codepoint, codepoint.lower(), len(bytes_before),
>               len(bytes_after))
> 304 0x130 İ i̇ 2 3
> 570 0x23a Ⱥ ⱥ 2 3
> 574 0x23e Ⱦ ⱦ 2 3
> 7838 0x1e9e ẞ ß 3 2
> 8486 0x2126 Ω ω 3 2
> 8490 0x212a K k 3 1
> 8491 0x212b Å å 3 2
> 11362 0x2c62 Ɫ ɫ 3 2
> 11364 0x2c64 Ɽ ɽ 3 2
> 11373 0x2c6d Ɑ ɑ 3 2
> 11374 0x2c6e Ɱ ɱ 3 2
> 11375 0x2c6f Ɐ ɐ 3 2
> 11376 0x2c70 Ɒ ɒ 3 2
> 11390 0x2c7e Ȿ ȿ 3 2
> 11391 0x2c7f Ɀ ɀ 3 2
> 42893 0xa78d Ɥ ɥ 3 2
> 42922 0xa7aa Ɦ ɦ 3 2
> 42923 0xa7ab Ɜ ɜ 3 2
> 42924 0xa7ac Ɡ ɡ 3 2
> 42925 0xa7ad Ɬ ɬ 3 2
> 42926 0xa7ae Ɪ ɪ 3 2
> 42928 0xa7b0 Ʞ ʞ 3 2
> 42929 0xa7b1 Ʇ ʇ 3 2
> 42930 0xa7b2 Ʝ ʝ 3 2
> {code}
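To size an output buffer up front, a kernel needs the worst-case byte growth for each direction. Below is a small Python sketch that recomputes the two factors discussed above (3 for upper casing, 3/2 for lower casing) as a single number per direction, using Python's own Unicode tables. `byte_growth` is a hypothetical helper. The results depend on the Unicode version bundled with the Python build, so they should be treated as empirical rather than normative.

```python
import sys
from typing import Callable, Optional, Tuple


def byte_growth(transform: Callable[[str], str]) -> Tuple[float, Optional[int]]:
    """Largest UTF-8 byte-length ratio len(transform(c)) / len(c) over all codepoints.

    Returns (ratio, codepoint) for the first codepoint reaching the maximum,
    or (1.0, None) if no codepoint grows.
    """
    best_ratio, best_cp = 1.0, None
    for i in range(sys.maxunicode + 1):  # 0 .. 0x10FFFF
        c = chr(i)
        try:
            before = c.encode("utf-8")
        except UnicodeEncodeError:
            continue  # lone surrogates U+D800..U+DFFF cannot be encoded
        after = transform(c).encode("utf-8")
        ratio = len(after) / len(before)
        if ratio > best_ratio:
            best_ratio, best_cp = ratio, i
    return best_ratio, best_cp


if __name__ == "__main__":
    for name, fn in (("upper", str.upper), ("lower", str.lower)):
        ratio, cp = byte_growth(fn)
        print(name, ratio, hex(cp) if cp is not None else None)
```

A buffer of `factor * input_bytes` computed this way avoids dynamic growth, which is the trade-off the CPython approach makes.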




[jira] [Assigned] (ARROW-9133) [C++] Add utf8_upper and utf_lower

2020-06-30 Thread Antoine Pitrou (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-9133:
-

Assignee: Maarten Breddels

> [C++] Add utf8_upper and utf_lower
> --
>
> Key: ARROW-9133
> URL: https://issues.apache.org/jira/browse/ARROW-9133
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Maarten Breddels
>Assignee: Maarten Breddels
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> This is the equivalent of https://issues.apache.org/jira/browse/ARROW-9100 
> for utf8. This will be a good test for unilib vs utf8proc, performance, and 
> API wise.
> Also, since Unicode strings can grow and shrink, this is also a good start to 
> think about a strategy for memory allocation.
> How much can a 'string' (or byte sequence) length actually grow? 
> Item 5.18 mentioned that a string can expand by a factor of 3, by which they 
> seem to mean 3 codepoints. This can be validated by checking with Python:
> {code:python}
> for i in range(0x100, 0x11):
> codepoint = chr(i)
> try:
> bytes_before = codepoint.encode()
> except UnicodeEncodeError:
> continue
> bytes_after = codepoint.upper().encode()
> if len(bytes_before) != len(bytes_after):
> print(i, hex(i), codepoint, codepoint.lower(), len(bytes_before), 
> len(bytes_after))
> 
> 912 0x390 ΐ Ϊ́ 2 6
> ...{code}
> showing that a two-byte codepoint can expand to 3 (2 byte) codepoints (2 
> bytes => 6 bytes). The character Ϊ́ has no single precomposed capital 
> character, so it is composed of a single base character and two combining 
> characters. However there are different situations explain in 
> [https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt])
> This increase by a factor of 3 is used in CPython 
> [https://github.com/python/cpython/blob/25f38d7044a3a47465edd851c4e04f337b2c4b9b/Objects/unicodeobject.c#L10058]
>  which is an easy solution not to have to grow the buffer dynamically.
> However, growing 3x in size seems at odds with the API of both utf8proc:
> [https://github.com/JuliaStrings/utf8proc/blob/08fa0698639f15d07b12c0065a4494f2d504/utf8proc.c#L375]
> and unilib:
> [https://github.com/ufal/unilib/blob/d8276e70b7c11c677897f71030de7258cbb1f99e/unilib/unicode.h#L79]
> which can only return a single 32-bit value (thus one codepoint, encoding one 
> character). Both libraries seem to ignore the special cases of case mapping 
> (neither library uses or downloads SpecialCasing.txt).
> This means that if Arrow wants to support the same features as Python 
> regarding uppercasing and lowercasing (which means fully implementing the 
> Unicode case mapping rules), neither library is sufficient.
> There are more edge cases and irregularities, but I propose to start with a 
> version of utf8_lower and utf8_upper that ignores the special cases. 
>  
> PS:
> Another interesting finding is that although uppercasing can increase the 
> length of a buffer by a factor of 3, lowercasing a UTF-8 string will increase 
> the byte length by at most a factor of 3/2.
> {code:python}
> for i in range(0x100, 0x110000):
>     codepoint = chr(i)
>     try:
>         bytes_before = codepoint.encode()
>     except UnicodeEncodeError:
>         continue  # lone surrogates cannot be encoded as UTF-8
>     bytes_after = codepoint.lower().encode()
>     if len(bytes_before) != len(bytes_after):
>         print(i, hex(i), codepoint, codepoint.lower(), len(bytes_before),
>               len(bytes_after))
> 
> # Output (varies with the Unicode version of the Python interpreter):
> # 304 0x130 İ i̇ 2 3
> # 570 0x23a Ⱥ ⱥ 2 3
> # 574 0x23e Ⱦ ⱦ 2 3
> # 7838 0x1e9e ẞ ß 3 2
> # 8486 0x2126 Ω ω 3 2
> # 8490 0x212a K k 3 1
> # 8491 0x212b Å å 3 2
> # 11362 0x2c62 Ɫ ɫ 3 2
> # 11364 0x2c64 Ɽ ɽ 3 2
> # 11373 0x2c6d Ɑ ɑ 3 2
> # 11374 0x2c6e Ɱ ɱ 3 2
> # 11375 0x2c6f Ɐ ɐ 3 2
> # 11376 0x2c70 Ɒ ɒ 3 2
> # 11390 0x2c7e Ȿ ȿ 3 2
> # 11391 0x2c7f Ɀ ɀ 3 2
> # 42893 0xa78d Ɥ ɥ 3 2
> # 42922 0xa7aa Ɦ ɦ 3 2
> # 42923 0xa7ab Ɜ ɜ 3 2
> # 42924 0xa7ac Ɡ ɡ 3 2
> # 42925 0xa7ad Ɬ ɬ 3 2
> # 42926 0xa7ae Ɪ ɪ 3 2
> # 42928 0xa7b0 Ʞ ʞ 3 2
> # 42929 0xa7b1 Ʇ ʇ 3 2
> # 42930 0xa7b2 Ʝ ʝ 3 2
> {code}
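The byte-length bounds discussed in this issue (3x for uppercasing, 3/2 for lowercasing) can be checked over the whole codepoint range with a short script. This is a sketch that is not part of the original report: `max_growth` is a hypothetical helper, and its result depends on the Unicode database bundled with the Python interpreter.

```python
import sys
from typing import Callable, Optional, Tuple


def max_growth(case_map: Callable[[str], str]) -> Tuple[float, Optional[int]]:
    """Return (ratio, codepoint) for the largest UTF-8 byte-length growth that
    case_map (e.g. str.upper or str.lower) causes on a single codepoint.
    The first codepoint reaching the maximum ratio is reported."""
    best_ratio, best_cp = 1.0, None
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        try:
            before = len(ch.encode("utf-8"))
        except UnicodeEncodeError:
            continue  # lone surrogates (U+D800..U+DFFF) have no UTF-8 form
        after = len(case_map(ch).encode("utf-8"))
        ratio = after / before
        if ratio > best_ratio:
            best_ratio, best_cp = ratio, cp
    return best_ratio, best_cp


if __name__ == "__main__":
    for name, fn in (("upper", str.upper), ("lower", str.lower)):
        ratio, cp = max_growth(fn)
        print(f"{name}: x{ratio} (first reached at U+{cp:04X})")
```

With the Unicode data shipped in current CPython releases, this should report U+0390 (ΐ) for uppercasing and U+0130 (İ) for lowercasing, matching the examples above.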



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (ARROW-9133) [C++] Add utf8_upper and utf8_lower

2020-06-30 Thread Antoine Pitrou (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-9133:
--
Summary: [C++] Add utf8_upper and utf8_lower  (was: [C++] Add utf8_upper 
and utf_lower)

> [C++] Add utf8_upper and utf8_lower
> ---
>
> Key: ARROW-9133
> URL: https://issues.apache.org/jira/browse/ARROW-9133
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Maarten Breddels
>Assignee: Maarten Breddels
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> This is the equivalent of https://issues.apache.org/jira/browse/ARROW-9100 
> for utf8. This will be a good test of unilib vs. utf8proc, both 
> performance-wise and API-wise.
> Also, since Unicode strings can grow and shrink, this is also a good start to 
> think about a strategy for memory allocation.
> How much can the length of a 'string' (or byte sequence) actually grow? 
> Section 5.18 (Case Mappings) of the Unicode Standard mentions that a string can 
> expand by a factor of 3, by which they seem to mean 3 codepoints. This can be 
> validated with Python:
> {code:python}
> for i in range(0x100, 0x110000):
>     codepoint = chr(i)
>     try:
>         bytes_before = codepoint.encode()
>     except UnicodeEncodeError:
>         continue  # lone surrogates cannot be encoded as UTF-8
>     bytes_after = codepoint.upper().encode()
>     if len(bytes_before) != len(bytes_after):
>         print(i, hex(i), codepoint, codepoint.upper(), len(bytes_before),
>               len(bytes_after))
> 
> # Output (excerpt):
> # 912 0x390 ΐ Ϊ́ 2 6
> # ...
> {code}
> showing that a single two-byte codepoint can expand to three two-byte 
> codepoints (2 bytes => 6 bytes). The uppercase form of ΐ has no precomposed 
> character, so Ϊ́ is composed of a base character and two combining 
> characters. Other situations are explained in 
> [https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt].
> CPython uses this factor-of-3 bound when allocating the output buffer 
> [https://github.com/python/cpython/blob/25f38d7044a3a47465edd851c4e04f337b2c4b9b/Objects/unicodeobject.c#L10058], 
> which is an easy way to avoid growing the buffer dynamically.
> However, growing 3x in size seems at odds with the API of both utf8proc:
> [https://github.com/JuliaStrings/utf8proc/blob/08fa0698639f15d07b12c0065a4494f2d504/utf8proc.c#L375]
> and unilib:
> [https://github.com/ufal/unilib/blob/d8276e70b7c11c677897f71030de7258cbb1f99e/unilib/unicode.h#L79]
> which can only return a single 32-bit value (thus one codepoint, encoding one 
> character). Both libraries seem to ignore the special cases of case mapping 
> (neither library uses or downloads SpecialCasing.txt).
> This means that if Arrow wants to support the same features as Python 
> regarding uppercasing and lowercasing (which means fully implementing the 
> Unicode case mapping rules), neither library is sufficient.
> There are more edge cases and irregularities, but I propose to start with a 
> version of utf8_lower and utf8_upper that ignores the special cases. 
>  
> PS:
> Another interesting finding is that although uppercasing can increase the 
> length of a buffer by a factor of 3, lowercasing a UTF-8 string will increase 
> the byte length by at most a factor of 3/2.
> {code:python}
> for i in range(0x100, 0x110000):
>     codepoint = chr(i)
>     try:
>         bytes_before = codepoint.encode()
>     except UnicodeEncodeError:
>         continue  # lone surrogates cannot be encoded as UTF-8
>     bytes_after = codepoint.lower().encode()
>     if len(bytes_before) != len(bytes_after):
>         print(i, hex(i), codepoint, codepoint.lower(), len(bytes_before),
>               len(bytes_after))
> 
> # Output (varies with the Unicode version of the Python interpreter):
> # 304 0x130 İ i̇ 2 3
> # 570 0x23a Ⱥ ⱥ 2 3
> # 574 0x23e Ⱦ ⱦ 2 3
> # 7838 0x1e9e ẞ ß 3 2
> # 8486 0x2126 Ω ω 3 2
> # 8490 0x212a K k 3 1
> # 8491 0x212b Å å 3 2
> # 11362 0x2c62 Ɫ ɫ 3 2
> # 11364 0x2c64 Ɽ ɽ 3 2
> # 11373 0x2c6d Ɑ ɑ 3 2
> # 11374 0x2c6e Ɱ ɱ 3 2
> # 11375 0x2c6f Ɐ ɐ 3 2
> # 11376 0x2c70 Ɒ ɒ 3 2
> # 11390 0x2c7e Ȿ ȿ 3 2
> # 11391 0x2c7f Ɀ ɀ 3 2
> # 42893 0xa78d Ɥ ɥ 3 2
> # 42922 0xa7aa Ɦ ɦ 3 2
> # 42923 0xa7ab Ɜ ɜ 3 2
> # 42924 0xa7ac Ɡ ɡ 3 2
> # 42925 0xa7ad Ɬ ɬ 3 2
> # 42926 0xa7ae Ɪ ɪ 3 2
> # 42928 0xa7b0 Ʞ ʞ 3 2
> # 42929 0xa7b1 Ʇ ʇ 3 2
> # 42930 0xa7b2 Ʝ ʝ 3 2
> {code}
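One way to apply the worst-case bound, in the spirit of the CPython approach mentioned in the description, is to allocate the output once at 3x the input size and shrink it at the end. The sketch below models that strategy in Python; `utf8_upper_preallocated` is a hypothetical name, and it relies on Python's full case mapping rather than on utf8proc or unilib.

```python
def utf8_upper_preallocated(data: bytes) -> bytes:
    """Uppercase UTF-8 input using a single worst-case allocation.

    Per codepoint, uppercasing grows the UTF-8 length by at most 3x
    (e.g. U+0390, 2 bytes -> 6 bytes), so 3 * len(data) bytes always suffice.
    """
    out = bytearray(3 * len(data))  # reserve the worst case up front
    pos = 0
    for ch in data.decode("utf-8"):
        encoded = ch.upper().encode("utf-8")
        end = pos + len(encoded)
        if end > len(out):
            raise RuntimeError("3x bound violated; buffer would need to grow")
        out[pos:end] = encoded  # same-length slice assignment: no resize
        pos = end
    del out[pos:]  # shrink to the bytes actually written
    return bytes(out)
```

For lowercasing, the same scheme would only need a buffer of 3/2 the input length, per the table in the description.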



[jira] [Updated] (ARROW-9275) [Rust] – Async Sans IO: R/W into/to Arrow Arrays

2020-06-30 Thread Mahmut Bulut (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-9275:

Description: 
This issue can be considered an epic that spans other Arrow 
projects.

*Drill down*

Currently, traits like `ParquetReader` only allow a synchronous interface, which 
uses a BufReader with a constant 8KB buffer. Over the network, this becomes a 
problem. This could easily be solved with differential buffers. In addition to 
this shortcoming, an executor engine is needed somewhere in between to schedule 
calls from async trait methods to sync trait methods, so that requests to 
external IO become asynchronous. On-disk IO is acceptable with the current 
approach, since no reliable evented IO exists for on-disk IO on major 
platforms.

Considering all this, we need to expose abstractions that provide asynchronous 
IO without depending on any particular executor.

 

*Design Suggestions & Considerations*

The design should apply and consider:
 * Sans IO (for more information about the Sans-IO approach, see 
[https://sans-io.readthedocs.io/]).
 * Not including any executor-specific data at all.
 * Tests should work with any executor with little to no modification.
 * Buffers should be adjusted accordingly, using differential buffers to 
optimize network round trips.
 * Sync IO shouldn't be touched, at all costs. If we try to unify the sync IO 
traits or write overlapping implementations, that will make our lives harder in 
the future. Sans IO should be compartmentalized.

 

*Notes*

If the Sans-IO approach is not taken, the project will:
 * pull in an excessive number of dependencies.
 * not be compatible with other Rust code at all.
 * break currently working code that uses array ingestion.
 * make integration tests harder.
 * make it really hard (in user projects) to adapt to completion-based APIs 
once they stabilize in the future.
 * Note: this suggestion is not about the Flight format or any Flight-related 
information for now. It is purely about making on-disk and remote IO (provider 
backends like AWS, etc.) async.

 

*Open points*

A couple of open points:
 * Identifying the traits that are going to be made async.
 * Designing the internal routines.
 * Choosing the package name to expose.
 * Gathering traits into the designated packages for all file formats.

  was:
This issue can be considered an epic that spans other Arrow 
projects.

*Drill down*

Currently, traits like `ParquetReader` only allow a synchronous interface, which 
uses a BufReader with a constant 8KB buffer. Over the network, this becomes a 
problem. This could easily be solved with differential buffers. In addition to 
this shortcoming, an executor engine is needed somewhere in between to schedule 
calls from async trait methods to sync trait methods, so that requests to 
external IO become asynchronous. On-disk IO is acceptable with the current 
approach, since no reliable evented IO exists for on-disk IO on major 
platforms.

Considering all this, we need to expose abstractions that provide asynchronous 
IO without depending on any particular executor.

 

*Design Suggestions & Considerations*

The design should apply and consider:
 * Sans IO (for more information about the Sans-IO approach, see 
[https://sans-io.readthedocs.io/]).
 * Not including any executor-specific data at all.
 * Tests should work with any executor with little to no modification.
 * Buffers should be adjusted accordingly, using differential buffers to 
optimize network round trips.
 * Sync IO shouldn't be touched, at all costs. If we try to unify the sync IO 
traits or write overlapping implementations, that will make our lives harder in 
the future. Sans IO should be compartmentalized.

 

*Notes*

If the Sans-IO approach is not taken, the project will:
 * pull in an excessive number of dependencies.
 * not be compatible with other Rust code at all.
 * break currently working code that uses array ingestion.
 * make integration tests harder.
 * make it really hard (in user projects) to adapt to completion-based APIs 
once they stabilize in the future.
 * Note: this suggestion is not about the in-flight format or any 
in-flight-related information for now. It is purely about making on-disk and 
remote IO (provider backends like AWS, etc.) async.

 

*Open points*

A couple of open points:
 * Identifying the traits that are going to be made async.
 * Designing the internal routines.
 * Choosing the package name to expose.
 * Gathering traits into the designated packages for all file formats.


> [Rust] – Async Sans IO: R/W into/to Arrow Arrays
> 
>
> Key: ARROW-9275
> URL: https://issues.apache.org/jira/browse/ARROW-9275
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Major
>
> This issue can be considered an
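The Sans-IO idea referenced in ARROW-9275 can be illustrated with a small sketch. It is written in Python because the sans-io.readthedocs.io material is Python-centric; a Rust design would express the same split with traits. The decoder only turns bytes into frames and never performs IO itself, so one decoder serves both a blocking reader and an asyncio loop. The length-prefixed framing and all names are hypothetical and unrelated to the Parquet or Arrow IPC formats.

```python
import asyncio
import io
import struct
from typing import AsyncIterator, BinaryIO, List


class FrameDecoder:
    """Sans-IO decoder: bytes in, complete frames out, no IO of its own.

    Hypothetical framing for illustration: a 4-byte little-endian length
    prefix followed by that many payload bytes."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def receive_data(self, data: bytes) -> None:
        self._buf += data

    def next_frames(self) -> List[bytes]:
        frames = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack_from("<I", self._buf, 0)
            if len(self._buf) < 4 + length:
                break  # incomplete frame: wait for more bytes
            frames.append(bytes(self._buf[4:4 + length]))
            del self._buf[:4 + length]
        return frames


def encode_frame(payload: bytes) -> bytes:
    return struct.pack("<I", len(payload)) + payload


def read_sync(stream: BinaryIO, chunk_size: int = 8192) -> List[bytes]:
    """Blocking driver: plain read() calls feed the decoder."""
    decoder, frames = FrameDecoder(), []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return frames
        decoder.receive_data(chunk)
        frames.extend(decoder.next_frames())


async def read_async(chunks: AsyncIterator[bytes]) -> List[bytes]:
    """Async driver: any event loop that can drive an async iterator works."""
    decoder, frames = FrameDecoder(), []
    async for chunk in chunks:
        decoder.receive_data(chunk)
        frames.extend(decoder.next_frames())
    return frames


async def iter_chunks(data: bytes, size: int) -> AsyncIterator[bytes]:
    """Stand-in for a network source that delivers data in small pieces."""
    for i in range(0, len(data), size):
        await asyncio.sleep(0)  # yield to the event loop, as real IO would
        yield data[i:i + size]


if __name__ == "__main__":
    data = encode_frame(b"abc") + encode_frame(b"") + encode_frame(b"hello")
    print(read_sync(io.BytesIO(data), chunk_size=3))
    print(asyncio.run(read_async(iter_chunks(data, 2))))
```

Because the decoder holds no executor-specific state, the sync path stays untouched and tests can exercise the decoder directly with byte strings.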

[jira] [Commented] (ARROW-9006) [C++] Use Cast kernels to implement Scalar::Parse and Scalar::CastTo

2020-06-30 Thread Krisztian Szucs (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148572#comment-17148572
 ] 

Krisztian Szucs commented on ARROW-9006:


We'll need to add support for copying scalars and casting temporal scalars 
before we can switch to the compute implementation.

> [C++] Use Cast kernels to implement Scalar::Parse and Scalar::CastTo
> 
>
> Key: ARROW-9006
> URL: https://issues.apache.org/jira/browse/ARROW-9006
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 1.0.0
>
>
> We should not maintain distinct (and possibly differently behaving) 
> implementations of elementwise array casting and scalar casting. The new 
> kernels framework makes it relatively easy to generate kernels that can 
> process either arrays or scalars. 

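A toy model of the idea in ARROW-9006: write an elementwise kernel once and run scalars through the array path by wrapping them in a length-1 array, so the two code paths cannot diverge. This is a pure-Python sketch with hypothetical names (`make_kernel`, `cast_to_int64`); it does not reflect the actual C++ kernels framework API.

```python
from typing import Callable, List, Optional


def make_kernel(fn: Callable[[object], object]):
    """Build one elementwise kernel that accepts an "array" (a Python list,
    with None marking nulls) or a single scalar value.

    The scalar path reuses the array path by wrapping the value in a
    length-1 array, so array and scalar results cannot diverge."""

    def exec_array(values: List[Optional[object]]) -> List[Optional[object]]:
        return [None if v is None else fn(v) for v in values]

    def kernel(arg):
        if isinstance(arg, list):
            return exec_array(arg)
        return exec_array([arg])[0]  # scalar: run as a length-1 array

    return kernel


# Hypothetical cast kernels, for illustration only.
cast_to_int64 = make_kernel(int)
cast_to_string = make_kernel(str)
```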


[jira] [Updated] (ARROW-9044) [Go][Packaging] Revisit the license file attachment to the go packages

2020-06-30 Thread Krisztian Szucs (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-9044:
---
Description: 
As per https://github.com/apache/arrow/pull/7355#issuecomment-639560475

A nicer solution would be to rename the top level LICENSE.txt to LICENSE, so we 
wouldn't need to maintain another copy of it. 


  was:As per https://github.com/apache/arrow/pull/7355#issuecomment-639560475


> [Go][Packaging] Revisit the license file attachment to the go packages
> --
>
> Key: ARROW-9044
> URL: https://issues.apache.org/jira/browse/ARROW-9044
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go, Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Minor
> Fix For: 1.0.0
>
>
> As per https://github.com/apache/arrow/pull/7355#issuecomment-639560475
> A nicer solution would be to rename the top level LICENSE.txt to LICENSE, so 
> we wouldn't need to maintain another copy of it. 



[jira] [Updated] (ARROW-9044) [Go][Packaging] Revisit the license file attachment to the go packages

2020-06-30 Thread Krisztian Szucs (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-9044:
---
Fix Version/s: (was: 1.0.0)

> [Go][Packaging] Revisit the license file attachment to the go packages
> --
>
> Key: ARROW-9044
> URL: https://issues.apache.org/jira/browse/ARROW-9044
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go, Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Minor
>
> As per https://github.com/apache/arrow/pull/7355#issuecomment-639560475
> A nicer solution would be to rename the top level LICENSE.txt to LICENSE, so 
> we wouldn't need to maintain another copy of it. 



[jira] [Updated] (ARROW-9044) [Go][Packaging] Revisit the license file attachment to the go packages

2020-06-30 Thread Krisztian Szucs (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-9044:
---
Fix Version/s: 2.0.0

> [Go][Packaging] Revisit the license file attachment to the go packages
> --
>
> Key: ARROW-9044
> URL: https://issues.apache.org/jira/browse/ARROW-9044
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go, Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Minor
> Fix For: 2.0.0
>
>
> As per https://github.com/apache/arrow/pull/7355#issuecomment-639560475
> A nicer solution would be to rename the top level LICENSE.txt to LICENSE, so 
> we wouldn't need to maintain another copy of it. 



[jira] [Commented] (ARROW-9044) [Go][Packaging] Revisit the license file attachment to the go packages

2020-06-30 Thread Krisztian Szucs (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148582#comment-17148582
 ] 

Krisztian Szucs commented on ARROW-9044:


The go packaging issue should be resolved by 
https://github.com/apache/arrow/pull/7376

Renaming the top-level LICENSE.txt file would be too invasive a change for the 
1.0 release, so I'm postponing it to 2.0.

> [Go][Packaging] Revisit the license file attachment to the go packages
> --
>
> Key: ARROW-9044
> URL: https://issues.apache.org/jira/browse/ARROW-9044
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go, Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Minor
> Fix For: 1.0.0
>
>
> As per https://github.com/apache/arrow/pull/7355#issuecomment-639560475
> A nicer solution would be to rename the top level LICENSE.txt to LICENSE, so 
> we wouldn't need to maintain another copy of it. 



[jira] [Commented] (ARROW-8078) [Python] Missing links in the docs regarding field and schema DataTypes

2020-06-30 Thread Krisztian Szucs (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148597#comment-17148597
 ] 

Krisztian Szucs commented on ARROW-8078:


Updated the release management guide 
https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-GeneratingnewAPIdocumentationsandupdatethewebsite

> [Python] Missing links in the docs regarding field and schema DataTypes
> ---
>
> Key: ARROW-8078
> URL: https://issues.apache.org/jira/browse/ARROW-8078
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Otávio Vasques
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 1.0.0
>
>
> The current Data Types page of the pyarrow documentation has a list of the 
> different objects and types that you can use to build a custom pyarrow schema. 
> For many of them it is possible to click on the white box around the 
> module/object name and access a detailed description of the module/object. 
> This is true for almost all the items, but for field and schema it's not 
> possible to access the detailed description, although the corresponding URL 
> (obtained by replacing the data type in a known link with these names) exists.
> I'm not sure whether this is a bug or whether these boxes should point to 
> another page, but if not, I think they should point to the corresponding 
> detailed descriptions.
> Data Types: [https://arrow.apache.org/docs/python/api/datatypes.html]
> Sample Type Int 32: 
> [https://arrow.apache.org/docs/python/generated/pyarrow.int32.html#pyarrow.int32]
> Field: [https://arrow.apache.org/docs/python/generated/pyarrow.field.html]
> Schema: [https://arrow.apache.org/docs/python/generated/pyarrow.schema.html]



[jira] [Assigned] (ARROW-8535) [Rust] Arrow crate does not specify arrow-flight version

2020-06-30 Thread Neville Dipale (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale reassigned ARROW-8535:
-

Assignee: Neville Dipale  (was: Andy Grove)

> [Rust] Arrow crate does not specify arrow-flight version
> 
>
> Key: ARROW-8535
> URL: https://issues.apache.org/jira/browse/ARROW-8535
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Andy Grove
>Assignee: Neville Dipale
>Priority: Critical
> Fix For: 1.0.0
>
>
> Arrow Cargo.toml has:
> {code:java}
> arrow-flight = { path = "../arrow-flight", optional = true } {code}
> It should be:
> {code:java}
> arrow-flight = { path = "../arrow-flight", optional = true, version = "1.0.0-SNAPSHOT" } {code}
> Also need to update release scripts to replace this version.
>  
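For the release-script part, rewriting the `version` field of the `arrow-flight` dependency line could be sketched as follows. The helper name and the regex approach are assumptions for illustration, not the actual Arrow release scripts; the sketch expects the inline table to sit on one line, as TOML requires for inline tables.

```python
import re


def set_dependency_version(cargo_toml: str, crate: str, version: str) -> str:
    """Replace the `version = "..."` value in the one-line inline table that
    declares `crate` as a dependency. Raises ValueError if none is found."""
    pattern = re.compile(
        r'^(\s*' + re.escape(crate) + r'\s*=\s*\{[^}\n]*\bversion\s*=\s*")[^"]*(")',
        re.MULTILINE,
    )
    # A function replacement avoids backslash/group-reference surprises.
    updated, count = pattern.subn(
        lambda m: m.group(1) + version + m.group(2), cargo_toml
    )
    if count == 0:
        raise ValueError(f"no versioned dependency line found for {crate!r}")
    return updated
```

A line without a `version` key (the current state described in this issue) raises `ValueError`, which makes a missing version visible at release time instead of silently publishing the crate without it.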



[jira] [Created] (ARROW-9276) [Release] Enforce CUDA device for updating the api documentations

2020-06-30 Thread Krisztian Szucs (Jira)

Krisztian Szucs created ARROW-9276:
--

 Summary: [Release] Enforce CUDA device for updating the api 
documentations
 Key: ARROW-9276
 URL: https://issues.apache.org/jira/browse/ARROW-9276
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
 Fix For: 1.0.0


Update the post-09-docs.sh script to check that a CUDA device is available, and 
to use the ubuntu-cuda-docs Docker image to generate the API docs.

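A minimal sketch of the CUDA check, written in Python for illustration even though post-09-docs.sh itself is a shell script. It assumes that `nvidia-smi` from the NVIDIA driver is on the PATH and treats a missing tool or a non-zero exit status as "no device"; the function name is hypothetical.

```python
import shutil
import subprocess


def cuda_device_available() -> bool:
    """Return True if `nvidia-smi -L` lists at least one GPU.

    Assumption: the NVIDIA driver's nvidia-smi tool is on PATH; a missing
    tool, a failure to run it, or a non-zero exit status means "no device"."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    if not cuda_device_available():
        raise SystemExit("A CUDA device is required to build the API docs")
```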


[jira] [Updated] (ARROW-9276) [Release] Enforce CUDA device for updating the api documentations

2020-06-30 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9276:
--
Labels: pull-request-available  (was: )

> [Release] Enforce CUDA device for updating the api documentations
> -
>
> Key: ARROW-9276
> URL: https://issues.apache.org/jira/browse/ARROW-9276
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Update the post-09-docs.sh script to check that a CUDA device is available, and 
> to use the ubuntu-cuda-docs Docker image to generate the API docs.



[jira] [Assigned] (ARROW-9276) [Release] Enforce CUDA device for updating the api documentations

2020-06-30 Thread Wes McKinney (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-9276:
---

Assignee: Krisztian Szucs  (was: Wes McKinney)

> [Release] Enforce CUDA device for updating the api documentations
> -
>
> Key: ARROW-9276
> URL: https://issues.apache.org/jira/browse/ARROW-9276
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Update the post-09-docs.sh script to check that a CUDA device is available, and 
> to use the ubuntu-cuda-docs Docker image to generate the API docs.



[jira] [Assigned] (ARROW-9276) [Release] Enforce CUDA device for updating the api documentations

2020-06-30 Thread Wes McKinney (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-9276:
---

Assignee: Wes McKinney  (was: Krisztian Szucs)

> [Release] Enforce CUDA device for updating the api documentations
> -
>
> Key: ARROW-9276
> URL: https://issues.apache.org/jira/browse/ARROW-9276
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Krisztian Szucs
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Update the post-09-docs.sh script to check that a CUDA device is available, and 
> to use the ubuntu-cuda-docs Docker image to generate the API docs.



[jira] [Resolved] (ARROW-8078) [Python] Missing links in the docs regarding field and schema DataTypes

2020-06-30 Thread Krisztian Szucs (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-8078.

Resolution: Resolved

> [Python] Missing links in the docs regarding field and schema DataTypes
> ---
>
> Key: ARROW-8078
> URL: https://issues.apache.org/jira/browse/ARROW-8078
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Otávio Vasques
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 1.0.0
>
>
> The current Data Types page of the pyarrow documentation has a list of the 
> different objects and types that you can use to build a custom pyarrow schema. 
> For many of them it is possible to click on the white box around the 
> module/object name and access a detailed description of the module/object. 
> This is true for almost all the items, but for field and schema it's not 
> possible to access the detailed description, although the corresponding URL 
> (obtained by replacing the data type in a known link with these names) exists.
> I'm not sure whether this is a bug or whether these boxes should point to 
> another page, but if not, I think they should point to the corresponding 
> detailed descriptions.
> Data Types: [https://arrow.apache.org/docs/python/api/datatypes.html]
> Sample Type Int 32: 
> [https://arrow.apache.org/docs/python/generated/pyarrow.int32.html#pyarrow.int32]
> Field: [https://arrow.apache.org/docs/python/generated/pyarrow.field.html]
> Schema: [https://arrow.apache.org/docs/python/generated/pyarrow.schema.html]



[jira] [Updated] (ARROW-8535) [Rust] Arrow crate does not specify arrow-flight version

2020-06-30 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8535:
--
Labels: pull-request-available  (was: )

> [Rust] Arrow crate does not specify arrow-flight version
> 
>
> Key: ARROW-8535
> URL: https://issues.apache.org/jira/browse/ARROW-8535
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Andy Grove
>Assignee: Neville Dipale
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Arrow Cargo.toml has:
> {code:java}
> arrow-flight = { path = "../arrow-flight", optional = true } {code}
> It should be:
> {code:java}
> arrow-flight = { path = "../arrow-flight", optional = true, version = "1.0.0-SNAPSHOT" } {code}
> Also need to update release scripts to replace this version.
>  



[jira] [Assigned] (ARROW-8535) [Rust] Arrow crate does not specify arrow-flight version

2020-06-30 Thread Wes McKinney (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-8535:
---

Assignee: Wes McKinney  (was: Neville Dipale)

> [Rust] Arrow crate does not specify arrow-flight version
> 
>
> Key: ARROW-8535
> URL: https://issues.apache.org/jira/browse/ARROW-8535
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Andy Grove
>Assignee: Wes McKinney
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Arrow Cargo.toml has:
> {code:java}
> arrow-flight = { path = "../arrow-flight", optional = true } {code}
> It should be:
> {code:java}
> arrow-flight = { path = "../arrow-flight", optional = true, version = "1.0.0-SNAPSHOT" } {code}
> Also need to update release scripts to replace this version.
>  



[jira] [Assigned] (ARROW-8535) [Rust] Arrow crate does not specify arrow-flight version

2020-06-30 Thread Wes McKinney (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-8535:
---

Assignee: Neville Dipale  (was: Wes McKinney)

> [Rust] Arrow crate does not specify arrow-flight version
> 
>
> Key: ARROW-8535
> URL: https://issues.apache.org/jira/browse/ARROW-8535
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Andy Grove
>Assignee: Neville Dipale
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Arrow Cargo.toml has:
> {code:java}
> arrow-flight = { path = "../arrow-flight", optional = true } {code}
> It should be:
> {code:java}
> arrow-flight = { path = "../arrow-flight", optional = true, version = "1.0.0-SNAPSHOT" } {code}
> Also need to update release scripts to replace this version.
>  



[jira] [Created] (ARROW-9277) Fix documentation of Reading CSV files

2020-06-30 Thread Masaki Kozuki (Jira)

Masaki Kozuki created ARROW-9277:


 Summary: Fix documentation of Reading CSV files
 Key: ARROW-9277
 URL: https://issues.apache.org/jira/browse/ARROW-9277
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
 Environment: Ubuntu 18.04
Reporter: Masaki Kozuki


I gave the C++ API a try and copy-pasted [the basic usage of Reading CSV 
files|https://arrow.apache.org/docs/cpp/csv.html?highlight=csv%20c#basic-usage], 
but the snippet did not work for me.



[jira] [Updated] (ARROW-6482) [Rust] Investigate enabling features in regex crate to reduce compile times

2020-06-30 Thread Neville Dipale (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-6482:
--
Fix Version/s: (was: 1.0.0)

> [Rust] Investigate enabling features in regex crate to reduce compile times
> ---
>
> Key: ARROW-6482
> URL: https://issues.apache.org/jira/browse/ARROW-6482
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.14.1
>Reporter: Paddy Horan
>Priority: Minor
>  Labels: beginner
>
> The regex crate recently added a feature flag to reduce compile times and 
> binary size when certain Unicode-related features are not needed. We should 
> investigate using this feature.



[jira] [Updated] (ARROW-6583) [Rust] Question and Request for Examples of Array Operations

2020-06-30 Thread Neville Dipale (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-6583:
--
Fix Version/s: (was: 1.0.0)

> [Rust] Question and Request for Examples of Array Operations
> 
>
> Key: ARROW-6583
> URL: https://issues.apache.org/jira/browse/ARROW-6583
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: [DELETED]
>Priority: Minor
>
> Hi all, thank you for your excellent work on Arrow.
> As I was going through the example for the Rust Arrow implementation, 
> specifically the read_csv example 
> [https://github.com/apache/arrow/blob/master/rust/arrow/examples/read_csv.rs], 
> as well as the generated Rustdocs and unit tests, it was not quite clear 
> what the intended usage is for operations such as filtering and masking over 
> Arrays.
> One particular use-case I'm interested in is finding all values in an Array 
> such that x >= N for all x. I came across arrow::compute::array_ops::filter, 
> which seems to be similar to what I want, although it's expecting a mask to 
> already be constructed before performing the filter operation, and it was not 
> obviously visible in the documentation, leading me to believe this might not 
> be idiomatic usage.
> More generally, is the expectation for Arrays on the Rust side that they are 
> just simple data abstractions, without exposing higher-order methods such as 
> filtering/masking? Is the intent to leave that to users? If I missed some 
> piece of documentation, please let me know. For my use-case I ended up trying 
> something like:
> {code:java}
> // Assumption: column 0 is a Float64 column, and BooleanBuilder, Float64Array
> // and arrow::compute::array_ops::filter are in scope.
> let column = batch.column(0).as_any().downcast_ref::<Float64Array>().unwrap();
> let n = 5.0;
> let mut builder = BooleanBuilder::new(batch.num_rows());
> for i in 0..batch.num_rows() {
>     // value(i) returns the native f64 directly, not an Option.
>     builder.append_value(column.value(i) >= n).unwrap();
> }
> let mask = builder.finish();
> let filtered_column = filter(column, &mask).unwrap();{code}
> If possible, could you provide examples of intended usage of Arrays? Thank 
> you!
>  
>  
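The mask-then-filter pattern described in ARROW-6583 can be modeled as two separate kernels: a comparison that produces a boolean mask, and a filter that applies it. This is a plain-Python sketch with hypothetical names, not the Rust crate's API; it assumes that nulls propagate through the comparison and that a null mask slot drops the row.

```python
from typing import List, Optional


def gt_eq_scalar(values: List[Optional[float]], n: float) -> List[Optional[bool]]:
    """Comparison kernel: builds the boolean mask; null inputs stay null."""
    return [None if v is None else v >= n for v in values]


def filter_values(
    values: List[Optional[float]], mask: List[Optional[bool]]
) -> List[Optional[float]]:
    """Filter kernel: keeps values whose mask slot is True (null counts as False)."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    return [v for v, keep in zip(values, mask) if keep]
```

Keeping the two steps separate means the same filter can be reused with any mask, whichever comparison produced it.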



[jira] [Updated] (ARROW-9006) [C++] Use Cast kernels to implement Scalar::Parse and Scalar::CastTo

2020-06-30 Thread Krisztian Szucs (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-9006:
---
Fix Version/s: (was: 1.0.0)
   2.0.0

> [C++] Use Cast kernels to implement Scalar::Parse and Scalar::CastTo
> 
>
> Key: ARROW-9006
> URL: https://issues.apache.org/jira/browse/ARROW-9006
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 2.0.0
>
>
> We should not maintain distinct (and possibly differently behaving) 
> implementations of elementwise array casting and scalar casting. The new 
> kernels framework makes it relatively easy to generate kernels that can 
> process either arrays or scalars. 
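One way to read this proposal: write the elementwise conversion once and have both the scalar path and the array path call it, so the two cannot drift apart. A minimal sketch in plain Rust, assuming a string-to-int32 cast; `cast_str_to_i32`, `cast_scalar`, and `cast_array` are hypothetical names, not Arrow C++ functions.

```rust
/// The single elementwise rule: parse one string slot as an i32.
fn cast_str_to_i32(s: &str) -> Result<i32, String> {
    s.parse::<i32>()
        .map_err(|e| format!("cannot cast {:?} to int32: {}", s, e))
}

/// Scalar path: a null stays null, otherwise the shared rule is applied.
fn cast_scalar(value: Option<&str>) -> Result<Option<i32>, String> {
    value.map(cast_str_to_i32).transpose()
}

/// Array path: the same scalar logic applied slot by slot; the first error aborts.
fn cast_array(values: &[Option<&str>]) -> Result<Vec<Option<i32>>, String> {
    values.iter().map(|&v| cast_scalar(v)).collect()
}

fn main() {
    let input = [Some("1"), None, Some("42")];
    println!("{:?}", cast_array(&input)); // Ok([Some(1), None, Some(42)])
    println!("{:?}", cast_scalar(Some("x")).is_err()); // true
}
```

Because `cast_array` is defined in terms of `cast_scalar`, any change to the parsing rule automatically applies to both paths.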




[jira] [Commented] (ARROW-9006) [C++] Use Cast kernels to implement Scalar::Parse and Scalar::CastTo

2020-06-30 Thread Krisztian Szucs (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148626#comment-17148626
 ] 

Krisztian Szucs commented on ARROW-9006:


Postponing to the next release.

> [C++] Use Cast kernels to implement Scalar::Parse and Scalar::CastTo
> 
>
> Key: ARROW-9006
> URL: https://issues.apache.org/jira/browse/ARROW-9006
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 1.0.0
>
>
> We should not maintain distinct (and possibly differently behaving) 
> implementations of elementwise array casting and scalar casting. The new 
> kernels framework makes it relatively easy to generate kernels that can 
> process either arrays or scalars. 




[jira] [Updated] (ARROW-9277) Fix documentation of Reading CSV files

2020-06-30 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9277:
--
Labels: pull-request-available  (was: )

> Fix documentation of Reading CSV files
> --
>
> Key: ARROW-9277
> URL: https://issues.apache.org/jira/browse/ARROW-9277
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
> Environment: Ubuntu 18.04
>Reporter: Masaki Kozuki
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I gave the C++ API a try and copy-pasted [the basic usage of Reading CSV 
> files|https://arrow.apache.org/docs/cpp/csv.html?highlight=csv%20c#basic-usage]
>  but the snippet did not work for me.




[jira] [Updated] (ARROW-9220) [C++] Disable relevant compute kernels if ARROW_WITH_UTF8PROC=OFF

2020-06-30 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9220:
--
Labels: pull-request-available  (was: )

> [C++] Disable relevant compute kernels if ARROW_WITH_UTF8PROC=OFF
> -
>
> Key: ARROW-9220
> URL: https://issues.apache.org/jira/browse/ARROW-9220
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> utf8proc should not be a hard dependency of ARROW_COMPUTE




[jira] [Updated] (ARROW-9277) Fix documentation of Reading CSV files

2020-06-30 Thread Antoine Pitrou (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-9277:
--
Component/s: Documentation

> Fix documentation of Reading CSV files
> --
>
> Key: ARROW-9277
> URL: https://issues.apache.org/jira/browse/ARROW-9277
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
> Environment: Ubuntu 18.04
>Reporter: Masaki Kozuki
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I gave the C++ API a try and copy-pasted [the basic usage of Reading CSV 
> files|https://arrow.apache.org/docs/cpp/csv.html?highlight=csv%20c#basic-usage]
>  but the snippet did not work for me.




[jira] [Updated] (ARROW-9277) Fix documentation of Reading CSV files

2020-06-30 Thread Antoine Pitrou (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-9277:
--
Fix Version/s: 1.0.0

> Fix documentation of Reading CSV files
> --
>
> Key: ARROW-9277
> URL: https://issues.apache.org/jira/browse/ARROW-9277
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
> Environment: Ubuntu 18.04
>Reporter: Masaki Kozuki
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I gave the C++ API a try and copy-pasted [the basic usage of Reading CSV 
> files|https://arrow.apache.org/docs/cpp/csv.html?highlight=csv%20c#basic-usage]
>  but the snippet did not work for me.




[jira] [Resolved] (ARROW-9274) [Rust] [Integration Testing] Read i64 from json files as strings

2020-06-30 Thread Neville Dipale (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale resolved ARROW-9274.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7588
[https://github.com/apache/arrow/pull/7588]

> [Rust] [Integration Testing] Read i64 from json files as strings
> 
>
> Key: ARROW-9274
> URL: https://issues.apache.org/jira/browse/ARROW-9274
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The integration files were recently changed to use strings for 64-bit 
> numbers, and this caused failures for the relevant Rust Arrow IPC test cases. 
> This Jira is for the work to fix that.
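A plausible reason for encoding 64-bit integers as JSON strings is that many JSON parsers read numbers as IEEE 754 doubles, which represent integers exactly only up to 2^53. A small plain-Rust sketch of that precision loss and of parsing the string form instead; `parse_i64_field` is a hypothetical helper, not part of the Rust Arrow integration code.

```rust
use std::num::ParseIntError;

/// Parses a 64-bit integer that was written to JSON as a string, e.g. "9007199254740993".
fn parse_i64_field(raw: &str) -> Result<i64, ParseIntError> {
    raw.parse::<i64>()
}

fn main() {
    let big: i64 = (1 << 53) + 1; // 9007199254740993
    // Round-tripping through f64 (what a double-based JSON number would hold) loses the last bit.
    let via_double = big as f64 as i64;
    println!("{} -> {}", big, via_double); // 9007199254740993 -> 9007199254740992

    // Parsing the string form is exact, including the extreme values.
    assert_eq!(parse_i64_field("9007199254740993"), Ok(big));
    assert_eq!(parse_i64_field("-9223372036854775808"), Ok(i64::MIN));
}
```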




[jira] [Assigned] (ARROW-9160) [C++] Implement string/binary contains for exact matches

2020-06-30 Thread Wes McKinney (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-9160:
---

Assignee: Wes McKinney  (was: Uwe Korn)

> [C++] Implement string/binary contains for exact matches
> 
>
> Key: ARROW-9160
> URL: https://issues.apache.org/jira/browse/ARROW-9160
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Uwe Korn
>Assignee: Wes McKinney
>Priority: Major
>  Labels: Analytics, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Implement {{contains}} for exact substring matches. Using the 
> Knuth–Morris–Pratt algorithm, we should be able to do this in linear 
> time with a small amount of preprocessing per invocation.
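A minimal sketch of the idea in plain Rust, operating on bytes so the same search would serve both string and binary data: the KMP failure table is built once per call from the pattern, then each value is scanned in linear time. The function names are hypothetical and this is not Arrow's implementation; nulls are modelled as `None` and stay null in the output.

```rust
/// Builds the KMP failure table: lps[i] is the length of the longest proper
/// prefix of pattern[..=i] that is also a suffix of it.
fn failure_table(pattern: &[u8]) -> Vec<usize> {
    let mut lps = vec![0; pattern.len()];
    let mut len = 0; // length of the current matched prefix
    let mut i = 1;
    while i < pattern.len() {
        if pattern[i] == pattern[len] {
            len += 1;
            lps[i] = len;
            i += 1;
        } else if len > 0 {
            len = lps[len - 1]; // fall back without advancing i
        } else {
            lps[i] = 0;
            i += 1;
        }
    }
    lps
}

/// Scans `haystack` once, never moving backwards, so the cost is O(n).
fn kmp_contains(haystack: &[u8], pattern: &[u8], lps: &[usize]) -> bool {
    if pattern.is_empty() {
        return true;
    }
    let mut j = 0; // number of pattern bytes matched so far
    for &b in haystack {
        while j > 0 && b != pattern[j] {
            j = lps[j - 1];
        }
        if b == pattern[j] {
            j += 1;
            if j == pattern.len() {
                return true;
            }
        }
    }
    false
}

/// Kernel-style driver: preprocess the pattern once, then test every value.
fn contains_exact(values: &[Option<&str>], pattern: &str) -> Vec<Option<bool>> {
    let pat = pattern.as_bytes();
    let lps = failure_table(pat);
    values
        .iter()
        .map(|&v| v.map(|s| kmp_contains(s.as_bytes(), pat, &lps)))
        .collect()
}

fn main() {
    let values = [Some("abcabd"), Some("xyz"), None];
    println!("{:?}", contains_exact(&values, "abd")); // [Some(true), Some(false), None]
}
```

The total cost is O(m) for the table plus O(n) per value, where m is the pattern length and n the value length.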




[jira] [Updated] (ARROW-9160) [C++] Implement string/binary contains for exact matches

2020-06-30 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9160:
--
Labels: Analytics pull-request-available  (was: Analytics)

> [C++] Implement string/binary contains for exact matches
> 
>
> Key: ARROW-9160
> URL: https://issues.apache.org/jira/browse/ARROW-9160
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Major
>  Labels: Analytics, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Implement {{contains}} for exact substring matches. Using the 
> Knuth–Morris–Pratt algorithm, we should be able to do this in linear 
> time with a small amount of preprocessing per invocation.




[jira] [Commented] (ARROW-9268) [C++] Add is{alnum,alpha,...} kernels for strings

2020-06-30 Thread Uwe Korn (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148639#comment-17148639
 ] 

Uwe Korn commented on ARROW-9268:
-

Started to build the foundation for str->bool in 
https://github.com/apache/arrow/pull/7593.

> [C++] Add is{alnum,alpha,...} kernels for strings
> -
>
> Key: ARROW-9268
> URL: https://issues.apache.org/jira/browse/ARROW-9268
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Maarten Breddels
>Assignee: Maarten Breddels
>Priority: Major
>
> A good list of kernels to have would be str->bool kernels, similar to:
> [https://docs.python.org/3/library/stdtypes.html#str.isalnum] and friends.
> I think all but `isidentifier` make sense to have. The semantics of the 
> Python functions seem quite reasonable to have in Arrow, but maybe others can 
> provide feedback if this is a complete/reasonable list to have or not.
> I am not sure if we need more (or less) functions, or if we want more atomic 
> functions, e.g. test for membership in Unicode categories. 
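A minimal plain-Rust sketch of the shape such str->bool kernels could take: one generic driver maps each non-null string to a boolean, with the character test passed in. The names are hypothetical, and Rust's `char::is_alphanumeric` / `char::is_alphabetic` follow Unicode properties that need not match Python's `str.isalnum` / `str.isalpha` exactly for every character.

```rust
/// Applies a per-character test to every non-null string. Following Python's
/// convention, an empty string yields false; nulls stay null.
fn str_predicate<F: Fn(char) -> bool>(values: &[Option<&str>], pred: F) -> Vec<Option<bool>> {
    values
        .iter()
        .map(|&v| v.map(|s| !s.is_empty() && s.chars().all(|c| pred(c))))
        .collect()
}

fn main() {
    let data = [Some("abc123"), Some("héllo"), Some(""), Some("a b"), None];
    // Rough analogues of str.isalnum / str.isalpha using Rust's Unicode-aware char methods.
    println!("{:?}", str_predicate(&data, char::is_alphanumeric));
    println!("{:?}", str_predicate(&data, char::is_alphabetic));
}
```

Passing the character test as a parameter is one way to cover the whole family (isalpha, isdigit, isspace, ...) with a single driver.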




[jira] [Assigned] (ARROW-9160) [C++] Implement string/binary contains for exact matches

2020-06-30 Thread Uwe Korn (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Korn reassigned ARROW-9160:
---

Assignee: Uwe Korn  (was: Wes McKinney)

> [C++] Implement string/binary contains for exact matches
> 
>
> Key: ARROW-9160
> URL: https://issues.apache.org/jira/browse/ARROW-9160
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Major
>  Labels: Analytics, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Implement {{contains}} for exact substring matches. Using the 
> Knuth–Morris–Pratt algorithm, we should be able to do this in linear 
> time with a small amount of preprocessing per invocation.




[jira] [Assigned] (ARROW-9277) Fix documentation of Reading CSV files

2020-06-30 Thread Antoine Pitrou (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-9277:
-

Assignee: Masaki Kozuki

> Fix documentation of Reading CSV files
> --
>
> Key: ARROW-9277
> URL: https://issues.apache.org/jira/browse/ARROW-9277
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
> Environment: Ubuntu 18.04
>Reporter: Masaki Kozuki
>Assignee: Masaki Kozuki
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I gave the C++ API a try and copy-pasted [the basic usage of Reading CSV 
> files|https://arrow.apache.org/docs/cpp/csv.html?highlight=csv%20c#basic-usage]
>  but the snippet did not work for me.




[jira] [Commented] (ARROW-9226) [Python] pyarrow.fs.HadoopFileSystem - retrieve options from core-site.xml or hdfs-site.xml if available

2020-06-30 Thread Bruno Quinart (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148682#comment-17148682
 ] 

Bruno Quinart commented on ARROW-9226:
--

No explicit options. However, the following 4 environment variables need to be set 
(showing the values for our Cloudera CDH5 setup):
 * HADOOP_HOME = /opt/cloudera/parcels/CDH
 * CLASSPATH = output of 'hadoop classpath --glob'
 * ARROW_LIBHDFS_DIR = /opt/cloudera/parcels/CDH/lib64
 * JAVA_HOME = /usr/java/default

I am not sure if it is the first or the second that allows finding the 
'default' hdfs info.

The Python docs mention this default for host (refer to 
[https://arrow.apache.org/docs/python/generated/pyarrow.hdfs.connect.html] - 
_Set to "default" for fs.defaultFS from core-site.xml._)

This allows a simple fs = pa.hdfs.connect() statement.

> [Python] pyarrow.fs.HadoopFileSystem - retrieve options from core-site.xml or 
> hdfs-site.xml if available
> 
>
> Key: ARROW-9226
> URL: https://issues.apache.org/jira/browse/ARROW-9226
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Affects Versions: 0.17.1
>Reporter: Bruno Quinart
>Priority: Minor
> Fix For: 1.0.0
>
>
> 'Legacy' pyarrow.hdfs.connect was somehow able to get the namenode info from 
> the Hadoop configuration files.
> The new pyarrow.fs.HadoopFileSystem requires the host to be specified.
> Inferring this info from "the environment" makes it easier to deploy 
> pipelines.
> More importantly, for HA namenodes it is almost impossible to know for sure 
> what to specify. If a rolling restart is ongoing, the active namenode changes. 
> There is no guarantee as to which one will be active in an HA setup.
> I tried connecting to the standby namenode. The connection gets established, 
> but when writing a file, an error is raised saying that standby namenodes do 
> not allow writes.
>  




[jira] [Created] (ARROW-9278) [C++] Implement Union validity bitmap changes from ARROW-9222

2020-06-30 Thread Wes McKinney (Jira)

Wes McKinney created ARROW-9278:
---

 Summary: [C++] Implement Union validity bitmap changes from 
ARROW-9222
 Key: ARROW-9278
 URL: https://issues.apache.org/jira/browse/ARROW-9278
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
Assignee: Wes McKinney
 Fix For: 1.0.0







[jira] [Resolved] (ARROW-8647) [C++][Dataset] Optionally encode partition field values as dictionary type

2020-06-30 Thread Francois Saint-Jacques (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques resolved ARROW-8647.
---
Resolution: Fixed

Issue resolved by pull request 7536
[https://github.com/apache/arrow/pull/7536]

> [C++][Dataset] Optionally encode partition field values as dictionary type
> --
>
> Key: ARROW-8647
> URL: https://issues.apache.org/jira/browse/ARROW-8647
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Joris Van den Bossche
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: dataset, dataset-dask-integration, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In the Python ParquetDataset implementation, the partition fields are 
> returned as dictionary type columns. 
> In the new Dataset API, we now use a plain type (integer or string when 
> inferred). However, you can already manually specify that the partition keys 
> should be dictionary type by specifying the partitioning schema (in 
> {{Partitioning}} passed to the dataset factory). 
> Since dictionary type can be more efficient (partition keys will 
> typically be repeated values in the resulting table), it might be good to 
> still have an option in the DatasetFactory to use dictionary types for the 
> partition fields.
> See also https://github.com/apache/arrow/pull/6303#discussion_r400622340
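To make the efficiency argument concrete: dictionary encoding stores each distinct partition value once and replaces every row with a small integer index. A minimal plain-Rust sketch, not the Arrow C++ implementation; the `dictionary_encode` name and the choice of `i32` indices are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Dictionary-encodes a column: each distinct value is stored once in the
/// returned dictionary, and each row becomes an index into it.
fn dictionary_encode<'a>(values: &[&'a str]) -> (Vec<i32>, Vec<&'a str>) {
    let mut dictionary: Vec<&'a str> = Vec::new();
    let mut lookup: HashMap<&'a str, i32> = HashMap::new();
    let indices: Vec<i32> = values
        .iter()
        .map(|&v| {
            // First occurrence appends to the dictionary; later ones reuse its index.
            *lookup.entry(v).or_insert_with(|| {
                dictionary.push(v);
                (dictionary.len() - 1) as i32
            })
        })
        .collect();
    (indices, dictionary)
}

fn main() {
    // A partition column such as "year" repeats the same few values for many rows.
    let (indices, dictionary) = dictionary_encode(&["2019", "2019", "2020", "2019", "2021"]);
    println!("{:?} {:?}", indices, dictionary); // [0, 0, 1, 0, 2] ["2019", "2020", "2021"]
}
```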




[jira] [Updated] (ARROW-7654) [Python] Ability to set column_types to a Schema in csv.ConvertOptions is undocumented

2020-06-30 Thread Krisztian Szucs (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-7654:
---
Summary: [Python] Ability to set column_types to a Schema in 
csv.ConvertOptions is undocumented  (was: [Python] Ability to Set column_types 
to a Schema in csv.ConvertOptions is Undocumented)

> [Python] Ability to set column_types to a Schema in csv.ConvertOptions is 
> undocumented
> --
>
> Key: ARROW-7654
> URL: https://issues.apache.org/jira/browse/ARROW-7654
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation, Python
>Affects Versions: 0.12.0, 0.15.1
> Environment: N/A, documentation issue.
>Reporter: Tim Lantz
>Assignee: Krisztian Szucs
>Priority: Minor
>  Labels: csv
> Fix For: 1.0.0
>
>
> Originally mentioned in: [https://github.com/apache/arrow/issues/6243]
> High level description:
>  * As of [this 
> commit|https://github.com/apache/arrow/commit/df54da211448b5202aa08ed2b245eb78cfd1e50c]
>  support for supplying a Schema to ConvertOptions in the csv module was 
> added (extremely useful, I'll add!). Marked as affected in at least 0.12.0 
> based on the commit history, as well as 0.15.1 (I cannot verify anything 
> between but would assume it is true over the whole version range).
>  * As of 0.15.1 the [published 
> documentation|https://arrow.apache.org/docs/python/generated/pyarrow.csv.ConvertOptions.html#pyarrow.csv.ConvertOptions]
>  only explains that a dictionary from field name to DataType can be supplied.
> Minimal reproduction: N/A, see link.




[jira] [Updated] (ARROW-7654) [Python] Ability to set column_types to a Schema in csv.ConvertOptions is undocumented

2020-06-30 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-7654:
--
Labels: csv pull-request-available  (was: csv)

> [Python] Ability to set column_types to a Schema in csv.ConvertOptions is 
> undocumented
> --
>
> Key: ARROW-7654
> URL: https://issues.apache.org/jira/browse/ARROW-7654
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation, Python
>Affects Versions: 0.12.0, 0.15.1
> Environment: N/A, documentation issue.
>Reporter: Tim Lantz
>Assignee: Krisztian Szucs
>Priority: Minor
>  Labels: csv, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Originally mentioned in: [https://github.com/apache/arrow/issues/6243]
> High level description:
>  * As of [this 
> commit|https://github.com/apache/arrow/commit/df54da211448b5202aa08ed2b245eb78cfd1e50c]
>  support for supplying a Schema to ConvertOptions in the csv module was 
> added (extremely useful, I'll add!). Marked as affected in at least 0.12.0 
> based on the commit history, as well as 0.15.1 (I cannot verify anything 
> between but would assume it is true over the whole version range).
>  * As of 0.15.1 the [published 
> documentation|https://arrow.apache.org/docs/python/generated/pyarrow.csv.ConvertOptions.html#pyarrow.csv.ConvertOptions]
>  only explains that a dictionary from field name to DataType can be supplied.
> Minimal reproduction: N/A, see link.




[jira] [Created] (ARROW-9279) [C++] Implement PrettyPrint for Scalars

2020-06-30 Thread Krisztian Szucs (Jira)

Krisztian Szucs created ARROW-9279:
--

 Summary: [C++] Implement PrettyPrint for Scalars
 Key: ARROW-9279
 URL: https://issues.apache.org/jira/browse/ARROW-9279
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Krisztian Szucs
 Fix For: 2.0.0


It would be useful, especially for nested scalar objects.




[jira] [Resolved] (ARROW-9220) [C++] Disable relevant compute kernels if ARROW_WITH_UTF8PROC=OFF

2020-06-30 Thread Antoine Pitrou (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-9220.
---
Resolution: Fixed

Issue resolved by pull request 7592
[https://github.com/apache/arrow/pull/7592]

> [C++] Disable relevant compute kernels if ARROW_WITH_UTF8PROC=OFF
> -
>
> Key: ARROW-9220
> URL: https://issues.apache.org/jira/browse/ARROW-9220
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> utf8proc should not be a hard dependency of ARROW_COMPUTE




[jira] [Resolved] (ARROW-9277) Fix documentation of Reading CSV files

2020-06-30 Thread Antoine Pitrou (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-9277.
---
Resolution: Fixed

Issue resolved by pull request 7590
[https://github.com/apache/arrow/pull/7590]

> Fix documentation of Reading CSV files
> --
>
> Key: ARROW-9277
> URL: https://issues.apache.org/jira/browse/ARROW-9277
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
> Environment: Ubuntu 18.04
>Reporter: Masaki Kozuki
>Assignee: Masaki Kozuki
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I gave the C++ API a try and copy-pasted [the basic usage of Reading CSV 
> files|https://arrow.apache.org/docs/cpp/csv.html?highlight=csv%20c#basic-usage]
>  but the snippet did not work for me.




[jira] [Commented] (ARROW-1846) [C++] Implement "any" reduction kernel for boolean data

2020-06-30 Thread Antoine Pitrou (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148736#comment-17148736
 ] 

Antoine Pitrou commented on ARROW-1846:
---

Isn't this the same thing as Max for booleans?
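For non-null booleans the two reductions do agree, since booleans are ordered `false < true`; a difference can appear for empty or all-null input, where a max-style reduction has no value to return. A minimal plain-Rust sketch that skips nulls; the function names are hypothetical, and how Arrow's kernels treat nulls in this case is not asserted here.

```rust
/// "any": true if at least one non-null value is true; nulls are skipped.
fn any_skip_nulls(values: &[Option<bool>]) -> bool {
    values.iter().flatten().any(|&b| b)
}

/// "max": bool is ordered false < true, so max is true iff some value is true.
/// Returns None when there is no non-null value at all.
fn max_skip_nulls(values: &[Option<bool>]) -> Option<bool> {
    values.iter().flatten().copied().max()
}

fn main() {
    let data = [Some(false), None, Some(true)];
    println!("{} {:?}", any_skip_nulls(&data), max_skip_nulls(&data)); // true Some(true)
    let all_null: [Option<bool>; 2] = [None, None];
    println!("{} {:?}", any_skip_nulls(&all_null), max_skip_nulls(&all_null)); // false None
}
```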

> [C++] Implement "any" reduction kernel for boolean data
> ---
>
> Key: ARROW-1846
> URL: https://issues.apache.org/jira/browse/ARROW-1846
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: analytics, dataframe
>





[jira] [Resolved] (ARROW-9230) [FlightRPC][Python] flight.connect() doesn't pass through all arguments

2020-06-30 Thread Antoine Pitrou (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-9230.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7549
[https://github.com/apache/arrow/pull/7549]

> [FlightRPC][Python] flight.connect() doesn't pass through all arguments
> ---
>
> Key: ARROW-9230
> URL: https://issues.apache.org/jira/browse/ARROW-9230
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Python
>Affects Versions: 1.0.0
>Reporter: David Li
>Assignee: David Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Updated] (ARROW-9277) [C++] Fix documentation of Reading CSV files

2020-06-30 Thread Wes McKinney (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-9277:

Summary: [C++] Fix documentation of Reading CSV files  (was: Fix 
documentation of Reading CSV files)

> [C++] Fix documentation of Reading CSV files
> 
>
> Key: ARROW-9277
> URL: https://issues.apache.org/jira/browse/ARROW-9277
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
> Environment: Ubuntu 18.04
>Reporter: Masaki Kozuki
>Assignee: Masaki Kozuki
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I gave the C++ API a try and copy-pasted [the basic usage of Reading CSV 
> files|https://arrow.apache.org/docs/cpp/csv.html?highlight=csv%20c#basic-usage]
>  but the snippet did not work for me.




[jira] [Created] (ARROW-9280) [Rust] Write statistics to Parquet files

2020-06-30 Thread Z M (Jira)

Z M created ARROW-9280:
--

 Summary: [Rust] Write statistics to Parquet files
 Key: ARROW-9280
 URL: https://issues.apache.org/jira/browse/ARROW-9280
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Affects Versions: 0.17.1
Reporter: Z M


Calculate page and column statistics in Parquet files.

Support providing precalculated statistics for batches of values (useful when 
converting from other formats like ORC)
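A minimal sketch of what page-level statistics and their aggregation could look like, in plain Rust for a nullable i64 column. The `Stats` type is hypothetical and much simpler than the `Statistics` metadata in the Parquet format; `merge` shows one way precalculated statistics for separate batches could be combined without rescanning the values.

```rust
/// Simplified min/max/null-count statistics for a nullable i64 page or column chunk.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Stats {
    min: Option<i64>,
    max: Option<i64>,
    null_count: u64,
}

impl Stats {
    /// Computes statistics by scanning one page (or batch) of values.
    fn from_values(values: &[Option<i64>]) -> Stats {
        let mut s = Stats { min: None, max: None, null_count: 0 };
        for v in values {
            match v {
                Some(x) => {
                    s.min = Some(s.min.map_or(*x, |m| m.min(*x)));
                    s.max = Some(s.max.map_or(*x, |m| m.max(*x)));
                }
                None => s.null_count += 1,
            }
        }
        s
    }

    /// Combines two sets of statistics, e.g. page stats into column-chunk stats,
    /// or precalculated stats supplied for separate batches.
    fn merge(self, other: Stats) -> Stats {
        let pick = |a: Option<i64>, b: Option<i64>, f: fn(i64, i64) -> i64| match (a, b) {
            (Some(x), Some(y)) => Some(f(x, y)),
            (x, None) => x,
            (None, y) => y,
        };
        Stats {
            min: pick(self.min, other.min, i64::min),
            max: pick(self.max, other.max, i64::max),
            null_count: self.null_count + other.null_count,
        }
    }
}

fn main() {
    let page1 = Stats::from_values(&[Some(3), None, Some(-1)]);
    let page2 = Stats::from_values(&[Some(10), None]);
    println!("{:?}", page1.merge(page2));
}
```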




[jira] [Updated] (ARROW-8867) [R] Support converting POSIXlt type; named lists in general

2020-06-30 Thread Neal Richardson (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8867:
---
Fix Version/s: (was: 2.0.0)
   1.0.0

> [R] Support converting POSIXlt type; named lists in general
> ---
>
> Key: ARROW-8867
> URL: https://issues.apache.org/jira/browse/ARROW-8867
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> {code:r}
> f <- as.POSIXlt(Sys.time() + 1:5)
> Array$create(f)
> # Error in Array__from_vector(x, type) : 
> #   Unknown: List vector expecting elements vector of type double but got 
> int32
> {code}
> Issue #1: POSIXlt type is a struct, essentially. But because it is not a 
> data.frame, we don't try to convert it to a struct. (We should probably 
> convert named lists to structs and not list type in general.)
> If I trick the converter into thinking it is a data.frame, it will convert to 
> struct successfully.
> {code:r}
> class(f) <- c(class(f), "data.frame")
> Array$create(f)
> # StructArray
> #  year: int32, wday: int32, yday: int32, isdst: int32, zone: string, gmtoff: 
> int32>>
> # ...
> {code}
> Issue #2: round trip won't work because the attributes that tell you that 
> this struct is a POSIXlt type, what time zone it is, etc., are dropped. This 
> would be helped by storing those attributes as custom_metadata on the Table. 
> (We could also implement it as an extension type, but if it's just for going 
> back and forth between R, would that have any benefit?)




[jira] [Assigned] (ARROW-8867) [R] Support converting POSIXlt type; named lists in general

2020-06-30 Thread Neal Richardson (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-8867:
--

Assignee: Neal Richardson  (was: Romain Francois)

> [R] Support converting POSIXlt type; named lists in general
> ---
>
> Key: ARROW-8867
> URL: https://issues.apache.org/jira/browse/ARROW-8867
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 2.0.0
>
>
> {code:r}
> f <- as.POSIXlt(Sys.time() + 1:5)
> Array$create(f)
> # Error in Array__from_vector(x, type) : 
> #   Unknown: List vector expecting elements vector of type double but got 
> int32
> {code}
> Issue #1: POSIXlt type is a struct, essentially. But because it is not a 
> data.frame, we don't try to convert it to a struct. (We should probably 
> convert named lists to structs and not list type in general.)
> If I trick the converter into thinking it is a data.frame, it will convert to 
> struct successfully.
> {code:r}
> class(f) <- c(class(f), "data.frame")
> Array$create(f)
> # StructArray
> #  year: int32, wday: int32, yday: int32, isdst: int32, zone: string, gmtoff: 
> int32>>
> # ...
> {code}
> Issue #2: round trip won't work because the attributes that tell you that 
> this struct is a POSIXlt type, what time zone it is, etc., are dropped. This 
> would be helped by storing those attributes as custom_metadata on the Table. 
> (We could also implement it as an extension type, but if it's just for going 
> back and forth between R, would that have any benefit?)




[jira] [Assigned] (ARROW-4390) [R] Serialize "labeled" metadata in Feather files, IPC messages

2020-06-30 Thread Neal Richardson (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-4390:
--

Assignee: Neal Richardson  (was: Romain Francois)

> [R] Serialize "labeled" metadata in Feather files, IPC messages
> ---
>
> Key: ARROW-4390
> URL: https://issues.apache.org/jira/browse/ARROW-4390
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> see https://github.com/apache/arrow/issues/3480




[jira] [Commented] (ARROW-9219) [R] coerce_timestamps in Parquet write options does not work

2020-06-30 Thread Slim Bentami (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148759#comment-17148759
 ] 

Slim Bentami commented on ARROW-9219:
-

Hi Neal, can you help and point me in the right direction? Thanks in advance.

> [R] coerce_timestamps in Parquet write options does not work
> 
>
> Key: ARROW-9219
> URL: https://issues.apache.org/jira/browse/ARROW-9219
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.1
> Environment: macOS 10.15.5, R  4.0.0
>Reporter: Slim Bentami
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I am trying to truncate timestamps to milliseconds when writing a Parquet 
> file.
> With:
>  
> {{tutu <- as.POSIXct("2020/06/03 18:00:00",tz = "UTC")}}
> If I do:
>  
> {{write_parquet(data.frame(tutu),"~/Downloads/tutu.test.parquet")}}
> I get 15912072
> If I do:
>  
> {{write_parquet(data.frame(tutu),"~/Downloads/tutu.test.parquet", 
> coerce_timestamps = "ms", allow_truncated_timestamps = TRUE)}}
> I get the error message:
>  
> {{Error in parquet___ArrowWriterProperties___Builder__coerce_timestamps(unit) 
> : 
>   argument "unit" is missing, with no default}}
> What am I doing wrong? Thanks in advance.
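What `coerce_timestamps = "ms"` together with `allow_truncated_timestamps = TRUE` is meant to do can be illustrated with plain integer arithmetic. A minimal Rust sketch, assuming the source column is stored in microseconds; the `coerce_us_to_ms` helper is hypothetical, not part of the R or C++ API. For reference, 2020-06-03 18:00:00 UTC is 1591207200 seconds after the Unix epoch, so that value has no sub-millisecond part to lose.

```rust
/// Converts a microsecond timestamp to milliseconds. A non-zero sub-millisecond
/// remainder is an error unless truncation is explicitly allowed.
fn coerce_us_to_ms(us: i64, allow_truncated: bool) -> Result<i64, String> {
    // Rust's `/` and `%` truncate toward zero, including for pre-1970 (negative) values.
    if us % 1_000 != 0 && !allow_truncated {
        return Err(format!("casting {} us to ms would lose data", us));
    }
    Ok(us / 1_000)
}

fn main() {
    // 2020-06-03 18:00:00 UTC in microseconds since the Unix epoch.
    let tutu_us: i64 = 1_591_207_200 * 1_000_000;
    println!("{:?}", coerce_us_to_ms(tutu_us, false)); // Ok(1591207200000)
    println!("{:?}", coerce_us_to_ms(tutu_us + 123, false).is_err()); // true
    println!("{:?}", coerce_us_to_ms(tutu_us + 123, true)); // Ok(1591207200000)
}
```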




[jira] [Commented] (ARROW-9219) [R] coerce_timestamps in Parquet write options does not work

2020-06-30 Thread Neal Richardson (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148763#comment-17148763
 ] 

Neal Richardson commented on ARROW-9219:


You'll have to provide more detail on what exactly you did and what exactly 
happened. If you're on macOS as you report, install_arrow should install a 
binary package, so you won't be compiling from source and failing to find 
headers.





[jira] [Created] (ARROW-9281) [R] Turn off utf8proc in R builds

2020-06-30 Thread Neal Richardson (Jira)

Neal Richardson created ARROW-9281:
--

 Summary: [R] Turn off utf8proc in R builds
 Key: ARROW-9281
 URL: https://issues.apache.org/jira/browse/ARROW-9281
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


ARROW-9220 unfortunately stopped short of doing this.




[jira] [Resolved] (ARROW-8899) [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-30 Thread Neal Richardson (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8899.

Resolution: Fixed

Issue resolved by pull request 7524
[https://github.com/apache/arrow/pull/7524]

> [R] Add R metadata like pandas metadata for round-trip fidelity
> ---
>
> Key: ARROW-8899
> URL: https://issues.apache.org/jira/browse/ARROW-8899
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Romain Francois
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Arrow Schema and Field objects have custom_metadata fields to store arbitrary 
> strings in a key-value store. Pandas stores JSON in a "pandas" key and uses 
> that to improve the fidelity of round-tripping data to Arrow/Parquet/Feather 
> and back. 
> https://pandas.pydata.org/docs/dev/development/developer.html#storing-pandas-dataframe-objects-in-apache-parquet-format
>  describes this a bit.
> You can see this pandas metadata in the sample Parquet file:
> {code:r}
> tab <- read_parquet(system.file("v0.7.1.parquet", package="arrow"), 
> as_data_frame = FALSE)
> tab
> # Table
> # 10 rows x 11 columns
> # $carat 
> # $cut 
> # $color 
> # $clarity 
> # $depth 
> # $table 
> # $price 
> # $x 
> # $y 
> # $z 
> # $__index_level_0__ 
> tab$metadata
> # $pandas
> # [1] "{\"index_columns\": [\"__index_level_0__\"], \"column_indexes\": 
> [{\"name\": null, \"pandas_type\": \"string\", \"numpy_type\": \"object\", 
> \"metadata\": null}], \"columns\": [{\"name\": \"carat\", \"pandas_type\": 
> \"float64\", \"numpy_type\": \"float64\", \"metadata\": null}, {\"name\": 
> \"cut\", \"pandas_type\": \"unicode\", \"numpy_type\": \"object\", 
> \"metadata\": null}, {\"name\": \"color\", \"pandas_type\": \"unicode\", 
> \"numpy_type\": \"object\", \"metadata\": null}, {\"name\": \"clarity\", 
> \"pandas_type\": \"unicode\", \"numpy_type\": \"object\", \"metadata\": 
> null}, {\"name\": \"depth\", \"pandas_type\": \"float64\", \"numpy_type\": 
> \"float64\", \"metadata\": null}, {\"name\": \"table\", \"pandas_type\": 
> \"float64\", \"numpy_type\": \"float64\", \"metadata\": null}, {\"name\": 
> \"price\", \"pandas_type\": \"int64\", \"numpy_type\": \"int64\", 
> \"metadata\": null}, {\"name\": \"x\", \"pandas_type\": \"float64\", 
> \"numpy_type\": \"float64\", \"metadata\": null}, {\"name\": \"y\", 
> \"pandas_type\": \"float64\", \"numpy_type\": \"float64\", \"metadata\": 
> null}, {\"name\": \"z\", \"pandas_type\": \"float64\", \"numpy_type\": 
> \"float64\", \"metadata\": null}, {\"name\": \"__index_level_0__\", 
> \"pandas_type\": \"int64\", \"numpy_type\": \"int64\", \"metadata\": null}], 
> \"pandas_version\": \"0.20.1\"}"
> {code}
> We should do something similar in R: store the "attributes" for each column 
> in a data.frame when we convert to Arrow, and restore those attributes when 
> we read from Arrow. 
> Since ARROW-8703, you could naively do this all in R, something like:
> {code:r}
> tab$metadata$r <- lapply(df, attributes)
> {code}
> on the conversion to Arrow, and in as.data.frame(), do
> {code:r}
> if (!is.null(tab$metadata$r)) {
>   df[] <- mapply(function(col, meta) {
>     attributes(col) <- meta
>     col  # return the modified column, not the attributes
>   }, col = df, meta = tab$metadata$r, SIMPLIFY = FALSE)
> }
> {code}
> However, it's trickier than this because:
> * {{tab$metadata$r}} needs to be serialized to string and deserialized on the 
> way back. Pandas uses JSON but arrow doesn't currently have a JSON R 
> dependency. We could {{dput()}} to dump the R attributes, but that could 
> introduce risks since you have to parse/eval code to consume it. My best idea 
> at the moment is to try {{rawToChar(serialize(x, NULL, ascii = TRUE))}} on the way
> out (ascii = TRUE doesn't mean it requires ASCII inputs, it's about how it 
> serializes) and {{unserialize(charToRaw(x))}} on the way back. But maybe 
> there's some lower-level way to do this better.
> * We'll need to do the same for all places where Tables and RecordBatches are 
> created/converted
> * We'll need to make sure that nested types (structs) get the same coverage
> * This metadata is only attached to Schemas, meaning that 
> Arrays/ChunkedArrays don't have a place to store extra metadata. So we 
> probably want to attach to the R6 (Chunked)Array objects a 
> metadata/attributes field so that if we convert an R vector to array, or if 
> we extract an array out of a record batch, we don't lose the attributes.
> Doing this should resolve ARROW-4390 and make ARROW-8867 trivial as well.
> Finally, a note about this custom metadata vs. extension types. Extension 
> types can be defined by [adding metadata to a 
> Field|https://arrow.apache.org/doc
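
The attribute round trip proposed in ARROW-8899 can be sketched in base R without arrow. This assumes the encoded string would be stored under the `r` metadata key used in the issue; the data frame and its `label` attribute are made up for illustration, and `plain` stands in for columns read back from Arrow without the R metadata.

```r
# Base-R sketch of the ARROW-8899 round trip (no arrow calls).
df <- data.frame(x = 1:3, y = factor(c("a", "b", "a")))
attr(df$x, "label") <- "Some label"  # hypothetical "labelled" column

# Way out: per-column attributes -> ASCII serialization -> one string.
# serialize() needs a connection argument; NULL means "return raw bytes".
attrs <- lapply(df, attributes)
encoded <- rawToChar(serialize(attrs, NULL, ascii = TRUE))

# Way back: string -> raw bytes -> R object.
decoded <- unserialize(charToRaw(encoded))

# Stand-in for columns read back from Arrow without the R metadata.
plain <- data.frame(x = 1:3, y = c(1L, 2L, 1L))

# Reapply the attributes. The function returns `col` (not the value of the
# assignment), and SIMPLIFY = FALSE keeps the result a list of columns.
plain[] <- mapply(function(col, meta) {
  attributes(col) <- meta
  col
}, col = plain, meta = decoded, SIMPLIFY = FALSE)

str(plain)
```

Serializing with `ascii = TRUE` keeps the payload as text, which suits a string key-value metadata store; restoring the `levels` and `class` attributes is what turns the integer codes back into a factor.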

[jira] [Commented] (ARROW-9219) [R] coerce_timestamps in Parquet write options does not work

2020-06-30 Thread Slim Bentami (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148771#comment-17148771
 ] 

Slim Bentami commented on ARROW-9219:
-

I'm on macOS. I ran install_arrow(nightly = TRUE):

install_arrow(nightly = TRUE)

  There is a binary version available but the source version is later:
   binary  source needs_compilation
arrow 0.17.1.20200628 0.17.1.20200629  TRUE

Do you want to install from sources the package which needs compilation? 
(Yes/no/cancel) yes
installing the source package ‘arrow’

trying URL 
'https://dl.bintray.com/ursalabs/arrow-r/src/contrib/arrow_0.17.1.20200629.tar.gz'
Content type 'application/x-www-form-urlencoded' length 207120 bytes (202 KB)
==
downloaded 202 KB

* installing *source* package ‘arrow’ ...
** using staged installation
*** Downloading apache-arrow
 Using local manifest for apache-arrow
Tue Jun 30 11:05:36 EDT 2020: Auto-brewing apache-arrow in 
/var/folders/05/526nnc9d46x10750s8cxypx0gn/T//build-apache-arrow...
==> Tapping autobrew/core from https://github.com/autobrew/homebrew-core
Tapped 2 commands and 4639 formulae (4,887 files, 12.7MB).
lz4
snappy
openssl
thrift
==> Downloading 
https://homebrew.bintray.com/bottles/lz4-1.8.3.mojave.bottle.tar.gz
Already downloaded: 
/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/downloads/b4158ef68d619dbf78935df6a42a70b8339a65bc8876cbb4446355ccd40fa5de--lz4-1.8.3.mojave.bottle.tar.gz
==> Pouring lz4-1.8.3.mojave.bottle.tar.gz
==> Skipping post_install step for autobrew...
🍺  
/private/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/build-apache-arrow/Cellar/lz4/1.8.3:
 22 files, 512.7KB
==> Downloading 
https://homebrew.bintray.com/bottles/snappy-1.1.7_1.mojave.bottle.tar.gz
Already downloaded: 
/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/downloads/1f09938804055499d1dd951b13b26d80c56eae359aa051284bf4f51d109a9f73--snappy-1.1.7_1.mojave.bottle.tar.gz
==> Pouring snappy-1.1.7_1.mojave.bottle.tar.gz
==> Skipping post_install step for autobrew...
🍺  
/private/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/build-apache-arrow/Cellar/snappy/1.1.7_1:
 18 files, 115.8KB
==> Downloading 
https://homebrew.bintray.com/bottles/openssl-1.0.2p.mojave.bottle.tar.gz
Already downloaded: 
/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/downloads/fbb493745981c8b26c0fab115c76c2a70142bfde9e776c450277e9dfbbba0bb2--openssl-1.0.2p.mojave.bottle.tar.gz
==> Pouring openssl-1.0.2p.mojave.bottle.tar.gz
==> Skipping post_install step for autobrew...
==> Caveats
openssl is keg-only, which means it was not symlinked into 
/private/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/build-apache-arrow,
because Apple has deprecated use of OpenSSL in favor of its own TLS and crypto 
libraries.

If you need to have openssl first in your PATH run:
  echo 'export 
PATH="/private/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/build-apache-arrow/opt/openssl/bin:$PATH"'
 >> ~/.bash_profile

For compilers to find openssl you may need to set:
  export 
LDFLAGS="-L/private/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/build-apache-arrow/opt/openssl/lib"
  export 
CPPFLAGS="-I/private/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/build-apache-arrow/opt/openssl/include"

==> Summary
🍺  
/private/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/build-apache-arrow/Cellar/openssl/1.0.2p:
 1,793 files, 12MB
==> Downloading 
https://homebrew.bintray.com/bottles/thrift-0.11.0.mojave.bottle.tar.gz
Already downloaded: 
/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/downloads/7e05ea11a9f7f924dd7f8f36252ec73a24958b7f214f71e3752a355e75e589bd--thrift-0.11.0.mojave.bottle.tar.gz
==> Pouring thrift-0.11.0.mojave.bottle.tar.gz
==> Skipping post_install step for autobrew...
==> Caveats
To install Ruby binding:
  gem install thrift
==> Summary
🍺  
/private/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/build-apache-arrow/Cellar/thrift/0.11.0:
 102 files, 7MB
==> Caveats
==> openssl
openssl is keg-only, which means it was not symlinked into 
/private/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/build-apache-arrow,
because Apple has deprecated use of OpenSSL in favor of its own TLS and crypto 
libraries.

If you need to have openssl first in your PATH run:
  echo 'export 
PATH="/private/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/build-apache-arrow/opt/openssl/bin:$PATH"'
 >> ~/.bash_profile

For compilers to find openssl you may need to set:
  export 
LDFLAGS="-L/private/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/build-apache-arrow/opt/openssl/lib"
  export 
CPPFLAGS="-I/private/var/folders/05/526nnc9d46x10750s8cxypx0gn/T/build-apache-arrow/opt/openssl/include"

==> thrift
To install Ruby binding:
  gem install thrift
Error: The following flags:
  --HEAD, --build-from-source
require building tools, but none are installed.
Install the Command Line Tools:
  xcode-se

[jira] [Commented] (ARROW-9219) [R] coerce_timestamps in Parquet write options does not work

2020-06-30 Thread Neal Richardson (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148772#comment-17148772
 ] 

Neal Richardson commented on ARROW-9219:


{quote}Do you want to install from sources the package which needs compilation? 
(Yes/no/cancel) yes{quote}

The correct answer is "no".





[jira] [Commented] (ARROW-9219) [R] coerce_timestamps in Parquet write options does not work

2020-06-30 Thread Slim Bentami (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148779#comment-17148779
 ] 

Slim Bentami commented on ARROW-9219:
-

OK. I now get this:

install_arrow(nightly = TRUE)

  There is a binary version available but the source version is later:
   binary  source needs_compilation
arrow 0.17.1.20200628 0.17.1.20200629  TRUE

Do you want to install from sources the package which needs compilation? 
(Yes/no/cancel) no
trying URL 
'https://dl.bintray.com/ursalabs/arrow-r/bin/macosx/contrib/4.0/arrow_0.17.1.20200628.tgz'
Content type 'application/x-www-form-urlencoded' length 6175613 bytes (5.9 MB)
==
downloaded 5.9 MB


The downloaded binary packages are in

/var/folders/05/526nnc9d46x10750s8cxypx0gn/T//Rtmp8hjeIC/downloaded_packages
Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath = 
DLLpath, ...):
 unable to load shared object 
'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/arrow/libs/arrow.so':
  
dlopen(/Library/Frameworks/R.framework/Versions/4.0/Resources/library/arrow/libs/arrow.so,
 6): Symbol not found: _EXTPTR_PTR
  Referenced from: 
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/arrow/libs/arrow.so
  Expected in: /Library/Frameworks/R.framework/Resources/lib/libR.dylib
 in 
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/arrow/libs/arrow.so
Warning message:
package ‘arrow’ was built under R version 4.0.2





[jira] [Created] (ARROW-9282) [R] Remove usage of _EXTPTR_PTR

2020-06-30 Thread Neal Richardson (Jira)

Neal Richardson created ARROW-9282:
--

 Summary: [R] Remove usage of _EXTPTR_PTR
 Key: ARROW-9282
 URL: https://issues.apache.org/jira/browse/ARROW-9282
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


See comments at the end of ARROW-9219. There was an ABI change in R that 
affected an interface which, apparently, was not officially supported.




[jira] [Commented] (ARROW-9219) [R] coerce_timestamps in Parquet write options does not work

2020-06-30 Thread Neal Richardson (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148792#comment-17148792
 ] 

Neal Richardson commented on ARROW-9219:


Wow, you've hit a bleeding edge issue there: 

* [https://r.789695.n4.nabble.com/Possible-ABI-change-in-R-4-0-1-td4764335.html]

* [https://github.com/RcppCore/Rcpp/issues/1097]

It looks like the immediate solution is for you to upgrade your R from 4.0.0 to 
4.0.2. I've created ARROW-9282 to remove the _EXTPTR_PTR usages in our code, but 
that might not eliminate this error until Rcpp is fixed, since Rcpp also calls 
it.
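
A small guard for scripts that install binary packages on machines that may run an older R. The helper name `needs_r_upgrade` is hypothetical, and the 4.0.2 threshold simply mirrors the advice above and the "built under R version 4.0.2" warning earlier in the thread.

```r
# Hypothetical helper: TRUE when the running R is older than the version
# a binary package was built under (4.0.2 in this thread).
needs_r_upgrade <- function(current = getRversion(), built_under = "4.0.2") {
  # numeric_version comparison; the character threshold is coerced
  current < built_under
}

if (needs_r_upgrade()) {
  message("R ", getRversion(), " is older than 4.0.2; ",
          "consider upgrading before installing binary arrow builds")
}
```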





[jira] [Commented] (ARROW-9219) [R] coerce_timestamps in Parquet write options does not work

2020-06-30 Thread Slim Bentami (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148798#comment-17148798
 ] 

Slim Bentami commented on ARROW-9219:
-

Thank you, Neal.
I upgraded to 4.0.2, and both the install and the write_parquet function with 
the desired options worked. 
However, I have a new problem: the timestamp now looks like this: 2020-06-03 
18:00:00.000
when I was hoping for this: 159120720
(I am looking at it using Dremio.)
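
For context, a base-R sketch showing that a millisecond timestamp and the rendered date-time are the same stored value. The millisecond figure below is computed for 2020-06-03 18:00:00 UTC; how Dremio chooses to display the column is outside the scope of this sketch.

```r
# A millisecond timestamp is still a point in time, so a reader may render
# it as a date-time instead of showing the raw integer.
millis <- 1591207200000  # 2020-06-03 18:00:00 UTC in ms since the epoch

# Decode back to POSIXct (origin given explicitly for older R versions).
as_time <- as.POSIXct(millis / 1000, origin = "1970-01-01", tz = "UTC")

# Display with millisecond precision.
shown <- format(as_time, "%Y-%m-%d %H:%M:%OS3")

# The raw integer is still recoverable from the decoded value.
raw_ms <- as.numeric(as_time) * 1000
print(shown)
```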





[jira] [Commented] (ARROW-9281) [R] Turn off utf8proc in R builds

2020-06-30 Thread Antoine Pitrou (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148800#comment-17148800
 ] 

Antoine Pitrou commented on ARROW-9281:
---

What do you mean? It's just waiting for RTools 4.0 to refresh its packages.

> [R] Turn off utf8proc in R builds
> -
>
> Key: ARROW-9281
> URL: https://issues.apache.org/jira/browse/ARROW-9281
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> ARROW-9220 unfortunately stopped short of doing this.




[jira] [Updated] (ARROW-9163) [C++] Add methods to StringArray, LargeStringArray, to validate whether its values are all UTF-8

2020-06-30 Thread Antoine Pitrou (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-9163:
--
Fix Version/s: 1.0.0

> [C++] Add methods to StringArray, LargeStringArray, to validate whether its 
> values are all UTF-8
> 
>
> Key: ARROW-9163
> URL: https://issues.apache.org/jira/browse/ARROW-9163
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 1.0.0
>
>
> This would be useful to check in instances where it matters
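
For comparison only, base R already offers this kind of check for character vectors through `validUTF8()`. This sketch shows the per-value and whole-vector answers; it is not the C++ `StringArray` method the issue asks for, and the sample values are made up.

```r
# Base-R analogue of "are all values valid UTF-8?".
values <- c("abc", "caf\u00e9", "\xff")  # the last element is a lone 0xFF byte

ok <- validUTF8(values)   # one logical per element
all_valid <- all(ok)      # the vector-level answer
print(ok)
```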




[jira] [Resolved] (ARROW-8190) [C++][Flight] Allow setting IpcWriteOptions and IpcReadOptions in Flight IPC message reader and writer classes

2020-06-30 Thread Antoine Pitrou (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-8190.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7582
[https://github.com/apache/arrow/pull/7582]

> [C++][Flight] Allow setting IpcWriteOptions and IpcReadOptions in Flight IPC 
> message reader and writer classes
> --
>
> Key: ARROW-8190
> URL: https://issues.apache.org/jira/browse/ARROW-8190
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Assignee: David Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Follow up work to ARROW-7979




[jira] [Resolved] (ARROW-6521) [C++] Add function to arrow:: namespace that returns the current ABI version

2020-06-30 Thread Antoine Pitrou (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-6521.
---
Resolution: Fixed

Issue resolved by pull request 7581
[https://github.com/apache/arrow/pull/7581]

> [C++] Add function to arrow:: namespace that returns the current ABI version
> 
>
> Key: ARROW-6521
> URL: https://issues.apache.org/jira/browse/ARROW-6521
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>





[jira] [Created] (ARROW-9283) [Python] Expose C++ build info

2020-06-30 Thread Antoine Pitrou (Jira)

Antoine Pitrou created ARROW-9283:
-

 Summary: [Python] Expose C++ build info
 Key: ARROW-9283
 URL: https://issues.apache.org/jira/browse/ARROW-9283
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou
 Fix For: 1.0.0


Follow-up to ARROW-6521 on the C++ side




[jira] [Assigned] (ARROW-8837) [Rust] Add Null type

2020-06-30 Thread Neville Dipale (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale reassigned ARROW-8837:
-

Assignee: Neville Dipale

> [Rust] Add Null type
> 
>
> Key: ARROW-8837
> URL: https://issues.apache.org/jira/browse/ARROW-8837
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Neville Dipale
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> {code:java}
> thread 'main' panicked at 'not implemented: Type Null not supported', 
> arrow/src/ipc/convert.rs:316:14
>  {code}




[jira] [Commented] (ARROW-9281) [R] Turn off utf8proc in R builds

2020-06-30 Thread Neal Richardson (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148827#comment-17148827
 ] 

Neal Richardson commented on ARROW-9281:


That's not the only issue, and anyway R isn't using the new kernels yet, so 
it's wasteful to include them.





[jira] [Commented] (ARROW-9219) [R] coerce_timestamps in Parquet write options does not work

2020-06-30 Thread Neal Richardson (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-9219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148828#comment-17148828
 ] 

Neal Richardson commented on ARROW-9219:


You'll have to ask Dremio that.





[jira] [Assigned] (ARROW-6235) [R] Conversion from arrow::BinaryArray to R character vector not implemented

2020-06-30 Thread Francois Saint-Jacques (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques reassigned ARROW-6235:
-

Assignee: Romain Francois  (was: Francois Saint-Jacques)

> [R] Conversion from arrow::BinaryArray to R character vector not implemented
> 
>
> Key: ARROW-6235
> URL: https://issues.apache.org/jira/browse/ARROW-6235
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Wes McKinney
>Assignee: Romain Francois
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> See unhandled case at 
> https://github.com/apache/arrow/blob/master/r/src/array_to_vector.cpp#L644
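
A base-R sketch of what the missing conversion has to do: turn each binary value (a raw vector in R) into one element of a character vector. Representing nulls as `NULL` list elements and the helper `to_character()` are assumptions for illustration. Note that `rawToChar()` errors on embedded NUL bytes, so arbitrary binary data cannot always become an R string.

```r
# Hypothetical input: binary values as raw vectors, with NULL for nulls.
binary_values <- list(charToRaw("foo"), as.raw(c(0x62, 0x61, 0x72)), NULL)

# Hypothetical helper: one character element per binary value.
to_character <- function(values) {
  vapply(values, function(v) {
    if (is.null(v)) NA_character_ else rawToChar(v)
  }, character(1))
}

chars <- to_character(binary_values)
print(chars)
```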




[jira] [Created] (ARROW-9284) [Java] getMinorTypeForArrowType returns sparse minor type for dense union types

2020-06-30 Thread David Li (Jira)

David Li created ARROW-9284:
---

 Summary: [Java] getMinorTypeForArrowType returns sparse minor type 
for dense union types
 Key: ARROW-9284
 URL: https://issues.apache.org/jira/browse/ARROW-9284
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: David Li
Assignee: David Li
 Fix For: 1.0.0







[jira] [Resolved] (ARROW-3520) [C++] Implement List Flatten kernel

2020-06-30 Thread Wes McKinney (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3520.
-
Fix Version/s: (was: 2.0.0)
   1.0.0
   Resolution: Fixed

Issue resolved by pull request 7585
[https://github.com/apache/arrow/pull/7585]

> [C++] Implement List Flatten kernel
> ---
>
> Key: ARROW-3520
> URL: https://issues.apache.org/jira/browse/ARROW-3520
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> see also ARROW-45
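
The idea of a list flatten can be illustrated in base R: the child values of every slot are concatenated in order, and in this sketch null or empty slots contribute nothing. It mimics the semantics only and says nothing about the Arrow C++ kernel's API.

```r
# A "list array" with a two-element slot, a null slot, an empty slot,
# and a one-element slot.
list_values <- list(c(1L, 2L), NULL, integer(0), 3L)

# Concatenate all child values in slot order.
flat <- unlist(list_values, use.names = FALSE)
print(flat)
```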




[jira] [Resolved] (ARROW-4390) [R] Serialize "labeled" metadata in Feather files, IPC messages

2020-06-30 Thread Neal Richardson (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-4390.

Resolution: Fixed

Issue resolved by pull request 7600
[https://github.com/apache/arrow/pull/7600]

> [R] Serialize "labeled" metadata in Feather files, IPC messages
> ---
>
> Key: ARROW-4390
> URL: https://issues.apache.org/jira/browse/ARROW-4390
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> see https://github.com/apache/arrow/issues/3480




[jira] [Updated] (ARROW-4390) [R] Serialize "labeled" metadata in Feather files, IPC messages

2020-06-30 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4390:
--
Labels: pull-request-available  (was: )

> [R] Serialize "labeled" metadata in Feather files, IPC messages
> ---
>
> Key: ARROW-4390
> URL: https://issues.apache.org/jira/browse/ARROW-4390
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> see https://github.com/apache/arrow/issues/3480




[jira] [Resolved] (ARROW-9281) [R] Turn off utf8proc in R builds

2020-06-30 Thread Neal Richardson (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-9281.

Resolution: Fixed

Issue resolved by pull request 7595
[https://github.com/apache/arrow/pull/7595]





[jira] [Updated] (ARROW-9281) [R] Turn off utf8proc in R builds

2020-06-30 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9281:
--
Labels: pull-request-available  (was: )

> [R] Turn off utf8proc in R builds
> -
>
> Key: ARROW-9281
> URL: https://issues.apache.org/jira/browse/ARROW-9281
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ARROW-9220 unfortunately stopped short of doing this.




[jira] [Resolved] (ARROW-9282) [R] Remove usage of _EXTPTR_PTR

2020-06-30 Thread Neal Richardson (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-9282.

Resolution: Fixed

Issue resolved by pull request 7597
[https://github.com/apache/arrow/pull/7597]

> [R] Remove usage of _EXTPTR_PTR
> ---
>
> Key: ARROW-9282
> URL: https://issues.apache.org/jira/browse/ARROW-9282
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Critical
> Fix For: 1.0.0
>
>
> See comments at the end of ARROW-9219. There was an ABI change in R, 
> apparently an interface that was not officially supported.




[jira] [Updated] (ARROW-9282) [R] Remove usage of _EXTPTR_PTR

2020-06-30 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-9282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9282:
--
Labels: pull-request-available  (was: )

> [R] Remove usage of _EXTPTR_PTR
> ---
>
> Key: ARROW-9282
> URL: https://issues.apache.org/jira/browse/ARROW-9282
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See comments at the end of ARROW-9219. There was an ABI change in R, 
> apparently an interface that was not officially supported.




[jira] [Created] (ARROW-9285) [C++] Detect unauthorized memory allocations in function kernels

2020-06-30 Thread Wes McKinney (Jira)

Wes McKinney created ARROW-9285:
---

 Summary: [C++] Detect unauthorized memory allocations in function kernels
 Key: ARROW-9285
 URL: https://issues.apache.org/jira/browse/ARROW-9285
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney


If a function has been configured to preallocate space, then executing the 
kernel should not replace the preallocated buffer during execution; doing so is 
an implementation error. Detecting this would be relatively easy and would 
improve debugging for kernel implementers.




[jira] [Updated] (ARROW-8867) [R] Support converting POSIXlt type

2020-06-30 Thread Neal Richardson (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8867:
---
Summary: [R] Support converting POSIXlt type  (was: [R] Support converting POSIXlt type; named lists in general)

> [R] Support converting POSIXlt type
> ---
>
> Key: ARROW-8867
> URL: https://issues.apache.org/jira/browse/ARROW-8867
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> {code:r}
> f <- as.POSIXlt(Sys.time() + 1:5)
> Array$create(f)
> # Error in Array__from_vector(x, type) : 
> #   Unknown: List vector expecting elements vector of type double but got int32
> {code}
> Issue #1: POSIXlt type is a struct, essentially. But because it is not a 
> data.frame, we don't try to convert it to a struct. (We should probably 
> convert named lists to structs and not list type in general.)
> If I trick the converter into thinking it is a data.frame, it will convert to 
> struct successfully.
> {code:r}
> class(f) <- c(class(f), "data.frame")
> Array$create(f)
> # StructArray
> #  year: int32, wday: int32, yday: int32, isdst: int32, zone: string, gmtoff: int32>>
> # ...
> {code}
> Issue #2: round trip won't work because the attributes that tell you that 
> this struct is a POSIXlt type, what time zone it is, etc., are dropped. This 
> would be helped by storing those attributes as custom_metadata on the Table. 
> (We could also implement it as an extension type, but if it's just for going 
> back and forth between R, would that have any benefit?)




[jira] [Updated] (ARROW-8867) [R] Support converting POSIXlt type

2020-06-30 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8867:
--
Labels: pull-request-available  (was: )

> [R] Support converting POSIXlt type
> ---
>
> Key: ARROW-8867
> URL: https://issues.apache.org/jira/browse/ARROW-8867
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:r}
> f <- as.POSIXlt(Sys.time() + 1:5)
> Array$create(f)
> # Error in Array__from_vector(x, type) : 
> #   Unknown: List vector expecting elements vector of type double but got int32
> {code}
> Issue #1: POSIXlt type is a struct, essentially. But because it is not a 
> data.frame, we don't try to convert it to a struct. (We should probably 
> convert named lists to structs and not list type in general.)
> If I trick the converter into thinking it is a data.frame, it will convert to 
> struct successfully.
> {code:r}
> class(f) <- c(class(f), "data.frame")
> Array$create(f)
> # StructArray
> #  year: int32, wday: int32, yday: int32, isdst: int32, zone: string, gmtoff: int32>>
> # ...
> {code}
> Issue #2: round trip won't work because the attributes that tell you that 
> this struct is a POSIXlt type, what time zone it is, etc., are dropped. This 
> would be helped by storing those attributes as custom_metadata on the Table. 
> (We could also implement it as an extension type, but if it's just for going 
> back and forth between R, would that have any benefit?)


