[jira] [Commented] (ARROW-5580) [C++][Gandiva] Correct definitions of timestamp functions in Gandiva

2019-09-19 Thread Prudhvi Porandla (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934089#comment-16934089
 ] 

Prudhvi Porandla commented on ARROW-5580:
-

no; removed 0.15.0 tag.

> [C++][Gandiva] Correct definitions of timestamp functions in Gandiva
> 
>
> Key: ARROW-5580
> URL: https://issues.apache.org/jira/browse/ARROW-5580
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Prudhvi Porandla
>Assignee: Prudhvi Porandla
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Timestamp functions are unsupported in Gandiva due to a definition mismatch.
> For example, Gandiva supports timestampAddMonth(timestamp, int32), but the
> expected signature is timestampAddMonth(int32, timestamp).
>  
>  
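The mismatch above can be pictured with a small Python sketch (illustrative only; Gandiva's actual functions are C++, and the function names here just mirror the ones in the description). The point is that the existing logic can be kept and the expected signature satisfied with a thin adapter that swaps the argument order:

```python
from datetime import datetime

def timestamp_add_month(ts, months):
    # Gandiva-style argument order: (timestamp, int32).
    # Note: does not handle day-of-month overflow (e.g. Jan 31 + 1 month).
    month0 = ts.month - 1 + months
    return ts.replace(year=ts.year + month0 // 12, month=month0 % 12 + 1)

def timestamp_add_month_expected(months, ts):
    # Expected argument order: (int32, timestamp); adapter swaps the arguments.
    return timestamp_add_month(ts, months)

assert timestamp_add_month_expected(3, datetime(2019, 9, 19)) == datetime(2019, 12, 19)
assert timestamp_add_month_expected(5, datetime(2019, 9, 19)) == datetime(2020, 2, 19)
```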



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-5580) [C++][Gandiva] Correct definitions of timestamp functions in Gandiva

2019-09-19 Thread Prudhvi Porandla (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prudhvi Porandla updated ARROW-5580:

Fix Version/s: (was: 0.15.0)

> [C++][Gandiva] Correct definitions of timestamp functions in Gandiva
> 
>
> Key: ARROW-5580
> URL: https://issues.apache.org/jira/browse/ARROW-5580
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Prudhvi Porandla
>Assignee: Prudhvi Porandla
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Timestamp functions are unsupported in Gandiva due to a definition mismatch.
> For example, Gandiva supports timestampAddMonth(timestamp, int32), but the
> expected signature is timestampAddMonth(int32, timestamp).
>  
>  





[jira] [Created] (ARROW-6641) Remove Deprecated WriteableFile warning

2019-09-19 Thread Karthikeyan Natarajan (Jira)
Karthikeyan Natarajan created ARROW-6641:


 Summary: Remove Deprecated WriteableFile warning
 Key: ARROW-6641
 URL: https://issues.apache.org/jira/browse/ARROW-6641
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.14.1, 0.14.0
Reporter: Karthikeyan Natarajan


The current version is 0.14.1. As per the comment, the deprecated `WriteableFile`
alias should have been removed by now.

 
{code:java}
// TODO(kszucs): remove this after 0.13
#ifndef _MSC_VER
using WriteableFile ARROW_DEPRECATED("Use WritableFile") = WritableFile;
using ReadableFileInterface ARROW_DEPRECATED("Use RandomAccessFile") = 
RandomAccessFile;
#else
// MSVC does not like using ARROW_DEPRECATED with using declarations
using WriteableFile = WritableFile;
using ReadableFileInterface = RandomAccessFile;
#endif
{code}
 

 





[jira] [Commented] (ARROW-6640) [C++]Error when BufferedInputStream Peek more than bytes buffered

2019-09-19 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934039#comment-16934039
 ] 

Wes McKinney commented on ARROW-6640:
-

Good catch, thanks

> [C++]Error when BufferedInputStream Peek more than bytes buffered
> -
>
> Key: ARROW-6640
> URL: https://issues.apache.org/jira/browse/ARROW-6640
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Zherui Cao
>Assignee: Zherui Cao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> An example: BufferedInputStream::Peek(10) is called, but only 8 buffered
> bytes remain (buffer_pos is currently 2). Peek will grow the buffer by 2
> bytes, but in the meantime buffer_pos is reset to 0 when it should remain 2.
> Resetting buffer_pos will cause problems.
>  





[jira] [Updated] (ARROW-6640) [C++]Error when BufferedInputStream Peek more than bytes buffered

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-6640:

Fix Version/s: 0.15.0

> [C++]Error when BufferedInputStream Peek more than bytes buffered
> -
>
> Key: ARROW-6640
> URL: https://issues.apache.org/jira/browse/ARROW-6640
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Zherui Cao
>Assignee: Zherui Cao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> An example: BufferedInputStream::Peek(10) is called, but only 8 buffered
> bytes remain (buffer_pos is currently 2). Peek will grow the buffer by 2
> bytes, but in the meantime buffer_pos is reset to 0 when it should remain 2.
> Resetting buffer_pos will cause problems.
>  





[jira] [Commented] (ARROW-6566) Implement VarChar in Scala

2019-09-19 Thread Liya Fan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934037#comment-16934037
 ] 

Liya Fan commented on ARROW-6566:
-

[~tampler] In the test file you provided, there is neither a main method nor a
test method, so I guess the test method is sdArrChunkString, which is related
to varchar.
However, when I ran that method as a main, no exception or error was encountered:

WR VARCHAR vector
Bytes written: 340
input field fector: org.apache.arrow.vector.VarCharVector@28dcca0c[name = 
testField, ...]
RD VARCHAR vector
ArrayBuffer([B@7a5ceedd, [B@4201c465)

> Implement VarChar in Scala
> --
>
> Key: ARROW-6566
> URL: https://issues.apache.org/jira/browse/ARROW-6566
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Java
>Affects Versions: 0.14.1
>Reporter: Boris V.Kuznetsov
>Priority: Major
>
> Hello
> I'm trying to write and read a zio.Chunk of strings, which is essentially an
> array of strings.
> My implementation fails the test; how should I fix that?
> [Writer|https://github.com/Neurodyne/zio-serdes/blob/9e2128ff64ffa0e7c64167a5ee46584c3fcab9e4/src/main/scala/zio/serdes/arrow/ArrowUtils.scala#L48]
>  code
> [Reader|https://github.com/Neurodyne/zio-serdes/blob/9e2128ff64ffa0e7c64167a5ee46584c3fcab9e4/src/main/scala/zio/serdes/arrow/ArrowUtils.scala#L108]
>  code
> [Test|https://github.com/Neurodyne/zio-serdes/blob/9e2128ff64ffa0e7c64167a5ee46584c3fcab9e4/src/test/scala/arrow/Base.scala#L115]
>  code
> Any help, links and advice are highly appreciated
> Thank you!





[jira] [Updated] (ARROW-6639) [Packaging][RPM] Add support for CentOS 7 on aarch64

2019-09-19 Thread Sutou Kouhei (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sutou Kouhei updated ARROW-6639:

Summary: [Packaging][RPM] Add support for CentOS 7 on aarch64  (was: 
[Packaging] Improve i386 support with Yum task)

> [Packaging][RPM] Add support for CentOS 7 on aarch64
> 
>
> Key: ARROW-6639
> URL: https://issues.apache.org/jira/browse/ARROW-6639
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Kentaro Hayashi
>Assignee: Sutou Kouhei
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The apt:build rake task supports specifying the architecture to run [1], but
> this is not true for the yum task.
>  [1] 
> [https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/package-task.rb#L276]
> It would be useful if the yum task also supported specifying the architecture
> (e.g. i386), even though CentOS 6 i386 reaches EOL in 2020/11.
>  





[jira] [Updated] (ARROW-6639) [Packaging] Improve i386 support with Yum task

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6639:
--
Labels: pull-request-available  (was: )

> [Packaging] Improve i386 support with Yum task
> --
>
> Key: ARROW-6639
> URL: https://issues.apache.org/jira/browse/ARROW-6639
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Kentaro Hayashi
>Assignee: Sutou Kouhei
>Priority: Major
>  Labels: pull-request-available
>
> The apt:build rake task supports specifying the architecture to run [1], but
> this is not true for the yum task.
>  [1] 
> [https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/package-task.rb#L276]
> It would be useful if the yum task also supported specifying the architecture
> (e.g. i386), even though CentOS 6 i386 reaches EOL in 2020/11.
>  





[jira] [Assigned] (ARROW-6639) [Packaging] Improve i386 support with Yum task

2019-09-19 Thread Sutou Kouhei (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sutou Kouhei reassigned ARROW-6639:
---

Assignee: Sutou Kouhei

> [Packaging] Improve i386 support with Yum task
> --
>
> Key: ARROW-6639
> URL: https://issues.apache.org/jira/browse/ARROW-6639
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Kentaro Hayashi
>Assignee: Sutou Kouhei
>Priority: Major
>
> The apt:build rake task supports specifying the architecture to run [1], but
> this is not true for the yum task.
>  [1] 
> [https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/package-task.rb#L276]
> It would be useful if the yum task also supported specifying the architecture
> (e.g. i386), even though CentOS 6 i386 reaches EOL in 2020/11.
>  





[jira] [Updated] (ARROW-6640) [C++]Error when BufferedInputStream Peek more than bytes buffered

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6640:
--
Labels: pull-request-available  (was: )

> [C++]Error when BufferedInputStream Peek more than bytes buffered
> -
>
> Key: ARROW-6640
> URL: https://issues.apache.org/jira/browse/ARROW-6640
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Zherui Cao
>Assignee: Zherui Cao
>Priority: Major
>  Labels: pull-request-available
>
> An example: BufferedInputStream::Peek(10) is called, but only 8 buffered
> bytes remain (buffer_pos is currently 2). Peek will grow the buffer by 2
> bytes, but in the meantime buffer_pos is reset to 0 when it should remain 2.
> Resetting buffer_pos will cause problems.
>  





[jira] [Created] (ARROW-6640) [C++]Error when BufferedInputStream Peek more than bytes buffered

2019-09-19 Thread Zherui Cao (Jira)
Zherui Cao created ARROW-6640:
-

 Summary: [C++]Error when BufferedInputStream Peek more than bytes 
buffered
 Key: ARROW-6640
 URL: https://issues.apache.org/jira/browse/ARROW-6640
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Zherui Cao
Assignee: Zherui Cao


An example:

BufferedInputStream::Peek(10) is called, but only 8 buffered bytes remain
(buffer_pos is currently 2).

Peek will grow the buffer by 2 bytes, but in the meantime buffer_pos is reset
to 0 when it should remain 2.

Resetting buffer_pos will cause problems.
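The bookkeeping bug can be simulated with a small Python model of a buffered stream (a sketch of the behavior described above, not the Arrow C++ implementation):

```python
class BufferedModel:
    """Toy model of a buffered input stream's Peek bookkeeping."""
    def __init__(self, source, buffered):
        self.source = source                    # the full underlying stream
        self.buffer = bytearray(source[:buffered])
        self.fetched = buffered                 # bytes pulled from source so far
        self.buffer_pos = 0                     # next unread offset in buffer

    def read(self, n):
        out = bytes(self.buffer[self.buffer_pos:self.buffer_pos + n])
        self.buffer_pos += n
        return out

    def peek(self, n, buggy):
        remaining = len(self.buffer) - self.buffer_pos
        if n > remaining:
            extra = n - remaining               # grow the buffer just enough
            self.buffer += self.source[self.fetched:self.fetched + extra]
            self.fetched += extra
            if buggy:
                self.buffer_pos = 0             # the bug: consumed bytes forgotten
        return bytes(self.buffer[self.buffer_pos:self.buffer_pos + n])

src = bytes(range(12))

ok = BufferedModel(src, 10)
ok.read(2)                                      # buffer_pos is now 2, 8 bytes remain
assert ok.peek(10, buggy=False) == src[2:12]    # correct: the next 10 unread bytes

bad = BufferedModel(src, 10)
bad.read(2)
assert bad.peek(10, buggy=True) == src[0:10]    # wrong: re-serves consumed bytes
```

Keeping buffer_pos at 2 across the refill makes peek return the unread bytes; resetting it to 0 silently re-serves data the caller already consumed.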

 





[jira] [Resolved] (ARROW-6090) [Rust] [DataFusion] Implement parallel execution for hash aggregate

2019-09-19 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-6090.
---
Resolution: Fixed

Issue resolved by pull request 5191
[https://github.com/apache/arrow/pull/5191]

> [Rust] [DataFusion] Implement parallel execution for hash aggregate
> ---
>
> Key: ARROW-6090
> URL: https://issues.apache.org/jira/browse/ARROW-6090
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-1664) [Python] Support for xarray.DataArray and xarray.Dataset

2019-09-19 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933897#comment-16933897
 ] 

Wes McKinney commented on ARROW-1664:
-

I don't think that xarray is compatible with the Arrow columnar format. Let's 
move the discussion to the mailing list if there is something more actionable

> [Python] Support for xarray.DataArray and xarray.Dataset
> 
>
> Key: ARROW-1664
> URL: https://issues.apache.org/jira/browse/ARROW-1664
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Mitar
>Priority: Minor
>
> DataArray and Dataset are efficient in-memory representations for multi 
> dimensional data. It would be great if one could share them between processes 
> using Arrow.
> http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html#xarray.DataArray
> http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html#xarray.Dataset





[jira] [Closed] (ARROW-1664) [Python] Support for xarray.DataArray and xarray.Dataset

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-1664.
---

> [Python] Support for xarray.DataArray and xarray.Dataset
> 
>
> Key: ARROW-1664
> URL: https://issues.apache.org/jira/browse/ARROW-1664
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Mitar
>Priority: Minor
>
> DataArray and Dataset are efficient in-memory representations for multi 
> dimensional data. It would be great if one could share them between processes 
> using Arrow.
> http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html#xarray.DataArray
> http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html#xarray.Dataset





[jira] [Closed] (ARROW-5642) [Python] Upgrade OpenSSL version in manylinux1 image to 1.1.1b

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-5642.
---

> [Python] Upgrade OpenSSL version in manylinux1 image to 1.1.1b
> --
>
> Key: ARROW-5642
> URL: https://issues.apache.org/jira/browse/ARROW-5642
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>
> See discussion in 
> https://github.com/apache/arrow/pull/4594#discussion_r294778800





[jira] [Closed] (ARROW-1894) [Python] Treat CPython memoryview or buffer objects equivalently to pyarrow.Buffer in pyarrow.serialize

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-1894.
---

> [Python] Treat CPython memoryview or buffer objects equivalently to 
> pyarrow.Buffer in pyarrow.serialize
> ---
>
> Key: ARROW-1894
> URL: https://issues.apache.org/jira/browse/ARROW-1894
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>
> These should be treated as Buffer-like on serialize. We should consider how 
> to "box" the buffers as the appropriate kind of object (Buffer, memoryview, 
> etc.) when being deserialized
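The "Buffer-like on serialize" idea can be pictured in plain Python (an illustrative sketch using the standard buffer protocol; pyarrow's actual handling lives in its serialization code and `as_buffer_like` is a hypothetical name):

```python
def as_buffer_like(obj):
    # View anything supporting the buffer protocol without copying.
    if isinstance(obj, memoryview):
        return obj
    return memoryview(obj)      # works for bytes, bytearray, array.array, ...

assert bytes(as_buffer_like(b"hello")) == b"hello"
assert bytes(as_buffer_like(bytearray(b"hi"))) == b"hi"
mv = memoryview(b"x")
assert as_buffer_like(mv) is mv     # already a view: returned as-is
```

The open design question in the issue is the reverse direction: which concrete type ("box") each buffer should come back as on deserialization.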





[jira] [Closed] (ARROW-4025) [Python] TensorFlow/PyTorch arrow ThreadPool workarounds not working in some settings

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-4025.
---

> [Python] TensorFlow/PyTorch arrow ThreadPool workarounds not working in some 
> settings
> -
>
> Key: ARROW-4025
> URL: https://issues.apache.org/jira/browse/ARROW-4025
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.11.1
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> See the bug report in [https://github.com/ray-project/ray/issues/3520]
> I wonder if we can revisit this issue and try to get rid of the workarounds 
> we tried to deploy in the past.
> See also the discussion in [https://github.com/apache/arrow/pull/2096]





[jira] [Closed] (ARROW-3717) [Python] Add GCSFSWrapper for DaskFileSystem

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-3717.
---

> [Python] Add GCSFSWrapper for DaskFileSystem
> 
>
> Key: ARROW-3717
> URL: https://issues.apache.org/jira/browse/ARROW-3717
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Emmett McQuinn
>Priority: Major
>  Labels: FileSystem
>
> Currently there is an S3FSWrapper that extends the DaskFileSystem object to 
> support functionality like isdir(...), isfile(...), and walk(...).
> Adding a GCSFSWrapper would enable using Google Cloud Storage for packages 
> depending on arrow.





[jira] [Closed] (ARROW-2237) [Python] [Plasma] Huge pages test failure

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-2237.
---

> [Python] [Plasma] Huge pages test failure
> -
>
> Key: ARROW-2237
> URL: https://issues.apache.org/jira/browse/ARROW-2237
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Priority: Major
>
> This is a new failure here (Ubuntu 16.04, x86-64):
> {code}
> _ test_use_huge_pages 
> _
> Traceback (most recent call last):
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 779, 
> in test_use_huge_pages
> create_object(plasma_client, 1)
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 80, in 
> create_object
> seal=seal)
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 69, in 
> create_object_with_id
> memory_buffer = client.create(object_id, data_size, metadata)
>   File "plasma.pyx", line 302, in pyarrow.plasma.PlasmaClient.create
>   File "error.pxi", line 79, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: /home/antoine/arrow/cpp/src/plasma/client.cc:192 
> code: PlasmaReceive(store_conn_, MessageType_PlasmaCreateReply, )
> /home/antoine/arrow/cpp/src/plasma/protocol.cc:46 code: ReadMessage(sock, 
> , buffer)
> Encountered unexpected EOF
>  Captured stderr call 
> -
> Allowing the Plasma store to use up to 0.1GB of memory.
> Starting object store with directory /mnt/hugepages and huge page support 
> enabled
> create_buffer failed to open file /mnt/hugepages/plasmapSNc0X
> {code}





[jira] [Closed] (ARROW-2013) [Python] Add AzureDataLakeFilesystem to be used with ParquetDataset and reader/writer functions

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-2013.
---

> [Python]  Add AzureDataLakeFilesystem to be used with ParquetDataset and 
> reader/writer functions
> 
>
> Key: ARROW-2013
> URL: https://issues.apache.org/jira/browse/ARROW-2013
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Nicholas Pezolano
>Priority: Minor
>  Labels: filesystem
>
> Similar to https://issues.apache.org/jira/browse/ARROW-1213, it would be 
> great to add AzureDLFileSystem as a supported filesystem in ParquetDataset.
> Example:
> {code:java}
> from azure.datalake.store import AzureDLFileSystem
> fs = AzureDLFileSystem(token=token, store_name=store_name)
> dataset = pq.ParquetDataset(file_list, filesystem=fs){code}
> Throws:
> {code:java}
> IOError: Unrecognized filesystem: <class 'azure.datalake.store.core.AzureDLFileSystem'>{code}
> Azures github:
> https://github.com/Azure/azure-data-lake-store-python





[jira] [Commented] (ARROW-3717) [Python] Add GCSFSWrapper for DaskFileSystem

2019-09-19 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933894#comment-16933894
 ] 

Wes McKinney commented on ARROW-3717:
-

It may. We should see if there are downsides to using the S3-emulation interface

> [Python] Add GCSFSWrapper for DaskFileSystem
> 
>
> Key: ARROW-3717
> URL: https://issues.apache.org/jira/browse/ARROW-3717
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Emmett McQuinn
>Priority: Major
>  Labels: FileSystem
>
> Currently there is an S3FSWrapper that extends the DaskFileSystem object to 
> support functionality like isdir(...), isfile(...), and walk(...).
> Adding a GCSFSWrapper would enable using Google Cloud Storage for packages 
> depending on arrow.





[jira] [Created] (ARROW-6639) [Packaging] Improve i386 support with Yum task

2019-09-19 Thread Kentaro Hayashi (Jira)
Kentaro Hayashi created ARROW-6639:
--

 Summary: [Packaging] Improve i386 support with Yum task
 Key: ARROW-6639
 URL: https://issues.apache.org/jira/browse/ARROW-6639
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging
Reporter: Kentaro Hayashi


The apt:build rake task supports specifying the architecture to run [1], but
this is not true for the yum task.

 [1] 
[https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/package-task.rb#L276]

It would be useful if the yum task also supported specifying the architecture
(e.g. i386), even though CentOS 6 i386 reaches EOL in 2020/11.

 





[jira] [Closed] (ARROW-2339) [Python] Add a fast path for int hashing

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-2339.
---

> [Python] Add a fast path for int hashing
> 
>
> Key: ARROW-2339
> URL: https://issues.apache.org/jira/browse/ARROW-2339
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Alex Hagerman
>Priority: Minor
>
> Create a __hash__ fast path for Int scalars that avoids using as_py().
>  
> https://issues.apache.org/jira/browse/ARROW-640
> [https://github.com/apache/arrow/pull/1765/files/4497b69db8039cfeaa7a25f593f3a3e6c7984604]
>  
>  
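The fast-path idea can be sketched in plain Python (a toy IntScalar class, not pyarrow's actual implementation): hash the raw value directly instead of paying for an as_py() conversion first.

```python
class IntScalar:
    """Toy scalar wrapping a raw integer value."""
    def __init__(self, value):
        self._value = value

    def as_py(self):
        # Stand-in for a comparatively expensive Python-object conversion.
        return int(self._value)

    def __eq__(self, other):
        return isinstance(other, IntScalar) and self._value == other._value

    def __hash__(self):
        # Fast path: hash the raw value, skipping the as_py() round-trip.
        return hash(self._value)

# Hashes agree with the plain int, so set/dict membership behaves as expected.
assert hash(IntScalar(7)) == hash(7)
assert len({IntScalar(1), IntScalar(1)}) == 1
```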





[jira] [Commented] (ARROW-6213) [C++] tests fail for AVX512

2019-09-19 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933893#comment-16933893
 ] 

Wes McKinney commented on ARROW-6213:
-

Ah, if you use mosh instead of ssh the latency is acceptable (I'm using it from 
California at the moment and it seems pretty okay) =) 

> [C++] tests fail for AVX512
> ---
>
> Key: ARROW-6213
> URL: https://issues.apache.org/jira/browse/ARROW-6213
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.14.1
> Environment: CentOS 7.6.1810, Intel Xeon Processor (Skylake, IBRS) 
> avx512
>Reporter: Charles Coulombe
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: arrow-0.14.1-c++-failed-tests-cmake-conf.txt, 
> arrow-0.14.1-c++-failed-tests.txt
>
>
> When building libraries for avx512 with GCC 7.3.0, two C++ tests fail.
> {noformat}
> The following tests FAILED: 
>   28 - arrow-compute-compare-test (Failed) 
>   30 - arrow-compute-filter-test (Failed) 
> Errors while running CTest{noformat}
> while for avx2 they pass.
>  





[jira] [Closed] (ARROW-5131) [Python] Add Azure Datalake Filesystem Gen1 Wrapper for pyarrow

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-5131.
---

> [Python] Add Azure Datalake Filesystem Gen1 Wrapper for pyarrow
> ---
>
> Key: ARROW-5131
> URL: https://issues.apache.org/jira/browse/ARROW-5131
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Python
>Affects Versions: 0.12.1
>Reporter: Gregory Hayes
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> The current pyarrow package can only read parquet files that have been 
> written to Gen1 Azure Datalake using the fastparquet engine.  This only works 
> if the dask-adlfs package is explicitly installed and imported.  I've added a 
> method to the dask-adlfs package, found 
> [here|https://github.com/dask/dask-adlfs], and issued a PR for that change.  
> To support this capability, I added an ADLFSWrapper to the filesystem.py file.





[jira] [Closed] (ARROW-2719) [Python/C++] ArrowSchema not hashable

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-2719.
---

> [Python/C++] ArrowSchema not hashable
> -
>
> Key: ARROW-2719
> URL: https://issues.apache.org/jira/browse/ARROW-2719
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Florian Jetter
>Priority: Minor
>  Labels: beginner
>
> The arrow schema is immutable and should provide a way of hashing itself. 
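The request can be sketched in plain Python (hypothetical Field/Schema stand-ins, not pyarrow's classes): because the schema's contents are immutable, hashing the tuple of fields is safe and makes schemas usable as dict keys and set members.

```python
class Schema:
    """Toy immutable schema: a fixed sequence of (name, type) fields."""
    def __init__(self, fields):
        self._fields = tuple(fields)   # e.g. [("id", "int64"), ("name", "string")]

    def __eq__(self, other):
        return isinstance(other, Schema) and self._fields == other._fields

    def __hash__(self):
        # Immutable contents, so hashing the field tuple is safe.
        return hash(self._fields)

a = Schema([("id", "int64"), ("name", "string")])
b = Schema([("id", "int64"), ("name", "string")])
assert a == b and hash(a) == hash(b)   # equal schemas hash equally
assert len({a, b}) == 1                # so they deduplicate in sets
```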





[jira] [Closed] (ARROW-5099) [Plasma][Python] Compiling Plasma TensorFlow op has Python 2 bug.

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-5099.
---

> [Plasma][Python] Compiling Plasma TensorFlow op has Python 2 bug.
> -
>
> Key: ARROW-5099
> URL: https://issues.apache.org/jira/browse/ARROW-5099
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma, Python
>Reporter: Robert Nishihara
>Priority: Minor
>
> I've seen the following error when compiling the Plasma TensorFlow op.
> TensorFlow version: 1.13.1
> Compiling Plasma TensorFlow Op...
> Traceback (most recent call last):
>   File "/ray/python/ray/experimental/sgd/test_sgd.py", line 48, in 
> all_reduce_alg=args.all_reduce_alg)
>   File "/ray/python/ray/experimental/sgd/sgd.py", line 110, in __init__
> shard_shapes = ray.get(self.workers[0].shard_shapes.remote())
>   File "/ray/python/ray/worker.py", line 2307, in get
> raise value
> ray.exceptions.RayTaskError: {color:#00cdcd}ray_worker{color} (pid=81, 
> host=629a7997c823)
> NameError: global name 'FileNotFoundError' is not defined
> {{FileNotFoundError}} doesn't exist in Python 2.





[jira] [Closed] (ARROW-5036) [Plasma][C++] Serialization tests resort to memcpy to check equality

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-5036.
---

> [Plasma][C++] Serialization tests resort to memcpy to check equality
> 
>
> Key: ARROW-5036
> URL: https://issues.apache.org/jira/browse/ARROW-5036
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma
>Reporter: Francois Saint-Jacques
>Priority: Major
>
> {code:bash}
> 1: 
> /tmp/arrow-0.13.0.Q4czW/apache-arrow-0.13.0/cpp/src/plasma/test/serialization_tests.cc:193:
>  Failure
> 1: Expected equality of these values:
> 1:   memcmp(_objects[object_ids[0]], _objects_return[0], 
> sizeof(PlasmaObject))
> 1: Which is: 45
> 1:   0
> 1: [  FAILED  ] PlasmaSerialization.GetReply (0 ms)
> {code}
> The source of the problem is the stack-allocated random_plasma_object. As a
> fix, I propose that PlasmaObject implement an `operator==` method and that
> the memcpy equality check be dropped.





[jira] [Closed] (ARROW-2051) [Python] Support serializing UUID objects to tables

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-2051.
---

> [Python] Support serializing UUID objects to tables
> ---
>
> Key: ARROW-2051
> URL: https://issues.apache.org/jira/browse/ARROW-2051
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Omer Katz
>Priority: Major
>
> UUID objects can be easily supported and can be represented as 128-bit 
> integers or a stream of bytes.
> The fastest way I know to construct a UUID object is by using its 128-bit
> (16-byte) integer representation.
>  
> {code:java}
> %timeit uuid.UUID(int=24197857161011715162171839636988778104)
> 611 ns ± 6.27 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
> %timeit uuid.UUID(bytes=b'\x124Vx\x124Vx\x124Vx\x124Vx')
> 1.17 µs ± 7.5 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
> %timeit uuid.UUID('12345678-1234-5678-1234-567812345678')
> 1.47 µs ± 6.08 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
> {code}
>  
> Right now I have to do this manually, which is pretty tedious.
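The three construction paths timed above all describe the same UUID, which the standard library round-trips cleanly (how such 16-byte values would map onto an Arrow binary column is left out here):

```python
import uuid

u = uuid.UUID(int=24197857161011715162171839636988778104)

# All three representations from the timings describe the same value.
assert str(u) == "12345678-1234-5678-1234-567812345678"
assert u.bytes == b"\x124Vx\x124Vx\x124Vx\x124Vx"

# Round-trip through the 16-byte big-endian form:
assert uuid.UUID(bytes=u.int.to_bytes(16, "big")) == u
```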





[jira] [Closed] (ARROW-4259) [Plasma] CI failure in test_plasma_tf_op

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-4259.
---

> [Plasma] CI failure in test_plasma_tf_op
> 
>
> Key: ARROW-4259
> URL: https://issues.apache.org/jira/browse/ARROW-4259
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma, Continuous Integration, Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: ci-failure
>
> Recently-appeared failure on master:
> https://travis-ci.org/apache/arrow/jobs/479378188#L7108





[jira] [Closed] (ARROW-2892) [Plasma] Implement interface to get Java arrow objects from Plasma

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-2892.
---

> [Plasma] Implement interface to get Java arrow objects from Plasma
> --
>
> Key: ARROW-2892
> URL: https://issues.apache.org/jira/browse/ARROW-2892
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Philipp Moritz
>Priority: Major
>
> Currently we have a low level interface to access bytes stored in plasma from 
> Java, using the JNI: [https://github.com/apache/arrow/pull/2065/]
>  
> As a followup, we should implement reading (and writing) Java arrow objects 
> from plasma, if possible using zero-copy.
>  





[jira] [Closed] (ARROW-1266) [Plasma] Move heap allocations to arrow memory pool

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-1266.
---

> [Plasma] Move heap allocations to arrow memory pool
> ---
>
> Key: ARROW-1266
> URL: https://issues.apache.org/jira/browse/ARROW-1266
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma
>Reporter: Philipp Moritz
>Priority: Major
>
> At the moment we are allocating memory with std::vectors and even new in 
> some places; this should be cleaned up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6590) [C++] Do not require ARROW_JSON=ON when ARROW_IPC=ON

2019-09-19 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933890#comment-16933890
 ] 

Wes McKinney commented on ARROW-6590:
-

I have set this to be turned on when the unit tests are being built

> [C++] Do not require ARROW_JSON=ON when ARROW_IPC=ON
> 
>
> Key: ARROW-6590
> URL: https://issues.apache.org/jira/browse/ARROW-6590
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> arrow/CMakeLists.txt currently has
> {code}
> if(ARROW_IPC AND NOT ARROW_JSON)
>   message(FATAL_ERROR "JSON support is required for Arrow IPC")
> endif()
> {code}
> Building the JSON scanner component should not be a prerequisite of building 
> IPC support.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-4405) [Docs] Docker documentation builds fail since the source directory is mounted as readonly

2019-09-19 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933889#comment-16933889
 ] 

Wes McKinney commented on ARROW-4405:
-

AFAIK the Docker docs build is still broken

> [Docs] Docker documentation builds fail since the source directory is mounted 
> as readonly
> -
>
> Key: ARROW-4405
> URL: https://issues.apache.org/jira/browse/ARROW-4405
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: docker
>
> {code:java}
> writing list of installed files to '../../build/python/record.txt'
> /
> + pushd /arrow/cpp/apidoc
> /arrow/cpp/apidoc /
> + doxygen
> error: Failed to open temporary file /arrow/cpp/apidoc/doxygen_objdb_4898.tmp
> The command "docker-compose run docs" exited with 1.{code}
> https://travis-ci.org/kszucs/crossbow/builds/485348071



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-2631) [Dart] Begin a Dart language library

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-2631.
---

> [Dart] Begin a Dart language library
> 
>
> Key: ARROW-2631
> URL: https://issues.apache.org/jira/browse/ARROW-2631
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Dart
> Environment: mobile
>Reporter: Gerard Webb
>Priority: Major
>  Labels: newbie
>
> as per here:
> [https://github.com/apache/arrow/issues/2066]
>  
> Dart now has FlatBuffers! Wow.
> So let's put a basic example into Arrow to get the ball rolling.
> Suggest a simple Flutter client consuming a Dart and Golang FlatBuffers 
> type / kind.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-1851) [C] Minimalist ANSI C / C99 implementation of Arrow data structures and IPC

2019-09-19 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933888#comment-16933888
 ] 

Wes McKinney commented on ARROW-1851:
-

I still see it as an aspirational goal. Let's leave it open and maybe someone 
will pick it up

> [C] Minimalist ANSI C / C99 implementation of Arrow data structures and IPC
> ---
>
> Key: ARROW-1851
> URL: https://issues.apache.org/jira/browse/ARROW-1851
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C
>Reporter: Wes McKinney
>Priority: Major
> Attachments: text.html
>
>
> This is an umbrella tracking JIRA for creating a small self-contained C 
> implementation of Arrow. The purpose of this library would be compactness 
> and portability, for embedded settings or for FFI in languages that have a 
> harder time binding to C++. The C library could also grow wrapper support 
> for the C++ library to expose more complicated functionality where we don't 
> necessarily want multiple implementations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-6448) [CI] Add crossbow notifications

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-6448.
---
  Assignee: (was: Francois Saint-Jacques)
Resolution: Won't Fix

I think the nightly e-mail is sufficient

> [CI] Add crossbow notifications
> ---
>
> Key: ARROW-6448
> URL: https://issues.apache.org/jira/browse/ARROW-6448
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Continuous Integration
>Reporter: Francois Saint-Jacques
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6632) [C++] Do not build with ARROW_COMPUTE=on and ARROW_DATASET=on by default

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-6632:

Fix Version/s: (was: 0.15.0)
   1.0.0

> [C++] Do not build with ARROW_COMPUTE=on and ARROW_DATASET=on by default
> 
>
> Key: ARROW-6632
> URL: https://issues.apache.org/jira/browse/ARROW-6632
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> In addition to being more time-consuming to build, some "core" users will not 
> need these functions, so it would be better to opt in to these



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6638) [C++] Set ARROW_JEMALLOC=off by default

2019-09-19 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6638:
---

 Summary: [C++] Set ARROW_JEMALLOC=off by default
 Key: ARROW-6638
 URL: https://issues.apache.org/jira/browse/ARROW-6638
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


Enabling jemalloc is relevant for developers and packagers, who will want to 
use this allocator to achieve much better performance. We should very clearly 
advise average users of Apache Arrow to build core libraries with jemalloc 
inside but not necessarily force its use out of the box



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6637) [C++] Zero-dependency default core build

2019-09-19 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6637:
---

 Summary: [C++] Zero-dependency default core build
 Key: ARROW-6637
 URL: https://issues.apache.org/jira/browse/ARROW-6637
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


This is a tracking JIRA for items relating to having few or no dependencies for 
minimal out-of-the-box builds



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6636) [C++] Do not build C++ command line utilities by default

2019-09-19 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6636:
---

 Summary: [C++] Do not build C++ command line utilities by default
 Key: ARROW-6636
 URL: https://issues.apache.org/jira/browse/ARROW-6636
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


This means changing {{ARROW_BUILD_UTILITIES}} to be off by default. These are 
mostly used for integration testing, so building unit or integration tests 
should toggle this on automatically. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6635) [C++] Do not require glog for default build

2019-09-19 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6635:
---

 Summary: [C++] Do not require glog for default build
 Key: ARROW-6635
 URL: https://issues.apache.org/jira/browse/ARROW-6635
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


We should change the default for {{ARROW_USE_GLOG}} to be off



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6634) [C++] Do not require flatbuffers or flatbuffers_ep to build

2019-09-19 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6634:
---

 Summary: [C++] Do not require flatbuffers or flatbuffers_ep to 
build
 Key: ARROW-6634
 URL: https://issues.apache.org/jira/browse/ARROW-6634
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


Flatbuffers is small enough that we can vendor {{flatbuffers/flatbuffers.h}} 
and check in the compiled files to make flatbuffers_ep unneeded



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6633) [C++] Do not require double-conversion for default build

2019-09-19 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6633:
---

 Summary: [C++] Do not require double-conversion for default build
 Key: ARROW-6633
 URL: https://issues.apache.org/jira/browse/ARROW-6633
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


This library is only needed in core builds if

* ARROW_JSON=on or
* ARROW_CSV=on (option to be added) or
* ARROW_BUILD_TESTS=on 

The double conversion headers leak into 

* arrow/util/decimal.h



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6632) [C++] Do not build with ARROW_COMPUTE=on and ARROW_DATASET=on by default

2019-09-19 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6632:
---

 Summary: [C++] Do not build with ARROW_COMPUTE=on and 
ARROW_DATASET=on by default
 Key: ARROW-6632
 URL: https://issues.apache.org/jira/browse/ARROW-6632
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.15.0


In addition to being more time-consuming to build, some "core" users will not 
need these functions, so it would be better to opt in to these



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6631) [C++] Do not build with any compression library dependencies by default

2019-09-19 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6631:
---

 Summary: [C++] Do not build with any compression library 
dependencies by default
 Key: ARROW-6631
 URL: https://issues.apache.org/jira/browse/ARROW-6631
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


Numerous packaging scripts will have to be updated if we decide to do this. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-5580) [C++][Gandiva] Correct definitions of timestamp functions in Gandiva

2019-09-19 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933878#comment-16933878
 ] 

Wes McKinney commented on ARROW-5580:
-

Is this tracking to land in 0.15.0?

> [C++][Gandiva] Correct definitions of timestamp functions in Gandiva
> 
>
> Key: ARROW-5580
> URL: https://issues.apache.org/jira/browse/ARROW-5580
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Prudhvi Porandla
>Assignee: Prudhvi Porandla
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Timestamp functions are unsupported in Gandiva due to definition mismatch.
> For example, Gandiva supports timestampAddMonth(timestamp, int32) but the 
> expected signature is  timestampAddMonth(int32, timestamp).
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-5343) [C++] Consider using Buffer for transpose maps in DictionaryType::Unify instead of std::vector

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5343.
-
Resolution: Fixed

Issue resolved by pull request 5434
[https://github.com/apache/arrow/pull/5434]

> [C++] Consider using Buffer for transpose maps in DictionaryType::Unify 
> instead of std::vector
> --
>
> Key: ARROW-5343
> URL: https://issues.apache.org/jira/browse/ARROW-5343
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In the spirit of "track all the allocations", if dictionaries have 
> non-trivial length, we may want to account for this memory more precisely. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6509) [C++][Gandiva] Re-enable Gandiva JNI tests and fix Travis CI failure

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-6509.
-
Resolution: Fixed

Issue resolved by pull request 5417
[https://github.com/apache/arrow/pull/5417]

> [C++][Gandiva] Re-enable Gandiva JNI tests and fix Travis CI failure
> 
>
> Key: ARROW-6509
> URL: https://issues.apache.org/jira/browse/ARROW-6509
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java
>Reporter: Antoine Pitrou
>Assignee: Prudhvi Porandla
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This seems to happen more or less frequently on the Python - Java build (with 
> jpype enabled).
>  See warnings and errors starting from 
> [https://travis-ci.org/apache/arrow/jobs/583069089#L6662]
>  
> Additional info:
> JVM crash happens on Ubuntu 16.04 when cpp lib is built with Mimalloc 
> allocator instead of jemalloc. Below is the stacktrace from core dump:
> {code}
> (gdb) bt
> #0  0x7fbb13ed3428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
> #1  0x7fbb13ed502a in __GI_abort () at abort.c:89
> #2  0x7fbb131d7149 in ?? () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #3  0x7fbb1338ad27 in ?? () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #4  0x7fbb131e0e4f in JVM_handle_linux_signal () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #5  0x7fbb131d3e48 in ?? () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #6
> #7  mi_page_free_list_extend (heap=0x0, page=0x7fbb133de221, extend=140440661634032, stats=0x7fbae3bfac00)
>     at /home/prudhvi/arrow/cpp-build/mimalloc_ep-prefix/src/mimalloc_ep/src/page.c:449
> #8  0x7fbaaedff652 in _mi_segment_page_of (segment=0x7fbaaedff652 <_mi_segment_page_of+18>, p=0x7fbae3bfab30)
>     at /home/prudhvi/arrow/cpp-build/mimalloc_ep-prefix/src/mimalloc_ep/include/mimalloc-internal.h:232
> #9  0x7fbaaedff7bb in mi_heap_malloc_zero_aligned_at (heap=0x7fbaaedff652 <_mi_segment_page_of+18>, size=140440661633840, alignment=140439800379296, offset=139646092684112, zero=187)
>     at /home/prudhvi/arrow/cpp-build/mimalloc_ep-prefix/src/mimalloc_ep/src/alloc-aligned.c:31
> #10 0x7fbaaedff7e0 in mi_heap_malloc_zero_aligned_at (heap=0x7fbab069f7a0 <_mi_heap_empty>, size=139642473343568, alignment=140439774558139, offset=140440661633872, zero=186)
>     at /home/prudhvi/arrow/cpp-build/mimalloc_ep-prefix/src/mimalloc_ep/src/alloc-aligned.c:33
> #11 0x7fbaaee00941 in mi_option_init (desc=0x7fbaaedff652 <_mi_segment_page_of+18>)
>     at /home/prudhvi/arrow/cpp-build/mimalloc_ep-prefix/src/mimalloc_ep/src/options.c:204
> #12 0x7fbb13ed7ff8 in __run_exit_handlers (status=1, listp=0x7fbb142625f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
> #13 0x7fbb13ed8045 in __GI_exit (status=) at exit.c:104
> #14 0x7fbb12f76a7c in ?? () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #15 0x7fbb13391587 in ?? () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #16 0x7fbb1338ede7 in ?? () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #17 0x7fbb133900cf in ?? () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #18 0x7fbb133905f2 in ?? () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #19 0x7fbb131d6102 in ?? () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #20 0x7fbb1386a6ba in start_thread (arg=0x7fbae3bfb700) at pthread_create.c:333
> #21 0x7fbb13fa541d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-5845) [Java] Implement converter between Arrow record batches and Avro records

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5845:

Fix Version/s: (was: 0.15.0)
   1.0.0

> [Java] Implement converter between Arrow record batches and Avro records
> 
>
> Key: ARROW-5845
> URL: https://issues.apache.org/jira/browse/ARROW-5845
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
> Fix For: 1.0.0
>
>
> It would be useful for applications which need to convert Avro data to Arrow 
> data.
> This is an adapter which converts data with an existing API (like the JDBC 
> adapter) rather than a native reader (like ORC).
> We implement this through the Avro Java project, receiving params like the 
> Avro Decoder/Schema/DatumReader and returning a VectorSchemaRoot. For each 
> data type we have a consumer class as below to get Avro data and write it 
> into a vector, avoiding boxing/unboxing (e.g. GenericRecord#get returns 
> Object):
> {code:java}
> public class AvroIntConsumer implements Consumer {
>   private final IntWriter writer;
>
>   public AvroIntConsumer(IntVector vector) {
>     this.writer = new IntWriterImpl(vector);
>   }
>
>   @Override
>   public void consume(Decoder decoder) throws IOException {
>     writer.writeInt(decoder.readInt());
>     writer.setPosition(writer.getPosition() + 1);
>   }
> }
> {code}
> We intend to support primitive and complex types (null values represented 
> via a union type with a null type); size limits and field selection could be 
> optional for users. 
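For readers less familiar with this pattern, a minimal Python analogue (all names hypothetical, not part of the Arrow or Avro APIs) shows the shape: one consumer per type reads a typed value from the decoder and appends it to a typed buffer, so no generic Object ever materializes.

```python
class IntConsumer:
    """Hypothetical Python analogue of AvroIntConsumer: reads typed ints
    from a decoder and appends them to a typed output buffer."""

    def __init__(self):
        self.values = []

    def consume(self, decoder):
        # One typed read per call; the write position advances via append.
        self.values.append(decoder.read_int())


class FakeDecoder:
    """Stand-in for Avro's Decoder, yielding ints from a list."""

    def __init__(self, ints):
        self._it = iter(ints)

    def read_int(self):
        return next(self._it)
```

A driver would simply loop, calling `consume` once per record, dispatching to the right consumer for each field's type.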



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6618) [Python] Reading a zero-size buffer can segfault

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-6618.
-
Resolution: Fixed

Issue resolved by pull request 5437
[https://github.com/apache/arrow/pull/5437]

> [Python] Reading a zero-size buffer can segfault
> 
>
> Key: ARROW-6618
> URL: https://issues.apache.org/jira/browse/ARROW-6618
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Simplest reproducible code is:
> {code}
> pa.read_message(b'')
> {code}
> which gives a segfault. 
> You can easily run into this interactively, e.g. by accidentally passing an 
> already-read buffer to it, like:
> {code}
> serialized = pa.schema([('a', pa.int64())]).serialize().to_pybytes()
> buffer = pa.BufferReader(serialized)
> pa.read_message(buffer)
> pa.read_message(buffer)
> {code}
> And for example, if you compare to {{read_schema}}, this gives an error on 
> the second time / empty buffer:
> {code}
> >>> pa.read_schema(buffer)
> >>> pa.read_schema(buffer)
> ...
> ArrowInvalid: Tried reading schema message, was null or length 0
> {code}
> I know this is not proper usage of Buffer(Reader), but since it is easy to 
> do accidentally, I think we should try to protect users from it.
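A defensive wrapper in the spirit of the fix can be sketched as follows (illustrative only; the actual fix validates inside the reader itself, and `read_message_checked` is a hypothetical name):

```python
def read_message_checked(data):
    """Sketch of the guard the fix introduces: reject empty input up front
    instead of handing zero bytes to the message reader."""
    if data is None or len(data) == 0:
        raise ValueError("Tried reading message, was null or length 0")
    import pyarrow as pa  # assumed available; deferred so the guard stands alone
    return pa.ipc.read_message(data)
```

With this guard, the second read of an exhausted buffer raises a clear error rather than crashing the process.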



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6336) [Python] Clarify pyarrow.serialize/deserialize docstrings viz-a-viz relationship with Arrow IPC protocol

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-6336.
-
Resolution: Fixed

Issue resolved by pull request 5427
[https://github.com/apache/arrow/pull/5427]

> [Python] Clarify pyarrow.serialize/deserialize docstrings viz-a-viz 
> relationship with Arrow IPC protocol
> 
>
> Key: ARROW-6336
> URL: https://issues.apache.org/jira/browse/ARROW-6336
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Some users have been confused that these functions are equivalent in some way 
> to IPC streams. We should add language explaining in more detail what they do 
> and when to use them



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6630) [Doc][C++] Document the file readers (CSV, JSON, Parquet, etc.)

2019-09-19 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6630:
--

 Summary: [Doc][C++] Document the file readers (CSV, JSON, Parquet, 
etc.)
 Key: ARROW-6630
 URL: https://issues.apache.org/jira/browse/ARROW-6630
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Documentation
Reporter: Neal Richardson
 Fix For: 1.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6629) [Doc][C++] Document the FileSystem API

2019-09-19 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6629:
--

 Summary: [Doc][C++] Document the FileSystem API
 Key: ARROW-6629
 URL: https://issues.apache.org/jira/browse/ARROW-6629
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Documentation
Reporter: Neal Richardson
 Fix For: 1.0.0


In ARROW-6622, I was looking for a place in the docs to add about path 
normalization, and I couldn't find filesystem docs at all. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6628) [C++] Support dictionary unification on dictionaries having nulls

2019-09-19 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6628:
---

 Summary: [C++] Support dictionary unification on dictionaries 
having nulls
 Key: ARROW-6628
 URL: https://issues.apache.org/jira/browse/ARROW-6628
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney


Follow up to ARROW-5343



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6609) [C++] Add minimal build Dockerfile example

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-6609.
-
Resolution: Fixed

Issue resolved by pull request 5431
[https://github.com/apache/arrow/pull/5431]

> [C++] Add minimal build Dockerfile example
> --
>
> Key: ARROW-6609
> URL: https://issues.apache.org/jira/browse/ARROW-6609
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This will also help developers test a minimal build configuration



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6618) [Python] Reading a zero-size buffer can segfault

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-6618:

Fix Version/s: (was: 1.0.0)
   0.15.0

> [Python] Reading a zero-size buffer can segfault
> 
>
> Key: ARROW-6618
> URL: https://issues.apache.org/jira/browse/ARROW-6618
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Simplest reproducible code is:
> {code}
> pa.read_message(b'')
> {code}
> which gives a segfault. 
> You can easily run into this interactively, e.g. by accidentally passing an 
> already-read buffer to it, like:
> {code}
> serialized = pa.schema([('a', pa.int64())]).serialize().to_pybytes()
> buffer = pa.BufferReader(serialized)
> pa.read_message(buffer)
> pa.read_message(buffer)
> {code}
> And for example, if you compare to {{read_schema}}, this gives an error on 
> the second time / empty buffer:
> {code}
> >>> pa.read_schema(buffer)
> >>> pa.read_schema(buffer)
> ...
> ArrowInvalid: Tried reading schema message, was null or length 0
> {code}
> I know this is not proper usage of Buffer(Reader), but since it is easy to 
> do accidentally, I think we should try to protect users from it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-5800) [R] Dockerize R Travis CI tests so they can be run anywhere via docker-compose

2019-09-19 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson closed ARROW-5800.
--
Fix Version/s: (was: 1.0.0)
   Resolution: Fixed

Docker R tests are available, running nightly and on demand via {{@ursabot 
crossbow submit docker-r}}. 

I wouldn't replace the Travis job with this, though: the Docker run is about 
3x slower (20 mins on Travis vs. >1 hour in Docker). 

> [R] Dockerize R Travis CI tests so they can be run anywhere via 
> docker-compose 
> ---
>
> Key: ARROW-5800
> URL: https://issues.apache.org/jira/browse/ARROW-5800
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6353) [Python] Allow user to select compression level in pyarrow.parquet.write_table

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6353:
--
Labels: pull-request-available  (was: )

> [Python] Allow user to select compression level in pyarrow.parquet.write_table
> --
>
> Key: ARROW-6353
> URL: https://issues.apache.org/jira/browse/ARROW-6353
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Igor Yastrebov
>Assignee: Martin Radev
>Priority: Major
>  Labels: pull-request-available
>
> This feature was introduced for C++ in 
> [ARROW-6216|https://issues.apache.org/jira/browse/ARROW-6216].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6622) [C++][R] SubTreeFileSystem path error on Windows

2019-09-19 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-6622:
--

Assignee: Neal Richardson

> [C++][R] SubTreeFileSystem path error on Windows
> 
>
> Key: ARROW-6622
> URL: https://issues.apache.org/jira/browse/ARROW-6622
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: filesystem, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> On ARROW-6438, we got this error on Windows testing out the subtree:
> {code}
> > test_check("arrow")
>   -- 1. Error: SubTreeFilesystem (@test-filesystem.R#86)  
> 
>   Unknown error: Underlying filesystem returned path 
> 'C:/Users/appveyor/AppData/Local/Temp/1/RtmpqWFbxi/working_dir/Rtmp2Dfa6d/file2904934312d/DESCRIPTION',
>  which is not a subpath of 
> 'C:/Users/appveyor/AppData/Local/Temp/1\RtmpqWFbxi/working_dir\Rtmp2Dfa6d\file2904934312d/'
>   1: st_fs$GetTargetStats(c("DESCRIPTION", "test", "nope", "DESC.txt")) at 
> testthat/test-filesystem.R:86
>   2: map(fs___FileSystem__GetTargetStats_Paths(self, x), shared_ptr, class = 
> FileStats)
>   3: fs___FileSystem__GetTargetStats_Paths(self, x)
>   
>   == testthat results  
> ===
>   [ OK: 992 | SKIPPED: 2 | WARNINGS: 0 | FAILED: 1 ]
> {code}
> Notice the mixture of forward slashes and backslashes in the paths so that 
> they don't match up. 
> I'm not sure which layer is doing the wrong thing.
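The mismatch is easy to see with a small sketch: normalizing both strings before the prefix check makes them comparable again (`normalize_separators` is an illustrative name, not an Arrow API, and where the real fix normalizes is exactly the open question above).

```python
def normalize_separators(path: str) -> str:
    # Illustrative only: fold Windows backslashes to forward slashes so
    # prefix checks compare like with like.
    return path.replace("\\", "/")

base = "C:/Users/appveyor/AppData/Local/Temp/1\\RtmpqWFbxi/working_dir\\Rtmp2Dfa6d\\file2904934312d/"
child = "C:/Users/appveyor/AppData/Local/Temp/1/RtmpqWFbxi/working_dir/Rtmp2Dfa6d/file2904934312d/DESCRIPTION"

# The raw comparison fails (mixed separators); the normalized one succeeds.
assert not child.startswith(base)
assert normalize_separators(child).startswith(normalize_separators(base))
```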



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6622) [C++][R] SubTreeFileSystem path error on Windows

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6622:
--
Labels: filesystem pull-request-available  (was: filesystem)

> [C++][R] SubTreeFileSystem path error on Windows
> 
>
> Key: ARROW-6622
> URL: https://issues.apache.org/jira/browse/ARROW-6622
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, R
>Reporter: Neal Richardson
>Priority: Major
>  Labels: filesystem, pull-request-available
> Fix For: 1.0.0
>
>
> On ARROW-6438, we got this error on Windows testing out the subtree:
> {code}
> > test_check("arrow")
>   -- 1. Error: SubTreeFilesystem (@test-filesystem.R#86)  
> 
>   Unknown error: Underlying filesystem returned path 
> 'C:/Users/appveyor/AppData/Local/Temp/1/RtmpqWFbxi/working_dir/Rtmp2Dfa6d/file2904934312d/DESCRIPTION',
>  which is not a subpath of 
> 'C:/Users/appveyor/AppData/Local/Temp/1\RtmpqWFbxi/working_dir\Rtmp2Dfa6d\file2904934312d/'
>   1: st_fs$GetTargetStats(c("DESCRIPTION", "test", "nope", "DESC.txt")) at 
> testthat/test-filesystem.R:86
>   2: map(fs___FileSystem__GetTargetStats_Paths(self, x), shared_ptr, class = 
> FileStats)
>   3: fs___FileSystem__GetTargetStats_Paths(self, x)
>   
>   == testthat results  
> ===
>   [ OK: 992 | SKIPPED: 2 | WARNINGS: 0 | FAILED: 1 ]
> {code}
> Notice the mixture of forward slashes and backslashes in the paths so that 
> they don't match up. 
> I'm not sure which layer is doing the wrong thing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6324) [C++] File system API should expand paths

2019-09-19 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933799#comment-16933799
 ] 

Neal Richardson commented on ARROW-6324:


Also windows backslashes (ARROW-6622)

> [C++] File system API should expand paths
> -
>
> Key: ARROW-6324
> URL: https://issues.apache.org/jira/browse/ARROW-6324
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Neal Richardson
>Priority: Minor
>  Labels: filesystem
> Fix For: 1.0.0
>
>
> See ARROW-6323





[jira] [Created] (ARROW-6627) [JS] decimalToString function in bn.ts does not handle negative decimals

2019-09-19 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6627:
---

 Summary: [JS] decimalToString function in bn.ts does not handle 
negative decimals
 Key: ARROW-6627
 URL: https://issues.apache.org/jira/browse/ARROW-6627
 Project: Apache Arrow
  Issue Type: Bug
  Components: JavaScript
Reporter: Wes McKinney


See GH issue https://github.com/apache/arrow/issues/5397





[jira] [Created] (ARROW-6626) [Python] Handle "set" values as lists when converting to Arrow

2019-09-19 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6626:
---

 Summary: [Python] Handle "set" values as lists when converting to 
Arrow
 Key: ARROW-6626
 URL: https://issues.apache.org/jira/browse/ARROW-6626
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney


See current behavior

{code}
In [1]: pa.array([{1,2, 3}])
   
---
ArrowInvalid  Traceback (most recent call last)
 in 
> 1 pa.array([{1,2, 3}])

~/code/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()

~/code/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()

~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Could not convert {1, 2, 3} with type set: did not recognize 
Python value type when inferring an Arrow data type
In ../src/arrow/python/iterators.h, line 70, code: func(value, 
static_cast(i), _going)
In ../src/arrow/python/inference.cc, line 621, code: 
inferrer.VisitSequence(obj, mask)
In ../src/arrow/python/python_to_arrow.cc, line 1074, code: InferArrowType(seq, 
mask, options.from_pandas, _type)
{code}
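Until such support lands, a hedged workaround (plain-Python sketch; `normalize_sets` is a name invented here, and since sets are unordered, the element order after conversion is a choice):

```python
def normalize_sets(rows):
    # pyarrow's type inference does not recognize Python sets, so convert
    # them to sorted lists before calling pa.array(); other values pass
    # through unchanged.
    return [sorted(v) if isinstance(v, (set, frozenset)) else v for v in rows]

print(normalize_sets([{3, 1, 2}, None]))
# pa.array(normalize_sets([{3, 1, 2}, None])) would then infer a list type
```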





[jira] [Updated] (ARROW-6544) [R] Documentation/polishing for 0.15 release

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6544:
--
Labels: pull-request-available  (was: )

> [R] Documentation/polishing for 0.15 release
> 
>
> Key: ARROW-6544
> URL: https://issues.apache.org/jira/browse/ARROW-6544
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>






[jira] [Commented] (ARROW-6617) [Crossbow] Unify the version numbers generated by crossbow and rake

2019-09-19 Thread Sutou Kouhei (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933763#comment-16933763
 ] 

Sutou Kouhei commented on ARROW-6617:
-

Ah, sorry.
I described the wrong version in 
https://github.com/apache/arrow/pull/5024#issuecomment-532873336 .

deb uses "0.15.0~dev20190918" because "0.15.0~..." is smaller than "0.15.0" in 
deb version: http://man7.org/linux/man-pages/man5/deb-version.5.html
("0.15.0~dev20190918" has only "upstream-version". There are no "epoch" and 
"debian-revision". We always use "1" for "debian-revision".)

If we use smaller version for non production version, people who install non 
production version (0.15.0~dev20190918) can upgrade to production version 
(0.15.0).

rpm uses "0.15.0-0.dev20190918" for non production. rpm uses "0.15.0-1" for 
production". "0.15.0-1" is larger than "0.15.0-0".
"0.15.0-0.dev20190918" has "Version" and "Release". "0.15.0" is "Version". 
"0.dev20190918" is "Release". "-" is separator.

See also Fedora's Versioning Guidelines especially "Prelease versions": 
https://docs.fedoraproject.org/en-US/packaging-guidelines/Versioning/#_prerelease_versions
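The tilde rule can be sketched in Python (a deliberately simplified illustration of why "0.15.0~dev20190918" sorts before "0.15.0"; real dpkg also compares runs of digits numerically):

```python
def tilde_lt(a, b):
    """Simplified deb-style comparison: "~" sorts before end-of-string
    and before every other character; otherwise fall back to a naive
    character comparison (real dpkg also handles numeric runs)."""
    for i in range(max(len(a), len(b))):
        ca = a[i] if i < len(a) else ""
        cb = b[i] if i < len(b) else ""
        if ca == cb:
            continue
        if ca == "~":
            return True   # "~" loses to everything, including ""
        if cb == "~":
            return False
        return ca < cb
    return False

# A pre-release therefore upgrades cleanly to the final release:
assert tilde_lt("0.15.0~dev20190918", "0.15.0")
assert not tilde_lt("0.15.0", "0.15.0~dev20190918")
```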

> [Crossbow] Unify the version numbers generated by crossbow and rake
> ---
>
> Key: ARROW-6617
> URL: https://issues.apache.org/jira/browse/ARROW-6617
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>
> Crossbow's default package version (0.14.0.dev584) and rake apt:build/rake 
> yum:build's default package version (0.15.0-dev20190918) are different. We 
> need to unify them, and prefer the latter one.





[jira] [Updated] (ARROW-6625) [Python] Allow concat_tables to null or default fill missing columns

2019-09-19 Thread Daniel Nugent (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Nugent updated ARROW-6625:
-
Summary: [Python] Allow concat_tables to null or default fill missing 
columns  (was: Allow concat_tables to null or default fill missing columns)

> [Python] Allow concat_tables to null or default fill missing columns
> 
>
> Key: ARROW-6625
> URL: https://issues.apache.org/jira/browse/ARROW-6625
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Python
>Reporter: Daniel Nugent
>Priority: Minor
>
> The concat_tables function currently requires schemas to be identical across 
> all tables being concatenated. However, tables sometimes agree on column 
> types where the columns are present, but a column may be missing entirely.
> In this case, allowing for null filling (or default filling) would be ideal.
> I imagine this feature would be an optional parameter on the concat_tables 
> function. Presumably the argument could be either a boolean in the case of 
> blanket null filling, or a mapping type for default filling. If a user wanted 
> to default fill some columns, but null fill others, they could use a None as 
> the value (defaultdict would make it simple to provide a blanket null fill if 
> only a few default value columns were desired).
> If a mapping wasn't present, the function should probably raise an error.
> The default behavior would be the current one, so the default value of the 
> parameter should be False or None.





[jira] [Created] (ARROW-6625) Allow concat_tables to null or default fill missing columns

2019-09-19 Thread Daniel Nugent (Jira)
Daniel Nugent created ARROW-6625:


 Summary: Allow concat_tables to null or default fill missing 
columns
 Key: ARROW-6625
 URL: https://issues.apache.org/jira/browse/ARROW-6625
 Project: Apache Arrow
  Issue Type: Wish
  Components: Python
Reporter: Daniel Nugent


The concat_tables function currently requires schemas to be identical across 
all tables being concatenated. However, tables sometimes agree on column types 
where the columns are present, but a column may be missing entirely.

In this case, allowing for null filling (or default filling) would be ideal.

I imagine this feature would be an optional parameter on the concat_tables 
function. Presumably the argument could be either a boolean in the case of 
blanket null filling, or a mapping type for default filling. If a user wanted 
to default fill some columns, but null fill others, they could use a None as 
the value (defaultdict would make it simple to provide a blanket null fill if 
only a few default value columns were desired).

If a mapping wasn't present, the function should probably raise an error.

The default behavior would be the current one, so the default value of the 
parameter should be False or None.
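The proposed semantics can be sketched with plain dicts of equal-length lists standing in for tables (illustrative only; `concat_with_fill` is not a pyarrow API):

```python
def concat_with_fill(tables, defaults=None):
    # tables: list of {column name: list of values}, all lists in one
    # table having equal length. Columns absent from a table are filled
    # with that column's default (None when no default is given).
    defaults = defaults or {}
    columns = []
    for table in tables:
        for name in table:
            if name not in columns:
                columns.append(name)
    out = {name: [] for name in columns}
    for table in tables:
        length = len(next(iter(table.values())))
        for name in columns:
            out[name].extend(table.get(name, [defaults.get(name)] * length))
    return out

print(concat_with_fill([{"a": [1, 2]}, {"a": [3], "b": ["x"]}]))
```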





[jira] [Updated] (ARROW-6615) [C++] Add filtering option to fs::Selector

2019-09-19 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6615:
--
Component/s: C++

> [C++] Add filtering option to fs::Selector
> --
>
> Key: ARROW-6615
> URL: https://issues.apache.org/jira/browse/ARROW-6615
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Francois Saint-Jacques
>Priority: Major
>
> It would be convenient if Selector could support file path filtering, either 
> via a regex or globbing applied to the path.
> This is semi-required for filtering files in Dataset to properly apply the 
> file format.





[jira] [Commented] (ARROW-6615) [C++] Add filtering option to fs::Selector

2019-09-19 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933723#comment-16933723
 ] 

Antoine Pitrou commented on ARROW-6615:
---

If the filtering is done on the local end, then it's not very useful to 
integrate it into the Selector (every filesystem would then have to implement 
it).

If it lets us avoid recursing into uninteresting directories then it can help, 
but does that happen often?
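For scale, client-side filtering looks roughly like this (Python sketch, not the proposed C++ API) — the listing still visits every path, which is exactly what pushing the filter into Selector could avoid:

```python
import fnmatch

paths = [
    "data/a=2/b=3/part-0.parquet",
    "data/a=2/_metadata",
    "data/a=2/b=3/part-1.csv",
]
# Local (post-listing) filtering: every path has already been fetched
# from the filesystem; only the match itself is cheap.
parquet_files = [p for p in paths if fnmatch.fnmatch(p, "*.parquet")]
print(parquet_files)
```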

> [C++] Add filtering option to fs::Selector
> --
>
> Key: ARROW-6615
> URL: https://issues.apache.org/jira/browse/ARROW-6615
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Francois Saint-Jacques
>Priority: Major
>
> It would convenient if Selector could support file path filtering, either via 
> a regex or globbing applied to the path.
> This is semi required for filtering file in Dataset to properly apply the 
> file format.





[jira] [Commented] (ARROW-6429) [CI][Crossbow] Nightly spark integration job fails

2019-09-19 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933714#comment-16933714
 ] 

Bryan Cutler commented on ARROW-6429:
-

[~wesm] the issue with the timestamp test failures looks to be caused by 
calling {{to_pandas}} on a pyarrow ChunkedArray with a tz-aware timestamp 
type, which removes the tz from the resulting dtype. The previous behavior was 
that a pyarrow Column kept the tz, but the pyarrow Array dropped it when 
converting to a numpy array.
With Arrow 0.14.1
{code}
In [4]: import pyarrow as pa 
   ...: a = pa.array([1], type=pa.timestamp('us', tz='America/Los_Angeles'))  
   ...: c = pa.Column.from_array('ts', a) 

In [5]: c.to_pandas()   
 
Out[5]: 
0   1969-12-31 16:00:00.01-08:00
Name: ts, dtype: datetime64[ns, America/Los_Angeles]

In [6]: a.to_pandas()   
 
Out[6]: array(['1970-01-01T00:00:00.01'], dtype='datetime64[us]')
{code}

With current master
{code}
>>> import pyarrow as pa
>>> a = pa.array([1], type=pa.timestamp('us', tz='America/Los_Angeles'))
>>> a.to_pandas()
0   1970-01-01 00:00:00.01
dtype: datetime64[ns]
{code}

After manually adding the timezone back into the series dtype (and fixing the 
Java compilation), all tests passed and the Spark integration run finished. I 
wasn't able to look into why the timezone is being removed, though. Should I 
open a JIRA for this?


> [CI][Crossbow] Nightly spark integration job fails
> --
>
> Key: ARROW-6429
> URL: https://issues.apache.org/jira/browse/ARROW-6429
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Neal Richardson
>Assignee: Wes McKinney
>Priority: Blocker
>  Labels: nightly, pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> See https://circleci.com/gh/ursa-labs/crossbow/2310. Either fix, skip job and 
> create followup Jira to unskip, or delete job.





[jira] [Updated] (ARROW-6494) [C++][Dataset] Implement basic PartitionScheme

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6494:
--
Labels: dataset pull-request-available  (was: dataset)

> [C++][Dataset] Implement basic PartitionScheme
> --
>
> Key: ARROW-6494
> URL: https://issues.apache.org/jira/browse/ARROW-6494
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Major
>  Labels: dataset, pull-request-available
>
> The PartitionScheme interface parses paths and yields the partition 
> expressions which are encoded in those paths. For example, the Hive partition 
> scheme would yield {{"a"_ = 2 and "b"_ = 3}} from "a=2/b=3/*.parquet".
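A minimal sketch of that parsing in Python (illustrative only; the actual implementation is C++ and yields Arrow expressions rather than a dict):

```python
def parse_hive_partitions(path):
    # Each "key=value" directory segment contributes one equality;
    # the file name itself ("part-0.parquet") has no "=" and is skipped.
    exprs = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            exprs[key] = value
    return exprs

print(parse_hive_partitions("a=2/b=3/part-0.parquet"))
```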





[jira] [Created] (ARROW-6624) [C++] Add SparseTensor.ToTensor() method

2019-09-19 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-6624:
-

 Summary: [C++] Add SparseTensor.ToTensor() method
 Key: ARROW-6624
 URL: https://issues.apache.org/jira/browse/ARROW-6624
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Rok Mihevc
Assignee: Rok Mihevc


We have functionality to convert (dense) tensors to sparse tensors, but not the 
other way around. Also [see 
discussion|https://github.com/apache/arrow/pull/4446#issuecomment-503792308].
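The requested conversion is the inverse scatter; a plain-Python sketch for a 2-D COO tensor (the real method would operate on arrow::SparseTensor buffers in C++):

```python
def coo_to_dense(shape, coords, data):
    # Zero-initialize a dense matrix, then scatter each stored value to
    # its (row, col) coordinate.
    rows, cols = shape
    dense = [[0] * cols for _ in range(rows)]
    for (i, j), value in zip(coords, data):
        dense[i][j] = value
    return dense

print(coo_to_dense((2, 3), [(0, 1), (1, 2)], [7, 9]))
```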





[jira] [Commented] (ARROW-6622) [C++][R] SubTreeFileSystem path error on Windows

2019-09-19 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933706#comment-16933706
 ] 

Antoine Pitrou commented on ARROW-6622:
---

You should convert all backslashes to forward slashes when talking to the 
FileSystem API.
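In other words (Python sketch; `to_fs_path` is a name invented here for illustration):

```python
def to_fs_path(path):
    # The filesystem layer compares paths as strings, so a Windows path
    # with mixed separators must be normalized before it is passed in.
    return path.replace("\\", "/")

print(to_fs_path("Temp/1\\RtmpqWFbxi/working_dir\\Rtmp2Dfa6d"))
```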

> [C++][R] SubTreeFileSystem path error on Windows
> 
>
> Key: ARROW-6622
> URL: https://issues.apache.org/jira/browse/ARROW-6622
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, R
>Reporter: Neal Richardson
>Priority: Major
>  Labels: filesystem
> Fix For: 1.0.0
>
>
> On ARROW-6438, we got this error on Windows testing out the subtree:
> {code}
> > test_check("arrow")
>   -- 1. Error: SubTreeFilesystem (@test-filesystem.R#86)  
> 
>   Unknown error: Underlying filesystem returned path 
> 'C:/Users/appveyor/AppData/Local/Temp/1/RtmpqWFbxi/working_dir/Rtmp2Dfa6d/file2904934312d/DESCRIPTION',
>  which is not a subpath of 
> 'C:/Users/appveyor/AppData/Local/Temp/1\RtmpqWFbxi/working_dir\Rtmp2Dfa6d\file2904934312d/'
>   1: st_fs$GetTargetStats(c("DESCRIPTION", "test", "nope", "DESC.txt")) at 
> testthat/test-filesystem.R:86
>   2: map(fs___FileSystem__GetTargetStats_Paths(self, x), shared_ptr, class = 
> FileStats)
>   3: fs___FileSystem__GetTargetStats_Paths(self, x)
>   
>   == testthat results  
> ===
>   [ OK: 992 | SKIPPED: 2 | WARNINGS: 0 | FAILED: 1 ]
> {code}
> Notice the mixture of forward slashes and backslashes in the paths, which is 
> why they don't match up. 
> I'm not sure which layer is doing the wrong thing.





[jira] [Resolved] (ARROW-6438) [R] Add bindings for filesystem API

2019-09-19 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-6438.

Resolution: Fixed

Issue resolved by pull request 5390
[https://github.com/apache/arrow/pull/5390]

> [R] Add bindings for filesystem API
> ---
>
> Key: ARROW-6438
> URL: https://issues.apache.org/jira/browse/ARROW-6438
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Assignee: Romain François
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> See ARROW-5494 for the Python bindings. Along with ARROW-6437, we'll be able 
> to support file-system-like operations in S3. Some of this also may be 
> necessary for the Datasets API bindings.





[jira] [Created] (ARROW-6623) [CI][Python] Dask docker integration test broken perhaps by statistics-related change

2019-09-19 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6623:
---

 Summary: [CI][Python] Dask docker integration test broken perhaps 
by statistics-related change
 Key: ARROW-6623
 URL: https://issues.apache.org/jira/browse/ARROW-6623
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.15.0


see new failure 

https://circleci.com/gh/ursa-labs/crossbow/3027?utm_campaign=vcs-integration-link_medium=referral_source=github-build-link

{code}
=== FAILURES ===
___ test_timeseries_nulls_in_schema[pyarrow] ___

tmpdir = local('/tmp/pytest-of-root/pytest-0/test_timeseries_nulls_in_schem0')
engine = 'pyarrow'

def test_timeseries_nulls_in_schema(tmpdir, engine):
tmp_path = str(tmpdir)
ddf2 = (
dask.datasets.timeseries(start="2000-01-01", end="2000-01-03", 
freq="1h")
.reset_index()
.map_partitions(lambda x: x.loc[:5])
)
ddf2 = ddf2.set_index("x").reset_index().persist()
ddf2.name = ddf2.name.where(ddf2.timestamp == "2000-01-01", None)

ddf2.to_parquet(tmp_path, engine=engine)
ddf_read = dd.read_parquet(tmp_path, engine=engine)

assert_eq(ddf_read, ddf2, check_divisions=False, check_index=False)

# Can force schema validation on each partition in pyarrow
if engine == "pyarrow":
# The schema mismatch should raise an error
with pytest.raises(ValueError):
ddf_read = dd.read_parquet(
tmp_path, dataset={"validate_schema": True}, engine=engine
)
# There should be no error if you specify a schema on write
schema = pa.schema(
[
("x", pa.float64()),
("timestamp", pa.timestamp("ns")),
("id", pa.int64()),
("name", pa.string()),
("y", pa.float64()),
]
)
ddf2.to_parquet(tmp_path, schema=schema, engine=engine)
assert_eq(
>   dd.read_parquet(tmp_path, dataset={"validate_schema": True}, 
> engine=engine),
ddf2,
check_divisions=False,
check_index=False,
)

opt/conda/lib/python3.6/site-packages/dask/dataframe/io/tests/test_parquet.py:1964:
 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
opt/conda/lib/python3.6/site-packages/dask/dataframe/io/parquet/core.py:190: in 
read_parquet
out = sorted_columns(statistics)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

statistics = ({'columns': [{'max': -0.25838390663957256, 'min': 
-0.979681447427093, 'name': 'x', 'null_count': 0}, {'max': 
Timestam...ull_count': 0}, {'max': 0.8978352477516438, 'min': 
-0.7218571212693894, 'name': 'y', 'null_count': 0}], 'num-rows': 7})

def sorted_columns(statistics):
""" Find sorted columns given row-group statistics

This finds all columns that are sorted, along with appropriate divisions
values for those columns

Returns
---
out: List of {'name': str, 'divisions': List[str]} dictionaries
"""
if not statistics:
return []

out = []
for i, c in enumerate(statistics[0]["columns"]):
if not all(
"min" in s["columns"][i] and "max" in s["columns"][i] for s in 
statistics
):
continue
divisions = [c["min"]]
max = c["max"]
success = True
for stats in statistics[1:]:
c = stats["columns"][i]
>   if c["min"] >= max:
E   TypeError: '>=' not supported between instances of 
'numpy.ndarray' and 'str'

opt/conda/lib/python3.6/site-packages/dask/dataframe/io/parquet/core.py:570: 
TypeError
{code}





[jira] [Created] (ARROW-6622) [C++][R] SubTreeFileSystem path error on Windows

2019-09-19 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6622:
--

 Summary: [C++][R] SubTreeFileSystem path error on Windows
 Key: ARROW-6622
 URL: https://issues.apache.org/jira/browse/ARROW-6622
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, R
Reporter: Neal Richardson
 Fix For: 1.0.0


On ARROW-6438, we got this error on Windows testing out the subtree:

{code}
> test_check("arrow")
  -- 1. Error: SubTreeFilesystem (@test-filesystem.R#86)  

  Unknown error: Underlying filesystem returned path 
'C:/Users/appveyor/AppData/Local/Temp/1/RtmpqWFbxi/working_dir/Rtmp2Dfa6d/file2904934312d/DESCRIPTION',
 which is not a subpath of 
'C:/Users/appveyor/AppData/Local/Temp/1\RtmpqWFbxi/working_dir\Rtmp2Dfa6d\file2904934312d/'
  1: st_fs$GetTargetStats(c("DESCRIPTION", "test", "nope", "DESC.txt")) at 
testthat/test-filesystem.R:86
  2: map(fs___FileSystem__GetTargetStats_Paths(self, x), shared_ptr, class = 
FileStats)
  3: fs___FileSystem__GetTargetStats_Paths(self, x)
  
  == testthat results  
===
  [ OK: 992 | SKIPPED: 2 | WARNINGS: 0 | FAILED: 1 ]
{code}

Notice the mixture of forward slashes and backslashes in the paths, which is 
why they don't match up. 

I'm not sure which layer is doing the wrong thing.





[jira] [Closed] (ARROW-6620) [Python][CI] pandas-master build failing due to removal of "to_sparse" method

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-6620.
---
Resolution: Duplicate

Confirmed, thanks

> [Python][CI] pandas-master build failing due to removal of "to_sparse" method
> -
>
> Key: ARROW-6620
> URL: https://issues.apache.org/jira/browse/ARROW-6620
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.15.0
>
>
> See nightly build failure
> https://circleci.com/gh/ursa-labs/crossbow/3046?utm_campaign=vcs-integration-link_medium=referral_source=github-build-link





[jira] [Commented] (ARROW-6620) [Python][CI] pandas-master build failing due to removal of "to_sparse" method

2019-09-19 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933625#comment-16933625
 ] 

Joris Van den Bossche commented on ARROW-6620:
--

[~wesm] I think this should already be covered by the PR I did this morning: 
https://github.com/apache/arrow/pull/5438

> [Python][CI] pandas-master build failing due to removal of "to_sparse" method
> -
>
> Key: ARROW-6620
> URL: https://issues.apache.org/jira/browse/ARROW-6620
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.15.0
>
>
> See nightly build failure
> https://circleci.com/gh/ursa-labs/crossbow/3046?utm_campaign=vcs-integration-link_medium=referral_source=github-build-link





[jira] [Commented] (ARROW-6582) R's read_parquet() fails with embedded nuls in strings

2019-09-19 Thread John Cassil (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933622#comment-16933622
 ] 

John Cassil commented on ARROW-6582:


Thank you very much for your thoughtful responses and the solution to my 
original goal.

That might be an interesting approach.  If there were a function to export it 
from Arrow into text on disk, then a user could essentially use Arrow as a 
method to unpack/uncompress a parquet file into raw text.  I could see that 
being helpful in other situations beyond this.  I'm not sure.  Your call!

> R's read_parquet() fails with embedded nuls in strings
> --
>
> Key: ARROW-6582
> URL: https://issues.apache.org/jira/browse/ARROW-6582
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.14.1
> Environment: Windows 10
> R 3.4.4
>Reporter: John Cassil
>Priority: Major
>
> Apologies if this issue isn't categorized or documented appropriately.  
> Please be gentle! :)
> As a heavy R user that normally interacts with parquet files using SparklyR, 
> I have recently decided to try to use arrow::read_parquet() on a few parquet 
> files that were on my local machine rather than in hadoop.  I was not able to 
> proceed after several various attempts due to embedded nuls.  For example:
> try({df <- read_parquet('out_2019-09_data_1.snappy.parquet') })
> Error in Table__to_dataframe(x, use_threads = option_use_threads()) : 
>   embedded nul in string: 'INSTALL BOTH LEFT FRONT AND RIGHT FRONT  TORQUE 
> ARMS\0 ARMS'
> Is there a solution to this?
> I have also hit roadblocks with embedded nuls in the past with csvs using 
> data.table::fread(), but readr::read_delim() seems to handle them gracefully 
> with just a warning after proceeding.
> Apologies that I do not have a handy reprex. I don't know if I can even 
> recreate a parquet file with embedded nuls using arrow if it won't let me 
> read one in, and I can't share this file due to company restrictions.
> Please let me know how I can be of any more help!





[jira] [Updated] (ARROW-6277) [C++][Parquet] Support reading/writing other Parquet primitive types to DictionaryArray

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6277:
--
Labels: pull-request-available  (was: )

> [C++][Parquet] Support reading/writing other Parquet primitive types to 
> DictionaryArray
> ---
>
> Key: ARROW-6277
> URL: https://issues.apache.org/jira/browse/ARROW-6277
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Benjamin Kietzman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>
> As follow up to ARROW-3246, we should support direct read/write of the other 
> Parquet primitive types. Currently only BYTE_ARRAY is implemented as it 
> provides the most performance benefit.





[jira] [Commented] (ARROW-6582) R's read_parquet() fails with embedded nuls in strings

2019-09-19 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933608#comment-16933608
 ] 

Neal Richardson commented on ARROW-6582:


My guess is that (if you needed to solve this problem, which it sounds like you 
don't), you could try setting different encodings in your R session and see if 
that handles the string column correctly. Or there's probably a way to get at 
that column and dump it as is to disk so that you could use some other means of 
stripping out the nuls. I'm guessing the ultimate fix is to fix the data 
generating/ETL process so that there aren't nuls there to begin with, though I 
recognize that that's not always an option.

I'll keep this open for a bit and think about whether there are ways we can 
make it easier to dump that data from Arrow to a plain-text format without 
going through R first, so that one could debug bad data like this; ultimately, 
though, the error is coming from R, not arrow.
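For instance, once the raw bytes are obtained by some other means, dropping the NULs is a one-liner (Python sketch; `strip_nuls` is a name invented here):

```python
def strip_nuls(raw: bytes) -> bytes:
    # Remove embedded NUL bytes so the value can be decoded as text
    # (R strings cannot contain NULs).
    return raw.replace(b"\x00", b"")

print(strip_nuls(b"TORQUE ARMS\x00 ARMS").decode())
```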

> R's read_parquet() fails with embedded nuls in strings
> --
>
> Key: ARROW-6582
> URL: https://issues.apache.org/jira/browse/ARROW-6582
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.14.1
> Environment: Windows 10
> R 3.4.4
>Reporter: John Cassil
>Priority: Major
>
> Apologies if this issue isn't categorized or documented appropriately.  
> Please be gentle! :)
> As a heavy R user that normally interacts with parquet files using SparklyR, 
> I have recently decided to try to use arrow::read_parquet() on a few parquet 
> files that were on my local machine rather than in hadoop.  I was not able to 
> proceed after several various attempts due to embedded nuls.  For example:
> try({df <- read_parquet('out_2019-09_data_1.snappy.parquet') })
> Error in Table__to_dataframe(x, use_threads = option_use_threads()) : 
>   embedded nul in string: 'INSTALL BOTH LEFT FRONT AND RIGHT FRONT  TORQUE 
> ARMS\0 ARMS'
> Is there a solution to this?
> I have also hit roadblocks with embedded nuls in the past with csvs using 
> data.table::fread(), but readr::read_delim() seems to handle them gracefully 
> with just a warning after proceeding.
> Apologies that I do not have a handy reprex. I don't know if I can even 
> recreate a parquet file with embedded nuls using arrow if it won't let me 
> read one in, and I can't share this file due to company restrictions.
> Please let me know how I can be of any more help!





[jira] [Resolved] (ARROW-5935) [C++] ArrayBuilders with mutable type are not robustly supported

2019-09-19 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-5935.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request 4930
[https://github.com/apache/arrow/pull/4930]

> [C++] ArrayBuilders with mutable type are not robustly supported
> 
>
> Key: ARROW-5935
> URL: https://issues.apache.org/jira/browse/ARROW-5935
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> (Dense|Sparse)UnionBuilder, DictionaryBuilder, Adaptive(U)IntBuilders, and 
> any nested builder which contains one of those may Finish to an array whose 
> type disagrees with what was passed to MakeBuilder. This is not well 
> documented or supported; ListBuilder checks if its child has changed type but 
> StructBuilder does not. Furthermore, ListBuilder's check does not catch 
> modifications to a DictionaryBuilder's type and results in an invalid array 
> on Finish: 
> https://github.com/apache/arrow/blob/1bcfbe1/cpp/src/arrow/array-dict-test.cc#L951-L994
> Let's add to the ArrayBuilder contract: the type property is null iff that 
> builder's type is indeterminate until Finish() is called. Then all nested 
> builders can check this on their children at construction and bubble up type 
> mutability correctly.





[jira] [Closed] (ARROW-5956) [R] Ability for R to link to C++ libraries from pyarrow Wheel

2019-09-19 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson closed ARROW-5956.
--
Resolution: Invalid

> [R] Ability for R to link to C++ libraries from pyarrow Wheel
> -
>
> Key: ARROW-5956
> URL: https://issues.apache.org/jira/browse/ARROW-5956
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
> Environment: Ubuntu 16.04, R 3.4.4, python 3.6.5
>Reporter: Jeffrey Wong
>Priority: Major
>
> I have installed pyarrow 0.14.0 and want to be able to also use R arrow. In 
> my work I use rpy2 a lot to exchange python data structures with R data 
> structures, so would like R arrow to link against the exact same .so files 
> found in pyarrow
>  
>  
> When I pass in include_dir and lib_dir to R's configure, pointing to 
> pyarrow's include and pyarrow's root directories, I am able to compile R's 
> arrow.so file. However, I am unable to load it in an R session, getting the 
> error:
>  
> {code:java}
> > dyn.load('arrow.so')
> Error in dyn.load("arrow.so") :
>  unable to load shared object '/tmp/arrow2/r/src/arrow.so':
>  /tmp/arrow2/r/src/arrow.so: undefined symbol: 
> _ZNK5arrow11StructArray14GetFieldByNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE{code}
>  
>  
> Steps to reproduce:
>  
> Install pyarrow, which also ships libarrow.so and libparquet.so
>  
> {code:java}
> pip3 install pyarrow --upgrade --user
> PY_ARROW_PATH=$(python3 -c "import pyarrow, os; 
> print(os.path.dirname(pyarrow.__file__))")
> PY_ARROW_VERSION=$(python3 -c "import pyarrow; print(pyarrow.__version__)")
> ln -s $PY_ARROW_PATH/libarrow.so.14 $PY_ARROW_PATH/libarrow.so
> ln -s $PY_ARROW_PATH/libparquet.so.14 $PY_ARROW_PATH/libparquet.so
> {code}
>  
>  
> Add to LD_LIBRARY_PATH
>  
> {code:java}
> sudo tee -a /usr/lib/R/etc/ldpaths <<LINES
> LD_LIBRARY_PATH="\${LD_LIBRARY_PATH}:$PY_ARROW_PATH"
> export LD_LIBRARY_PATH
> LINES
> sudo tee -a /usr/lib/rstudio-server/bin/r-ldpath <<LINES
> LD_LIBRARY_PATH="\${LD_LIBRARY_PATH}:$PY_ARROW_PATH"
> export LD_LIBRARY_PATH
> LINES
> export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:$PY_ARROW_PATH"
> {code}
>  
>  
> Install r arrow from source
> {code:java}
> git clone https://github.com/apache/arrow.git /tmp/arrow2
> cd /tmp/arrow2/r
> git checkout tags/apache-arrow-0.14.0
> R CMD INSTALL ./ --configure-vars="INCLUDE_DIR=$PY_ARROW_PATH/include 
> LIB_DIR=$PY_ARROW_PATH"{code}
>  
> I have noticed that the R package for arrow no longer has an RcppExports, but 
> instead an arrowExports. Could it be that the lack of RcppExports has made it 
> difficult to find GetFieldByName?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6621) [Rust][DataFusion] Examples for DataFusion are not executed in CI

2019-09-19 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-6621:
--

 Summary: [Rust][DataFusion] Examples for DataFusion are not 
executed in CI
 Key: ARROW-6621
 URL: https://issues.apache.org/jira/browse/ARROW-6621
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Affects Versions: 0.14.1
Reporter: Paddy Horan


See the CI scripts; we already test the examples for the Arrow sub-crate.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-5956) [R] Ability for R to link to C++ libraries from pyarrow Wheel

2019-09-19 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933587#comment-16933587
 ] 

Neal Richardson commented on ARROW-5956:


It turns out that you can't (safely) use the pyarrow wheel in R on Linux, 
though you can on macOS. See discussion starting around here: 
https://github.com/apache/arrow/pull/5408#issuecomment-532438681

You can fix the dyn.load error you originally reported by setting 
{{ARROW_USE_OLD_CXXABI=1}} 
(https://github.com/apache/arrow/blob/master/r/configure#L99-L102). That lets 
the package install and load. But then any C++ error status that leads to 
{{Rcpp::stop()}} being called will cause a core dump. Our analysis led us to 
conclude that the problem is a mismatch between the standard library versions 
in the (dated) manylinux2010 wheel build and the more contemporary one used on 
the host OS and by Rcpp there. Apparently Rcpp relies on more modern C++ 
conventions than are used in Python. 

So, it seems that right now, if you want to use the same .so for Python and R 
on Linux, your options are:

1. Install the C++ library system packages and install pyarrow and the R 
package from source locally, linking to that;
2. Build everything locally;
3. Use conda

Once manylinux2014 happens, maybe the wheels will be more suitable and we can 
try again.
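
For the record, applying the {{ARROW_USE_OLD_CXXABI}} workaround might look like the following sketch. It is not a recommendation: {{PY_ARROW_PATH}} is carried over from the reproduction steps in the issue, and as noted above the resulting build still core-dumps whenever an error status reaches {{Rcpp::stop()}}.

```shell
# Hedged sketch: force the pre-C++11 ABI so the R bindings' symbols match the
# manylinux-built pyarrow wheel (see r/configure lines 99-102). Run from the
# arrow/r source directory; still unsafe on Linux per the discussion above.
PY_ARROW_PATH=$(python3 -c "import pyarrow, os; print(os.path.dirname(pyarrow.__file__))")
export ARROW_USE_OLD_CXXABI=1
R CMD INSTALL ./ --configure-vars="INCLUDE_DIR=$PY_ARROW_PATH/include LIB_DIR=$PY_ARROW_PATH"
```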

> [R] Ability for R to link to C++ libraries from pyarrow Wheel
> -
>
> Key: ARROW-5956
> URL: https://issues.apache.org/jira/browse/ARROW-5956
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
> Environment: Ubuntu 16.04, R 3.4.4, python 3.6.5
>Reporter: Jeffrey Wong
>Priority: Major
>
> I have installed pyarrow 0.14.0 and want to be able to also use R arrow. In 
> my work I use rpy2 a lot to exchange Python data structures with R data 
> structures, so I would like R arrow to link against the exact same .so files 
> found in pyarrow.
>  
>  
> When I pass in include_dir and lib_dir to R's configure, pointing to 
> pyarrow's include and pyarrow's root directories, I am able to compile R's 
> arrow.so file. However, I am unable to load it in an R session, getting the 
> error:
>  
> {code:java}
> > dyn.load('arrow.so')
> Error in dyn.load("arrow.so") :
>  unable to load shared object '/tmp/arrow2/r/src/arrow.so':
>  /tmp/arrow2/r/src/arrow.so: undefined symbol: 
> _ZNK5arrow11StructArray14GetFieldByNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE{code}
>  
>  
> Steps to reproduce:
>  
> Install pyarrow, which also ships libarrow.so and libparquet.so
>  
> {code:java}
> pip3 install pyarrow --upgrade --user
> PY_ARROW_PATH=$(python3 -c "import pyarrow, os; 
> print(os.path.dirname(pyarrow.__file__))")
> PY_ARROW_VERSION=$(python3 -c "import pyarrow; print(pyarrow.__version__)")
> ln -s $PY_ARROW_PATH/libarrow.so.14 $PY_ARROW_PATH/libarrow.so
> ln -s $PY_ARROW_PATH/libparquet.so.14 $PY_ARROW_PATH/libparquet.so
> {code}
>  
>  
> Add to LD_LIBRARY_PATH
>  
> {code:java}
> sudo tee -a /usr/lib/R/etc/ldpaths <<LINES
> LD_LIBRARY_PATH="\${LD_LIBRARY_PATH}:$PY_ARROW_PATH"
> export LD_LIBRARY_PATH
> LINES
> sudo tee -a /usr/lib/rstudio-server/bin/r-ldpath <<LINES
> LD_LIBRARY_PATH="\${LD_LIBRARY_PATH}:$PY_ARROW_PATH"
> export LD_LIBRARY_PATH
> LINES
> export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:$PY_ARROW_PATH"
> {code}
>  
>  
> Install r arrow from source
> {code:java}
> git clone https://github.com/apache/arrow.git /tmp/arrow2
> cd /tmp/arrow2/r
> git checkout tags/apache-arrow-0.14.0
> R CMD INSTALL ./ --configure-vars="INCLUDE_DIR=$PY_ARROW_PATH/include 
> LIB_DIR=$PY_ARROW_PATH"{code}
>  
> I have noticed that the R package for arrow no longer has an RcppExports, but 
> instead an arrowExports. Could it be that the lack of RcppExports has made it 
> difficult to find GetFieldByName?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6353) [Python] Allow user to select compression level in pyarrow.parquet.write_table

2019-09-19 Thread Martin Radev (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933582#comment-16933582
 ] 

Martin Radev commented on ARROW-6353:
-

I started working on it.

> [Python] Allow user to select compression level in pyarrow.parquet.write_table
> --
>
> Key: ARROW-6353
> URL: https://issues.apache.org/jira/browse/ARROW-6353
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Igor Yastrebov
>Assignee: Martin Radev
>Priority: Major
>
> This feature was introduced for C++ in 
> [ARROW-6216|https://issues.apache.org/jira/browse/ARROW-6216].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6353) [Python] Allow user to select compression level in pyarrow.parquet.write_table

2019-09-19 Thread Martin Radev (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Radev reassigned ARROW-6353:
---

Assignee: Martin Radev

> [Python] Allow user to select compression level in pyarrow.parquet.write_table
> --
>
> Key: ARROW-6353
> URL: https://issues.apache.org/jira/browse/ARROW-6353
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Igor Yastrebov
>Assignee: Martin Radev
>Priority: Major
>
> This feature was introduced for C++ in 
> [ARROW-6216|https://issues.apache.org/jira/browse/ARROW-6216].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-5086) [Python] Space leak in ParquetFile.read_row_group()

2019-09-19 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5086.
-
Resolution: Fixed

Issue resolved by pull request 5433
[https://github.com/apache/arrow/pull/5433]

> [Python] Space leak in  ParquetFile.read_row_group()
> 
>
> Key: ARROW-5086
> URL: https://issues.apache.org/jira/browse/ARROW-5086
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.12.1
>Reporter: Jakub Okoński
>Assignee: Wes McKinney
>Priority: Major
>  Labels: parquet, pull-request-available
> Fix For: 0.15.0
>
> Attachments: all.png, all.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I have a code pattern like this:
> {code:python}
> reader = pq.ParquetFile(path)
> for ix in range(0, reader.num_row_groups):
>     table = reader.read_row_group(ix, columns=self._columns)
>     # operate on table
> {code}
> But it leaks memory over time, only releasing it when the reader object is 
> collected. Here's a workaround:
> {code:python}
> num_row_groups = pq.ParquetFile(path).num_row_groups
> for ix in range(0, num_row_groups):
>     table = pq.ParquetFile(path).read_row_group(ix, columns=self._columns)
>     # operate on table
> {code}
> This puts an upper bound on memory usage and is what I'd expect from the 
> code. I also call gc.collect() at the end of every loop.
>  
> I charted out memory usage for a small benchmark that just copies a file, one 
> row group at a time, converting to pandas and back to arrow on the writer 
> path. Line in black is the first one, using a single reader object. Blue is 
> instantiating a fresh reader in every iteration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-5216) [CI] Add Appveyor badge to README

2019-09-19 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933546#comment-16933546
 ] 

Neal Richardson commented on ARROW-5216:


Created INFRA-19101 to get the badge URL.

> [CI] Add Appveyor badge to README
> -
>
> Key: ARROW-5216
> URL: https://issues.apache.org/jira/browse/ARROW-5216
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Neal Richardson
>Priority: Trivial
> Fix For: 1.0.0
>
>
> I was trying to see what was running in appveyor and couldn't find it. 
> Krisztián helped me to find 
> [https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow], but it 
> would be nice to add the badge to the README next to the Travis-CI one for a 
> quick link to it (as well as showing off build status).
> I was just going to add it myself, but unlike Travis, you can't guess the 
> Appveyor badge URL from the project name because they have a hash in them; 
> only someone with sufficient privileges on the project in Appveyor can get to 
> the settings panel to find the URL.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-5216) [CI] Add Appveyor badge to README

2019-09-19 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-5216:
--

Assignee: Neal Richardson

> [CI] Add Appveyor badge to README
> -
>
> Key: ARROW-5216
> URL: https://issues.apache.org/jira/browse/ARROW-5216
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Trivial
> Fix For: 1.0.0
>
>
> I was trying to see what was running in appveyor and couldn't find it. 
> Krisztián helped me to find 
> [https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow], but it 
> would be nice to add the badge to the README next to the Travis-CI one for a 
> quick link to it (as well as showing off build status).
> I was just going to add it myself, but unlike Travis, you can't guess the 
> Appveyor badge URL from the project name because they have a hash in them; 
> only someone with sufficient privileges on the project in Appveyor can get to 
> the settings panel to find the URL.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6619) [Ruby] Add support for building Gandiva::Expression by Arrow::Schema#build_expression

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6619:
--
Labels: pull-request-available  (was: )

> [Ruby] Add support for building Gandiva::Expression by 
> Arrow::Schema#build_expression
> -
>
> Key: ARROW-6619
> URL: https://issues.apache.org/jira/browse/ARROW-6619
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Ruby
>Reporter: Yosuke Shiro
>Assignee: Yosuke Shiro
>Priority: Major
>  Labels: pull-request-available
>
> This is the first attempt to make Red Gandiva API better.
> This adds Arrow::Schema#build_expression, which aims to build 
> Gandiva::Expression with FunctionNode or IfNode easily.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6620) [Python][CI] pandas-master build failing due to removal of "to_sparse" method

2019-09-19 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6620:
---

 Summary: [Python][CI] pandas-master build failing due to removal 
of "to_sparse" method
 Key: ARROW-6620
 URL: https://issues.apache.org/jira/browse/ARROW-6620
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.15.0


See nightly build failure

https://circleci.com/gh/ursa-labs/crossbow/3046?utm_campaign=vcs-integration-link_medium=referral_source=github-build-link



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6214) [R] Sanitizer errors triggered via R bindings

2019-09-19 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-6214.

Resolution: Fixed

Issue resolved by pull request 5408
[https://github.com/apache/arrow/pull/5408]

> [R] Sanitizer errors triggered via R bindings
> -
>
> Key: ARROW-6214
> URL: https://issues.apache.org/jira/browse/ARROW-6214
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, R
>Affects Versions: 0.14.1
> Environment: Linux
>Reporter: Jeroen
>Assignee: Francois Saint-Jacques
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.15.0
>
> Attachments: RDcsan.failures, RDsan.failures
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When we run the examples of the R package through the sanitizers, several 
> errors show up. These could be related to the segfaults we saw on the macos 
> builder on CRAN.
> We use the docker container provided by Winston Chang to test this: 
> https://github.com/wch/r-debug
> Steps to reproduce + example outputs at: 
> https://gist.github.com/jeroen/111901c351a4089a9effa90691a1dd81



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-6151) [R] See if possible to generate r/inst/NOTICE.txt rather than duplicate information

2019-09-19 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson closed ARROW-6151.
--
Resolution: Not A Problem

> [R] See if possible to generate r/inst/NOTICE.txt rather than duplicate 
> information
> ---
>
> Key: ARROW-6151
> URL: https://issues.apache.org/jira/browse/ARROW-6151
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
>
> I noticed this file -- I am concerned about its maintainability. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6542) [R] Add View() method to array types

2019-09-19 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-6542:
--

Assignee: Neal Richardson

> [R] Add View() method to array types
> 
>
> Key: ARROW-6542
> URL: https://issues.apache.org/jira/browse/ARROW-6542
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> See ARROW-6048



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6542) [R] Add View() method to array types

2019-09-19 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-6542:
--

Assignee: Romain François  (was: Neal Richardson)

> [R] Add View() method to array types
> 
>
> Key: ARROW-6542
> URL: https://issues.apache.org/jira/browse/ARROW-6542
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Romain François
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> See ARROW-6048



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6596) [R] Getting "Cannot call io___MemoryMappedFile__Open()" error while reading a parquet file

2019-09-19 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-6596:
---
Fix Version/s: (was: 0.14.1)

> [R] Getting "Cannot call io___MemoryMappedFile__Open()" error while reading a 
> parquet file
> --
>
> Key: ARROW-6596
> URL: https://issues.apache.org/jira/browse/ARROW-6596
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.14.1
> Environment: ubuntu 18.04
>Reporter: Addhyan
>Priority: Major
>  Labels: Docker, R, arrow, parquet
>
> I am using r/Dockerfile to get all the R dependencies and working backwards to 
> get arrow/r working in Linux (either Ubuntu or Debian), but it continuously 
> gives me this error:
> Error in io___MemoryMappedFile__Open(fs::path_abs(path), mode) : 
>   Cannot call io___MemoryMappedFile__Open()
> I have installed all the required cpp libraries as mentioned here: 
> [https://arrow.apache.org/install/] under "Ubuntu 18.04 LTS or later".  I 
> have also tried to use 
> [cpp/Dockerfile|https://github.com/apache/arrow/blob/master/cpp/Dockerfile] 
> and then followed backwards without any luck. The error is consistent and 
> doesn't go away. 
> I am trying to build a docker image with dockerfile containing everything 
> that arrow needs, all the cpp libraries etc. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6003) [C++] Better input validation and error messaging in CSV reader

2019-09-19 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-6003:
--

Assignee: (was: Neal Richardson)

> [C++] Better input validation and error messaging in CSV reader
> ---
>
> Key: ARROW-6003
> URL: https://issues.apache.org/jira/browse/ARROW-6003
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Neal Richardson
>Priority: Major
>  Labels: csv
>
> Followup to https://issues.apache.org/jira/browse/ARROW-5747. The error 
> message(s) are not great when you give bad input. For example, if I give too 
> many or too few {{column_names}}, the error I get is {{Invalid: Empty CSV 
> file}}. In fact, that's about the only error message I've seen from the CSV 
> reader, no matter what I've thrown at it.
> It would be better if error messages were more specific so that I as a user 
> might know how to fix my bad input.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6619) [Ruby] Add support for building Gandiva::Expression by Arrow::Schema#build_expression

2019-09-19 Thread Yosuke Shiro (Jira)
Yosuke Shiro created ARROW-6619:
---

 Summary: [Ruby] Add support for building Gandiva::Expression by 
Arrow::Schema#build_expression
 Key: ARROW-6619
 URL: https://issues.apache.org/jira/browse/ARROW-6619
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Ruby
Reporter: Yosuke Shiro
Assignee: Yosuke Shiro


This is the first attempt to make Red Gandiva API better.
This adds Arrow::Schema#build_expression, which aims to build 
Gandiva::Expression with FunctionNode or IfNode easily.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

