[jira] [Created] (ARROW-4512) [R] Stream reader/writer API that takes socket stream

2019-02-07 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created ARROW-4512:
---

 Summary: [R] Stream reader/writer API that takes socket stream
 Key: ARROW-4512
 URL: https://issues.apache.org/jira/browse/ARROW-4512
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Affects Versions: 0.12.0
Reporter: Hyukjin Kwon


I have been working on Spark integration with Arrow.

I realised that there is no way to use a socket as input for the Arrow stream 
format. For instance, I want to do something like:

{code}
connStream <- socketConnection(port = , blocking = TRUE, open = "wb")

rdf_slices <- ...  # a list of data frames

stream_writer <- NULL
tryCatch({
  for (rdf_slice in rdf_slices) {
    batch <- record_batch(rdf_slice)
    if (is.null(stream_writer)) {
      # Here, it looks like there's no way to use a socket.
      stream_writer <- RecordBatchStreamWriter(connStream, batch$schema)
    }

    stream_writer$write_batch(batch)
  }
},
finally = {
  if (!is.null(stream_writer)) {
    stream_writer$close()
  }
})
{code}


Likewise, I cannot find a way to iterate over the stream batch by batch:

{code}
# Here, it looks like there's no way to use a socket.
RecordBatchStreamReader(connStream)$batches()
{code}

This looks easily possible on the Python side but appears to be missing from the R APIs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3779) [Python] Validate timezone passed to pa.timestamp

2019-02-07 Thread Pindikura Ravindra (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763371#comment-16763371
 ] 

Pindikura Ravindra commented on ARROW-3779:
---

[~shyamsingh]

> [Python] Validate timezone passed to pa.timestamp
> -
>
> Key: ARROW-3779
> URL: https://issues.apache.org/jira/browse/ARROW-3779
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Updated] (ARROW-4423) [C++] Update version of vendored gtest to 1.8.1

2019-02-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4423:
--
Labels: pull-request-available  (was: )

> [C++] Update version of vendored gtest to 1.8.1
> ---
>
> Key: ARROW-4423
> URL: https://issues.apache.org/jira/browse/ARROW-4423
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Minor
>  Labels: pull-request-available
>
> conda-forge builds already use 1.8.1
>  
> This is a little tricky because library files get renamed on windows with the 
> incremental version bump (debug files become libgmockd.lib).





[jira] [Assigned] (ARROW-4423) [C++] Update version of vendored gtest to 1.8.1

2019-02-07 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield reassigned ARROW-4423:
--

Assignee: Micah Kornfield

> [C++] Update version of vendored gtest to 1.8.1
> ---
>
> Key: ARROW-4423
> URL: https://issues.apache.org/jira/browse/ARROW-4423
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Minor
>
> conda-forge builds already use 1.8.1
>  
> This is a little tricky because library files get renamed on windows with the 
> incremental version bump (debug files become libgmockd.lib).





[jira] [Updated] (ARROW-4264) [C++] Document why DCHECKs are used in kernels

2019-02-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4264:
--
Labels: pull-request-available  (was: )

> [C++] Document why DCHECKs are used in kernels
> --
>
> Key: ARROW-4264
> URL: https://issues.apache.org/jira/browse/ARROW-4264
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Minor
>  Labels: pull-request-available
>
> DCHECKs seem to be used where Status::Invalid might be considered more 
> appropriate (so programs don't crash).  See conversation on 
> [https://github.com/apache/arrow/pull/3287/files]
> Based on the conversation on this Jira and on the CL, it seems DCHECKs are in 
> fact desired, but we should document appropriate use for them.





[jira] [Commented] (ARROW-4509) [Format] Copy content from Metadata.rst to new document.

2019-02-07 Thread Micah Kornfield (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763352#comment-16763352
 ] 

Micah Kornfield commented on ARROW-4509:


Agreed.  I made the child tasks of the top-level one to make the movement 
easier to review and understand (I don't think we want to just concatenate all 
of the documents, but let me know if you disagree).

> [Format] Copy content from Metadata.rst to new document.
> 
>
> Key: ARROW-4509
> URL: https://issues.apache.org/jira/browse/ARROW-4509
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>






[jira] [Updated] (ARROW-4511) remove individual documents in favor of new document once all content is moved

2019-02-07 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-4511:
---
Component/s: Format

> remove individual documents in favor of new document once all content is moved
> --
>
> Key: ARROW-4511
> URL: https://issues.apache.org/jira/browse/ARROW-4511
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>
> We might want to leave the documents in place and provide links to the new 
> consolidated document in case others are linking to published content.





[jira] [Updated] (ARROW-4508) [Format] Copy content from Layout.rst to new document.

2019-02-07 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-4508:
---
Component/s: Format
Summary: [Format] Copy content from Layout.rst to new document.  (was: 
Copy content from Layout.rst to new document.)

> [Format] Copy content from Layout.rst to new document.
> --
>
> Key: ARROW-4508
> URL: https://issues.apache.org/jira/browse/ARROW-4508
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>






[jira] [Updated] (ARROW-4507) [Format] Create outline and introduction for new document.

2019-02-07 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-4507:
---
Component/s: Format
Summary: [Format] Create outline and introduction for new document.  
(was: Create outline and introduction for new document.)

> [Format] Create outline and introduction for new document.
> --
>
> Key: ARROW-4507
> URL: https://issues.apache.org/jira/browse/ARROW-4507
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>
> This will ensure the document has a good flow, other subtasks on the parent 
> will handle moving content from each of the documents.





[jira] [Updated] (ARROW-4509) [Format] Copy content from Metadata.rst to new document.

2019-02-07 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-4509:
---
Component/s: Format
Summary: [Format] Copy content from Metadata.rst to new document.  
(was: Copy content from Metadata.rst to new document.)

> [Format] Copy content from Metadata.rst to new document.
> 
>
> Key: ARROW-4509
> URL: https://issues.apache.org/jira/browse/ARROW-4509
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>






[jira] [Updated] (ARROW-4510) [Format] copy content from IPC.rst to new document.

2019-02-07 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-4510:
---
Component/s: Format

> [Format] copy content from IPC.rst to new document.
> ---
>
> Key: ARROW-4510
> URL: https://issues.apache.org/jira/browse/ARROW-4510
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>






[jira] [Updated] (ARROW-4511) [Format] remove individual documents in favor of new document once all content is moved

2019-02-07 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-4511:
---
Summary: [Format] remove individual documents in favor of new document once 
all content is moved  (was: [FORMAT] remove individual documents in favor of 
new document once all content is moved)

> [Format] remove individual documents in favor of new document once all 
> content is moved
> ---
>
> Key: ARROW-4511
> URL: https://issues.apache.org/jira/browse/ARROW-4511
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>
> We might want to leave the documents in place and provide links to the new 
> consolidated document in case others are linking to published content.





[jira] [Updated] (ARROW-4510) [Format] copy content from IPC.rst to new document.

2019-02-07 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-4510:
---
Summary: [Format] copy content from IPC.rst to new document.  (was: copy 
content from IPC.rst to new document.)

> [Format] copy content from IPC.rst to new document.
> ---
>
> Key: ARROW-4510
> URL: https://issues.apache.org/jira/browse/ARROW-4510
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>






[jira] [Updated] (ARROW-4511) [FORMAT] remove individual documents in favor of new document once all content is moved

2019-02-07 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-4511:
---
Summary: [FORMAT] remove individual documents in favor of new document once 
all content is moved  (was: remove individual documents in favor of new 
document once all content is moved)

> [FORMAT] remove individual documents in favor of new document once all 
> content is moved
> ---
>
> Key: ARROW-4511
> URL: https://issues.apache.org/jira/browse/ARROW-4511
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>
> We might want to leave the documents in place and provide links to the new 
> consolidated document in case others are linking to published content.





[jira] [Assigned] (ARROW-4507) Create outline and introduction for new document.

2019-02-07 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield reassigned ARROW-4507:
--

Assignee: Micah Kornfield

> Create outline and introduction for new document.
> -
>
> Key: ARROW-4507
> URL: https://issues.apache.org/jira/browse/ARROW-4507
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>
> This will ensure the document has a good flow, other subtasks on the parent 
> will handle moving content from each of the documents.





[jira] [Created] (ARROW-4508) Copy content from Layout.js to new document.

2019-02-07 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4508:
--

 Summary: Copy content from Layout.js to new document.
 Key: ARROW-4508
 URL: https://issues.apache.org/jira/browse/ARROW-4508
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Micah Kornfield
Assignee: Micah Kornfield








[jira] [Created] (ARROW-4511) remove individual documents in favor of new document once all content is moved

2019-02-07 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4511:
--

 Summary: remove individual documents in favor of new document once 
all content is moved
 Key: ARROW-4511
 URL: https://issues.apache.org/jira/browse/ARROW-4511
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Micah Kornfield
Assignee: Micah Kornfield


We might want to leave the documents in place and provide links to the new 
consolidated document in case others are linking to published content.





[jira] [Created] (ARROW-4510) copy content from IPC.rst to new document.

2019-02-07 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4510:
--

 Summary: copy content from IPC.rst to new document.
 Key: ARROW-4510
 URL: https://issues.apache.org/jira/browse/ARROW-4510
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Micah Kornfield
Assignee: Micah Kornfield








[jira] [Closed] (ARROW-4308) [Python] pyarrow has a hard dependency on pandas

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-4308.
---
Resolution: Cannot Reproduce

> [Python] pyarrow has a hard dependency on pandas
> 
>
> Key: ARROW-4308
> URL: https://issues.apache.org/jira/browse/ARROW-4308
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> We either need to make pandas a soft dependency (as it was in the past) or 
> add it to the package requirements. Calling {{pip install pyarrow}} for 
> 0.12.0 in a fresh environment results in
> {code}
> In [1]: import pyarrow as pa  
>   
>
> ---
> ModuleNotFoundError   Traceback (most recent call last)
>  in 
> > 1 import pyarrow as pa
> ~/miniconda/envs/pyarrow-pip-3.7/lib/python3.7/site-packages/pyarrow/__init__.py
>  in 
>  52 
>  53 
> ---> 54 from pyarrow.lib import cpu_count, set_cpu_count
>  55 from pyarrow.lib import (null, bool_,
>  56  int8, int16, int32, int64,
> ~/miniconda/envs/pyarrow-pip-3.7/lib/python3.7/site-packages/pyarrow/table.pxi
>  in init pyarrow.lib()
>  26 pass
>  27 else:
> ---> 28 import pyarrow.pandas_compat as pdcompat
>  29 
>  30 
> ~/miniconda/envs/pyarrow-pip-3.7/lib/python3.7/site-packages/pyarrow/pandas_compat.py
>  in 
>  22 import re
>  23 
> ---> 24 import pandas.core.internals as _int
>  25 import numpy as np
>  26 import pandas as pd
> ModuleNotFoundError: No module named 'pandas.core'
> {code}





[jira] [Updated] (ARROW-4264) [C++] Document why DCHECKs are used in kernels

2019-02-07 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-4264:
---
Summary: [C++] Document why DCHECKs are used in kernels  (was: [C++] 
Convert DCHECKs in that check compute/* input parameters to error statuses)

> [C++] Document why DCHECKs are used in kernels
> --
>
> Key: ARROW-4264
> URL: https://issues.apache.org/jira/browse/ARROW-4264
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Minor
>
> DCHECKs seem to be used where Status::Invalid is more appropriate (so 
> programs don't crash).  See conversation on 
> https://github.com/apache/arrow/pull/3287/files





[jira] [Commented] (ARROW-4509) Copy content from Metadata.rst to new document.

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763343#comment-16763343
 ] 

Wes McKinney commented on ARROW-4509:
-

I think we should put everything in one big document on the documentation site. 
One will be able to navigate it much more easily in Sphinx.

> Copy content from Metadata.rst to new document.
> ---
>
> Key: ARROW-4509
> URL: https://issues.apache.org/jira/browse/ARROW-4509
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>






[jira] [Created] (ARROW-4509) Copy content from Metadata.rst to new document.

2019-02-07 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4509:
--

 Summary: Copy content from Metadata.rst to new document.
 Key: ARROW-4509
 URL: https://issues.apache.org/jira/browse/ARROW-4509
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Micah Kornfield
Assignee: Micah Kornfield








[jira] [Updated] (ARROW-4264) [C++] Document why DCHECKs are used in kernels

2019-02-07 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-4264:
---
Description: 
DCHECKs seem to be used where Status::Invalid might be considered more 
appropriate (so programs don't crash).  See conversation on 
[https://github.com/apache/arrow/pull/3287/files]

Based on the conversation on this Jira and on the CL, it seems DCHECKs are in 
fact desired, but we should document appropriate use for them.

  was:DCHECKs seem to be used where Status::Invalid is more appropriate (so 
programs don't crash).  See conversation on 
https://github.com/apache/arrow/pull/3287/files


> [C++] Document why DCHECKs are used in kernels
> --
>
> Key: ARROW-4264
> URL: https://issues.apache.org/jira/browse/ARROW-4264
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Minor
>
> DCHECKs seem to be used where Status::Invalid might be considered more 
> appropriate (so programs don't crash).  See conversation on 
> [https://github.com/apache/arrow/pull/3287/files]
> Based on the conversation on this Jira and on the CL, it seems DCHECKs are in 
> fact desired, but we should document appropriate use for them.





[jira] [Commented] (ARROW-4404) [CI] AppVeyor toolchain build does not build anything

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763330#comment-16763330
 ] 

Wes McKinney commented on ARROW-4404:
-

So the problem is that after you run {{call activate %ENV%}}, further {{conda 
install}} commands in a batch script cause the script to exit. The workaround 
is to install everything you need into the environment with {{-n %ENV%}} and 
only then activate it.

> [CI] AppVeyor toolchain build does not build anything
> -
>
> Key: ARROW-4404
> URL: https://issues.apache.org/jira/browse/ARROW-4404
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Antoine Pitrou
>Assignee: Wes McKinney
>Priority: Major
>
> I don't know when that started happening but the "Toolchain" AppVeyor build 
> only installs packages without building Arrow:
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/21939887/job/x9vm1urgnl95evl0





[jira] [Commented] (ARROW-4340) [C++] Update IWYU version in the `lint` dockerfile

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763325#comment-16763325
 ] 

Wes McKinney commented on ARROW-4340:
-

This probably needs to be updated now that we're on LLVM 7

> [C++] Update IWYU version in the `lint` dockerfile
> --
>
> Key: ARROW-4340
> URL: https://issues.apache.org/jira/browse/ARROW-4340
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.13.0
>
>
> I was trying to clean up the C++ imports based on the current docker-iwyu 
> suggestions, but IWYU requires more customization (symbol maps and pragmas) 
> than we currently provide. It would also help a lot to use the latest IWYU 
> version (see the changelog: https://include-what-you-use.org/).





[jira] [Updated] (ARROW-4359) [Python][Parquet] Column metadata is not saved or loaded in parquet

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4359:

Summary: [Python][Parquet] Column metadata is not saved or loaded in 
parquet  (was: Column metadata is not saved or loaded in parquet)

> [Python][Parquet] Column metadata is not saved or loaded in parquet
> ---
>
> Key: ARROW-4359
> URL: https://issues.apache.org/jira/browse/ARROW-4359
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Seb Fru
>Priority: Major
>  Labels: parquet
>
> Hi all,
> a while ago I posted this issue:
> https://issues.apache.org/jira/browse/ARROW-3866
> While working with PyArrow I encountered another potential bug related to 
> column metadata: if I create a table containing columns with metadata, 
> everything is fine. But after I save the table to Parquet and load it back 
> as a table using pq.read_table, the column metadata is gone.
> As of now I cannot say whether the metadata is not saved correctly or not 
> loaded correctly, as I have no idea how to verify it. Unfortunately I also 
> don't have the time to try a lot, but I wanted to let you know anyway. The 
> mentioned issue can be used as an example; just add the following lines:
> {code}
> >>> pq.write_table(tab, path)
> >>> tab2 = pq.read_table(path)
> >>> tab2.column(0).field.metadata
> {code}





[jira] [Commented] (ARROW-4355) [C++] test-util functions are no longer part of libarrow

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763326#comment-16763326
 ] 

Wes McKinney commented on ARROW-4355:
-

Are you able to submit a patch for this?

> [C++] test-util functions are no longer part of libarrow
> 
>
> Key: ARROW-4355
> URL: https://issues.apache.org/jira/browse/ARROW-4355
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> I have used these functions in other artifacts like {{turbodbc}}. I would 
> like to have them back as part of libarrow. 





[jira] [Commented] (ARROW-4335) [C++] Better document sparse tensor support

2019-02-07 Thread Kenta Murata (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763336#comment-16763336
 ] 

Kenta Murata commented on ARROW-4335:
-

[~wesmckinn] I can do it.  Thank you.

> [C++] Better document sparse tensor support
> ---
>
> Key: ARROW-4335
> URL: https://issues.apache.org/jira/browse/ARROW-4335
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Antoine Pitrou
>Assignee: Kenta Murata
>Priority: Major
> Fix For: 0.13.0
>
>
> Currently the documentation (including docstrings) for the sparse tensor 
> classes and methods is very... sparse. It would be nice to make those 
> approachable.
> (also, a suggestion: rename {{SparseCSRIndex::indptr()}} to something else? 
> perhaps {{SparseCSRIndex::row_indices()}}?)





[jira] [Commented] (ARROW-4335) [C++] Better document sparse tensor support

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1676#comment-1676
 ] 

Wes McKinney commented on ARROW-4335:
-

Mid-to-late March

> [C++] Better document sparse tensor support
> ---
>
> Key: ARROW-4335
> URL: https://issues.apache.org/jira/browse/ARROW-4335
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Antoine Pitrou
>Assignee: Kenta Murata
>Priority: Major
> Fix For: 0.13.0
>
>
> Currently the documentation (including docstrings) for the sparse tensor 
> classes and methods is very... sparse. It would be nice to make those 
> approachable.
> (also, a suggestion: rename {{SparseCSRIndex::indptr()}} to something else? 
> perhaps {{SparseCSRIndex::row_indices()}}?)





[jira] [Commented] (ARROW-4335) [C++] Better document sparse tensor support

2019-02-07 Thread Kenta Murata (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763332#comment-16763332
 ] 

Kenta Murata commented on ARROW-4335:
-

I want to work on this.

[~wesmckinn] could you please tell me the deadline for 0.13?

> [C++] Better document sparse tensor support
> ---
>
> Key: ARROW-4335
> URL: https://issues.apache.org/jira/browse/ARROW-4335
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Antoine Pitrou
>Assignee: Kenta Murata
>Priority: Major
> Fix For: 0.13.0
>
>
> Currently the documentation (including docstrings) for the sparse tensor 
> classes and methods is very... sparse. It would be nice to make those 
> approachable.
> (also, a suggestion: rename {{SparseCSRIndex::indptr()}} to something else? 
> perhaps {{SparseCSRIndex::row_indices()}}?)





[jira] [Updated] (ARROW-4333) [C++] Sketch out design for kernels and "query" execution in compute layer

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4333:

Fix Version/s: 0.13.0

> [C++] Sketch out design for kernels and "query" execution in compute layer
> --
>
> Key: ARROW-4333
> URL: https://issues.apache.org/jira/browse/ARROW-4333
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Micah Kornfield
>Priority: Major
>  Labels: analytics
> Fix For: 0.13.0
>
>
> It would be good to formalize the design of kernels and the controlling query 
> execution layer (e.g. volcano batch model?) to understand the following:
> Contracts for kernels:
>  * Thread safety of kernels?
>  * When should kernels allocate memory vs. expect preallocated memory? How to 
> communicate requirements for a kernel's memory allocation?
>  * How to communicate whether a kernel's execution is parallelizable across a 
> ChunkedArray? How to determine if the order of execution across a 
> ChunkedArray is important?
>  * How to communicate when it is safe to reuse the same buffers as input and 
> output to the same kernel?
> What does the threading model look like for the higher level of control?  
> Where should synchronization happen?
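To make those questions concrete, here is an illustrative sketch (not Arrow's actual API; all names are hypothetical) of a kernel contract that records the properties being asked about, paired with a trivially serial driver that a real engine would replace with a parallel one after consulting the traits:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class KernelTraits:
    thread_safe: bool          # kernel may be invoked concurrently
    chunk_parallel: bool       # chunks of a ChunkedArray may run in any order
    preallocates_output: bool  # caller must supply the output buffer

def run_over_chunks(kernel: Callable, traits: KernelTraits,
                    chunks: List[list]) -> list:
    # A serial volcano-style driver; a real engine would use `traits`
    # to decide on parallelism and allocation strategy.
    out = []
    for chunk in chunks:
        out.append(kernel(chunk))
    return out

doubled = run_over_chunks(lambda c: [2 * x for x in c],
                          KernelTraits(thread_safe=True,
                                       chunk_parallel=True,
                                       preallocates_output=False),
                          [[1, 2], [3]])
```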





[jira] [Updated] (ARROW-4420) [INTEGRATION] Pin spark's version to the recently released arrow 0.12 patch

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4420:

Fix Version/s: 0.13.0

> [INTEGRATION] Pin spark's version to the recently released arrow 0.12 patch
> ---
>
> Key: ARROW-4420
> URL: https://issues.apache.org/jira/browse/ARROW-4420
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in 
> https://github.com/apache/arrow/pull/3300#discussion_r252026108





[jira] [Commented] (ARROW-4413) [Python] pyarrow.hdfs.connect() failing

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763331#comment-16763331
 ] 

Wes McKinney commented on ARROW-4413:
-

[~bradleygrantham] are you able to test this out and submit a PR?

> [Python] pyarrow.hdfs.connect() failing
> ---
>
> Key: ARROW-4413
> URL: https://issues.apache.org/jira/browse/ARROW-4413
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.12.0
> Environment: Python 2.7
> Hadoop distribution: Amazon 2.7.3
> Hive 2.1.1 
> Spark 2.1.1
> Tez 0.8.4
> Linux 4.4.35-33.55.amzn1.x86_64
>Reporter: Bradley Grantham
>Priority: Major
> Fix For: 0.13.0
>
>
> Trying to connect to hdfs using the below snippet. Using {{hadoop-libhdfs}}.
> This error appears in {{v0.12.0}}. It doesn't appear in {{v0.11.1}}. (I used 
> the same environment when testing that it still worked on {{v0.11.1}})
>  
> {code:java}
> In [1]: import pyarrow as pa
> In [2]: fs = pa.hdfs.connect()
> ---
> TypeError Traceback (most recent call last)
>  in ()
> > 1 fs = pa.hdfs.connect()
> /usr/local/lib64/python2.7/site-packages/pyarrow/hdfs.pyc in connect(host, 
> port, user, kerb_ticket, driver, extra_conf)
> 205 fs = HadoopFileSystem(host=host, port=port, user=user,
> 206   kerb_ticket=kerb_ticket, driver=driver,
> --> 207   extra_conf=extra_conf)
> 208 return fs
> /usr/local/lib64/python2.7/site-packages/pyarrow/hdfs.pyc in __init__(self, 
> host, port, user, kerb_ticket, driver, extra_conf)
>  36 _maybe_set_hadoop_classpath()
>  37 
> ---> 38 self._connect(host, port, user, kerb_ticket, driver, 
> extra_conf)
>  39 
>  40 def __reduce__(self):
> /usr/local/lib64/python2.7/site-packages/pyarrow/io-hdfs.pxi in 
> pyarrow.lib.HadoopFileSystem._connect()
>  72 if host is not None:
>  73 conf.host = tobytes(host)
> ---> 74 self.host = host
>  75 
>  76 conf.port = port
> TypeError: Expected unicode, got str
> {code}
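The traceback ends in Cython code assigning `host` where a unicode string is expected, which suggests a Python 2 bytes/text mismatch. Below is a minimal sketch of the kind of normalization that would avoid the error; the helper name `ensure_text` is hypothetical, and the actual fix in pyarrow may differ.

```python
# Hypothetical sketch: normalize whatever the caller passed (bytes or
# text) to text before handing it to the Cython layer. On Python 2,
# pa.hdfs.connect() received a `str` (bytes) where `unicode` was
# expected, producing the TypeError in the traceback above.

def ensure_text(value, encoding="utf-8"):
    """Return `value` as a text (unicode) string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

print(ensure_text(b"default"))  # -> default
```

Coercing at the boundary keeps the rest of the code path type-stable on both Python 2 and 3.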





[jira] [Updated] (ARROW-4416) [CI] Build gandiva in cpp docker image

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4416:

Fix Version/s: 0.13.0

> [CI] Build gandiva in cpp docker image
> --
>
> Key: ARROW-4416
> URL: https://issues.apache.org/jira/browse/ARROW-4416
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.13.0
>
>
> Currently Gandiva is not built; for the sake of completeness, enable it by 
> default.





[jira] [Closed] (ARROW-4487) [C++] Appveyor toolchain build does not actually build the project

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-4487.
---
Resolution: Fixed

dup of ARROW-4404

> [C++] Appveyor toolchain build does not actually build the project
> --
>
> Key: ARROW-4487
> URL: https://issues.apache.org/jira/browse/ARROW-4487
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> I haven't figured out what's going on yet, but it appears that the build is 
> bailing out when calling {{conda install}} after activating the environment
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/22145094/job/824x1h5wdakswq1t





[jira] [Updated] (ARROW-4409) [C++] Enable arrow::ipc internal JSON reader to read from a file path

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4409:

Fix Version/s: 0.13.0

> [C++] Enable arrow::ipc internal JSON reader to read from a file path
> -
>
> Key: ARROW-4409
> URL: https://issues.apache.org/jira/browse/ARROW-4409
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Minor
> Fix For: 0.13.0
>
>
> This may make tests easier to write. Currently an input buffer is required, 
> so reading from a file requires some boilerplate.





[jira] [Commented] (ARROW-4404) [CI] AppVeyor toolchain build does not build anything

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763329#comment-16763329
 ] 

Wes McKinney commented on ARROW-4404:
-

I've fixed this in https://github.com/apache/arrow/pull/3567

> [CI] AppVeyor toolchain build does not build anything
> -
>
> Key: ARROW-4404
> URL: https://issues.apache.org/jira/browse/ARROW-4404
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Antoine Pitrou
>Assignee: Wes McKinney
>Priority: Major
>
> I don't know when that started happening but the "Toolchain" AppVeyor build 
> only installs packages without building Arrow:
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/21939887/job/x9vm1urgnl95evl0





[jira] [Commented] (ARROW-4356) [CI] Add integration (docker) test for turbodbc

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763327#comment-16763327
 ] 

Wes McKinney commented on ARROW-4356:
-

+1

> [CI] Add integration (docker) test for turbodbc
> ---
>
> Key: ARROW-4356
> URL: https://issues.apache.org/jira/browse/ARROW-4356
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> We regularly break our API so that {{turbodbc}} needs to make minor changes 
> to support the new Arrow version. We should set up a small integration test to 
> check before a release that {{turbodbc}} can easily upgrade.





[jira] [Assigned] (ARROW-4404) [CI] AppVeyor toolchain build does not build anything

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4404:
---

Assignee: Wes McKinney

> [CI] AppVeyor toolchain build does not build anything
> -
>
> Key: ARROW-4404
> URL: https://issues.apache.org/jira/browse/ARROW-4404
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Antoine Pitrou
>Assignee: Wes McKinney
>Priority: Major
>
> I don't know when that started happening but the "Toolchain" AppVeyor build 
> only installs packages without building Arrow:
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/21939887/job/x9vm1urgnl95evl0





[jira] [Updated] (ARROW-4355) [C++] test-util functions are no longer part of libarrow

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4355:

Fix Version/s: 0.13.0

> [C++] test-util functions are no longer part of libarrow
> 
>
> Key: ARROW-4355
> URL: https://issues.apache.org/jira/browse/ARROW-4355
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> I have used these functions in other artifacts like {{turbodbc}}. I would 
> like to have them back as part of libarrow. 





[jira] [Updated] (ARROW-4201) [Gandiva] integrate test utils with arrow

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4201:

Fix Version/s: 0.13.0

> [Gandiva] integrate test utils with arrow
> -
>
> Key: ARROW-4201
> URL: https://issues.apache.org/jira/browse/ARROW-4201
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Pindikura Ravindra
>Priority: Major
> Fix For: 0.13.0
>
>
> The following tasks are to be addressed as part of this JIRA:
>  # move (or consolidate) data generators in generate_data.h to arrow
>  # move convenience fns in gandiva/tests/test_util.h to arrow
>  # move (or consolidate) EXPECT_ARROW_* fns to arrow





[jira] [Updated] (ARROW-4359) Column metadata is not saved or loaded in parquet

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4359:

Labels: parquet  (was: )

> Column metadata is not saved or loaded in parquet
> -
>
> Key: ARROW-4359
> URL: https://issues.apache.org/jira/browse/ARROW-4359
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Seb Fru
>Priority: Major
>  Labels: parquet
>
> Hi all,
> a while ago I posted this issue:
> https://issues.apache.org/jira/browse/ARROW-3866
> While working with PyArrow I encountered another potential bug 
> related to column metadata: if I create a table containing columns with 
> metadata, everything is fine. But after I save the table to Parquet and load 
> it back as a table using pq.read_table, the column metadata is gone.
>  
> As of now I cannot say whether the metadata is not saved 
> correctly or not loaded correctly, as I have no idea how to verify it. 
> Unfortunately I also don't have the time to try a lot, but I wanted to let you 
> know anyway. The mentioned issue can be used as an example; just add the 
> following lines:
>  
> >>> pq.write_table(tab, path)
> >>> tab2 = pq.read_table(path)
> >>> tab2.column(0).field.metadata
>  
>  





[jira] [Updated] (ARROW-4286) [C++/R] Namespace vendored Boost

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4286:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++/R] Namespace vendored Boost
> 
>
> Key: ARROW-4286
> URL: https://issues.apache.org/jira/browse/ARROW-4286
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Packaging, R
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.14.0
>
>
> For R, we vendor Boost and thus also include the symbols privately in our 
> modules. While they are private, some things like virtual destructors can 
> still interfere with other packages that vendor Boost. We should also 
> namespace the vendored Boost as we do in the manylinux1 packaging: 
> https://github.com/apache/arrow/blob/0f8bd747468dd28c909ef823bed77d8082a5b373/python/manylinux1/scripts/build_boost.sh#L28





[jira] [Commented] (ARROW-4302) [C++] Add OpenSSL to C++ build toolchain

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763317#comment-16763317
 ] 

Wes McKinney commented on ARROW-4302:
-

There is code in gRPC for OpenSSL you can look at

> [C++] Add OpenSSL to C++ build toolchain
> 
>
> Key: ARROW-4302
> URL: https://issues.apache.org/jira/browse/ARROW-4302
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Deepak Majeti
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> This is needed for encryption support for Parquet, among other things.





[jira] [Updated] (ARROW-4359) Column metadata is not saved or loaded in parquet

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4359:

Component/s: Python

> Column metadata is not saved or loaded in parquet
> -
>
> Key: ARROW-4359
> URL: https://issues.apache.org/jira/browse/ARROW-4359
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Seb Fru
>Priority: Major
>
> Hi all,
> a while ago I posted this issue:
> https://issues.apache.org/jira/browse/ARROW-3866
> While working with PyArrow I encountered another potential bug 
> related to column metadata: if I create a table containing columns with 
> metadata, everything is fine. But after I save the table to Parquet and load 
> it back as a table using pq.read_table, the column metadata is gone.
>  
> As of now I cannot say whether the metadata is not saved 
> correctly or not loaded correctly, as I have no idea how to verify it. 
> Unfortunately I also don't have the time to try a lot, but I wanted to let you 
> know anyway. The mentioned issue can be used as an example; just add the 
> following lines:
>  
> >>> pq.write_table(tab, path)
> >>> tab2 = pq.read_table(path)
> >>> tab2.column(0).field.metadata
>  
>  





[jira] [Updated] (ARROW-4363) [C++] Add CMake format checks

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4363:

Fix Version/s: 0.13.0

> [C++] Add CMake format checks
> -
>
> Key: ARROW-4363
> URL: https://issues.apache.org/jira/browse/ARROW-4363
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration, Developer Tools
>Affects Versions: 0.12.0
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.13.0
>
>
> We should try to standardize the formatting of our CMake files somehow.
> The [cmake-format utility|https://github.com/cheshirekow/cmake_format] could 
> help.





[jira] [Updated] (ARROW-4369) [Packaging] Release verification script should test linux packages via docker

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4369:

Fix Version/s: 0.14.0

> [Packaging] Release verification script should test linux packages via docker
> -
>
> Key: ARROW-4369
> URL: https://issues.apache.org/jira/browse/ARROW-4369
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.14.0
>
>
> It shouldn't be too hard to create a verification script which checks the 
> linux packages. This could prevent issues like [ARROW-4368] / 
> [https://github.com/apache/arrow/issues/3476]
> I suggest separating the current verification script into one that verifies 
> the source release artifact and another that verifies the binaries:
>  * checksum and signatures as is right now
>  * install linux packages on multiple distros via docker
> We could test wheels and conda packages as well, but in follow-up PRs.
>  
> cc [~kou]





[jira] [Updated] (ARROW-4340) [C++] Update IWYU version in the `lint` dockerfile

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4340:

Fix Version/s: 0.13.0

> [C++] Update IWYU version in the `lint` dockerfile
> --
>
> Key: ARROW-4340
> URL: https://issues.apache.org/jira/browse/ARROW-4340
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.13.0
>
>
> I was trying to clean up the C++ imports based on the current docker-iwyu 
> suggestions, but it requires more customization (symbol maps and pragmas) 
> than it currently has. It'd also help a lot to use the latest IWYU version 
> (see the changelog at https://include-what-you-use.org/)





[jira] [Updated] (ARROW-4506) [Ruby] Add Arrow::RecordBatch#raw_records

2019-02-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4506:
--
Labels: pull-request-available  (was: )

> [Ruby] Add Arrow::RecordBatch#raw_records
> -
>
> Key: ARROW-4506
> URL: https://issues.apache.org/jira/browse/ARROW-4506
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Ruby
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
>  Labels: pull-request-available
>
> I want to add an Arrow::RecordBatch#raw_records method to convert a record batch 
> object to a nested array.
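For illustration, here is what a `raw_records`-style conversion does, sketched in Python with plain dicts and lists standing in for Arrow columns; the Ruby method in the issue would operate on an `Arrow::RecordBatch`, and all names here are illustrative.

```python
# Sketch: turn a column-oriented batch (name -> list of values, all the
# same length) into a nested array of rows, which is what raw_records
# is asked to return.

def raw_records(columns):
    names = list(columns)
    n_rows = len(columns[names[0]])
    return [[columns[name][i] for name in names] for i in range(n_rows)]

batch = {"count": [1, 2], "visible": [True, False]}
print(raw_records(batch))  # -> [[1, True], [2, False]]
```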





[jira] [Updated] (ARROW-4267) [Python/C++] Segfault when reading rowgroups with duplicated columns

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4267:

Fix Version/s: 0.14.0

> [Python/C++] Segfault when reading rowgroups with duplicated columns
> 
>
> Key: ARROW-4267
> URL: https://issues.apache.org/jira/browse/ARROW-4267
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Florian Jetter
>Priority: Minor
> Fix For: 0.14.0
>
>
> When reading a row group using duplicated columns I receive a segfault.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> df = pd.DataFrame({
> "col": ["A", "B"]
> })
> table = pa.Table.from_pandas(df)
> buf = pa.BufferOutputStream()
> pq.write_table(table, buf)
> parquet_file = pq.ParquetFile(buf.getvalue())
> parquet_file.read_row_group(0)
> parquet_file.read_row_group(0, columns=["col"])
> # boom
> parquet_file.read_row_group(0, columns=["col", "col"])
> {code}
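Until duplicated column selections are rejected or handled upstream, a caller-side guard is one way to avoid the crash. This is a workaround sketch, not the eventual pyarrow fix.

```python
# Workaround sketch: deduplicate the requested columns before calling
# read_row_group(), preserving the caller's order, so the reader never
# sees the same column name twice.

def unique_columns(columns):
    seen = set()
    out = []
    for name in columns:
        if name not in seen:
            seen.add(name)
            out.append(name)
    return out

print(unique_columns(["col", "col"]))  # -> ['col']
```

With the guard in place, `parquet_file.read_row_group(0, columns=unique_columns(cols))` reads each column once instead of segfaulting.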





[jira] [Updated] (ARROW-4334) [CI] Setup conda-forge channel globally in travis builds

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4334:

Fix Version/s: 0.13.0

> [CI] Setup conda-forge channel globally in travis builds
> 
>
> Key: ARROW-4334
> URL: https://issues.apache.org/jira/browse/ARROW-4334
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.13.0
>
>
> It looks like conda-forge is already set as the top-priority channel: 
> [https://github.com/apache/arrow/blob/master/ci/travis_install_conda.sh#L71]
> We can most certainly remove all occurrences of {{-c conda-forge}}
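Setting the channel globally, rather than passing `-c conda-forge` on every command, amounts to a `.condarc` fragment along these lines. This is a sketch following conda's configuration format; whether the CI scripts write this file or call `conda config --add channels conda-forge` is an implementation detail.

```yaml
channels:
  - conda-forge
  - defaults
```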





[jira] [Updated] (ARROW-4337) [C#] Array / RecordBatch Builder Fluent API

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4337:

Fix Version/s: 0.13.0

> [C#] Array / RecordBatch Builder Fluent API
> ---
>
> Key: ARROW-4337
> URL: https://issues.apache.org/jira/browse/ARROW-4337
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C#
>Reporter: Chris Hutchinson
>Assignee: Chris Hutchinson
>Priority: Major
>  Labels: c#, pull-request-available
> Fix For: 0.13.0
>
>   Original Estimate: 12h
>  Time Spent: 10m
>  Remaining Estimate: 11h 50m
>
> Implement a fluent API for building arrays and record batches from Arrow 
> buffers, flat arrays, spans, enumerables, etc.
> A future implementation could extend this API with support for ADO.NET 
> DataTables.
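As a language-agnostic illustration of the fluent pattern being proposed (sketched in Python rather than C#; the real API surface is up to the C# implementation), each append call returns the builder so calls chain:

```python
# Fluent builder sketch: Append*-style methods return the builder
# itself, so calls chain, and build() produces the finished array.

class ArrayBuilder:
    def __init__(self):
        self._values = []

    def append(self, value):
        self._values.append(value)
        return self  # returning self is what makes the API "fluent"

    def append_range(self, values):
        self._values.extend(values)
        return self

    def build(self):
        return list(self._values)

arr = ArrayBuilder().append(1).append_range([2, 3]).build()
print(arr)  # -> [1, 2, 3]
```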





[jira] [Commented] (ARROW-4334) [CI] Setup conda-forge channel globally in travis builds

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763322#comment-16763322
 ] 

Wes McKinney commented on ARROW-4334:
-

Is this done?

> [CI] Setup conda-forge channel globally in travis builds
> 
>
> Key: ARROW-4334
> URL: https://issues.apache.org/jira/browse/ARROW-4334
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.13.0
>
>
> It looks like conda-forge is already set as the top-priority channel: 
> [https://github.com/apache/arrow/blob/master/ci/travis_install_conda.sh#L71]
> We can most certainly remove all occurrences of {{-c conda-forge}}





[jira] [Commented] (ARROW-4333) [C++] Sketch out design for kernels and "query" execution in compute layer

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763321#comment-16763321
 ] 

Wes McKinney commented on ARROW-4333:
-

Yes, I agree with coming up with a plan for all these concerns. I think the 
volcano batch model is the way to go. This is what Impala and many other 
systems use quite successfully.

> [C++] Sketch out design for kernels and "query" execution in compute layer
> --
>
> Key: ARROW-4333
> URL: https://issues.apache.org/jira/browse/ARROW-4333
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Micah Kornfield
>Priority: Major
>  Labels: analytics
> Fix For: 0.13.0
>
>
> It would be good to formalize the design of kernels and the controlling query 
> execution layer (e.g. volcano batch model?) to understand the following:
> Contracts for kernels:
>  * Thread safety of kernels?
>  * When should kernels allocate memory vs. expect preallocated memory? How to 
> communicate requirements for a kernel's memory allocation?
>  * How to communicate whether a kernel's execution is parallelizable 
> across a ChunkedArray? How to determine if the order of execution across a 
> ChunkedArray is important?
>  * How to communicate when it is safe to reuse the same buffers as input 
> and output to the same kernel?
> What does the threading model look like for the higher level of control? 
> Where should synchronization happen?
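For readers unfamiliar with the term, here is a minimal sketch of the volcano ("pull-based iterator of batches") model mentioned in the description, with plain lists standing in for record batches; all names are illustrative, not Arrow APIs.

```python
# Each operator pulls batches from its child and yields transformed
# batches, so the control layer never materializes the whole table.

def scan(batches):
    for batch in batches:
        yield batch

def filter_op(child, predicate):
    for batch in child:
        yield [row for row in batch if predicate(row)]

def project(child, fn):
    for batch in child:
        yield [fn(row) for row in batch]

# Pull-based execution: nothing runs until the sink iterates the plan.
source = scan([[1, 2, 3], [4, 5, 6]])
plan = project(filter_op(source, lambda x: x % 2 == 0), lambda x: x * 10)
print([row for batch in plan for row in batch])  # -> [20, 40, 60]
```

In a real engine the batches would be immutable Arrow record batches, and the open questions above (allocation, thread safety, ordering) live in the contracts between these operators.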





[jira] [Commented] (ARROW-4335) [C++] Better document sparse tensor support

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763323#comment-16763323
 ] 

Wes McKinney commented on ARROW-4335:
-

[~mrkn] can you work on this for 0.13?

> [C++] Better document sparse tensor support
> ---
>
> Key: ARROW-4335
> URL: https://issues.apache.org/jira/browse/ARROW-4335
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Antoine Pitrou
>Assignee: Kenta Murata
>Priority: Major
> Fix For: 0.13.0
>
>
> Currently the documentation (including docstrings) for the sparse tensor 
> classes and methods is very... sparse. It would be nice to make those 
> approachable.
> (also, a suggestion: rename {{SparseCSRIndex::indptr()}} to something else? 
> perhaps {{SparseCSRIndex::row_indices()}}?)





[jira] [Updated] (ARROW-4335) [C++] Better document sparse tensor support

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4335:

Fix Version/s: 0.13.0

> [C++] Better document sparse tensor support
> ---
>
> Key: ARROW-4335
> URL: https://issues.apache.org/jira/browse/ARROW-4335
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Antoine Pitrou
>Assignee: Kenta Murata
>Priority: Major
> Fix For: 0.13.0
>
>
> Currently the documentation (including docstrings) for the sparse tensor 
> classes and methods is very... sparse. It would be nice to make those 
> approachable.
> (also, a suggestion: rename {{SparseCSRIndex::indptr()}} to something else? 
> perhaps {{SparseCSRIndex::row_indices()}}?)





[jira] [Updated] (ARROW-4331) [C++] Extend Scalar Datum to support more types

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4331:

Fix Version/s: 0.13.0

> [C++] Extend Scalar Datum to support more types 
> 
>
> Key: ARROW-4331
> URL: https://issues.apache.org/jira/browse/ARROW-4331
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Micah Kornfield
>Priority: Major
> Fix For: 0.13.0
>
>
> Per discussion on ARROW-47, once [https://github.com/apache/arrow/pull/3407] 
> is merged we should support more types (including string, possibly struct, 
> etc)





[jira] [Commented] (ARROW-4331) [C++] Extend Scalar Datum to support more types

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763319#comment-16763319
 ] 

Wes McKinney commented on ARROW-4331:
-

-0/-1. I'd prefer to address this through ARROW-47 / scalar object model. I'm 
going to put up an RFC patch soon for ARROW-47

> [C++] Extend Scalar Datum to support more types 
> 
>
> Key: ARROW-4331
> URL: https://issues.apache.org/jira/browse/ARROW-4331
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Micah Kornfield
>Priority: Major
>
> Per discussion on ARROW-47, once [https://github.com/apache/arrow/pull/3407] 
> is merged we should support more types (including string, possibly struct, 
> etc)





[jira] [Commented] (ARROW-4327) [Python] pyarrow fails to load libarrow.so in Fedora / CentOS Docker build

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763318#comment-16763318
 ] 

Wes McKinney commented on ARROW-4327:
-

This should use a requirements file instead of listing packages. Can you submit 
a PR?

> [Python] pyarrow fails to load libarrow.so in Fedora / CentOS Docker build
> --
>
> Key: ARROW-4327
> URL: https://issues.apache.org/jira/browse/ARROW-4327
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation, Python
> Environment: CentOS7 or Fedora29
>Reporter: Ryan White
>Priority: Minor
>  Labels: build
> Fix For: 0.13.0
>
>
> Trying to build pyarrow on CentOS or Fedora fails to load libarrow.so. The 
> build does not use conda; rather, it is similar to the macOS build instructions. 
>  A dockerfile is available here:
> https://github.com/ryanmackenziewhite/dockers/blob/master/centos7-py36-arrowbuild/Dockerfile
> {code:java}
> // ImportError while loading conftest 
> '/work/repos/arrow/python/pyarrow/tests/conftest.py'.
> pyarrow/__init__.py:54: in 
> from pyarrow.lib import cpu_count, set_cpu_count
> E ImportError: libarrow.so.12: cannot open shared object file: No such file 
> or directory
> {code}
>  





[jira] [Updated] (ARROW-4327) [Python] pyarrow fails to load libarrow.so in Fedora / CentOS Docker build

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4327:

Fix Version/s: 0.13.0

> [Python] pyarrow fails to load libarrow.so in Fedora / CentOS Docker build
> --
>
> Key: ARROW-4327
> URL: https://issues.apache.org/jira/browse/ARROW-4327
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation, Python
> Environment: CentOS7 or Fedora29
>Reporter: Ryan White
>Priority: Minor
>  Labels: build
> Fix For: 0.13.0
>
>
> Trying to build pyarrow on CentOS or Fedora fails to load libarrow.so. The 
> build does not use conda; rather, it is similar to the macOS build instructions. 
>  A dockerfile is available here:
> https://github.com/ryanmackenziewhite/dockers/blob/master/centos7-py36-arrowbuild/Dockerfile
> {code:java}
> // ImportError while loading conftest 
> '/work/repos/arrow/python/pyarrow/tests/conftest.py'.
> pyarrow/__init__.py:54: in 
> from pyarrow.lib import cpu_count, set_cpu_count
> E ImportError: libarrow.so.12: cannot open shared object file: No such file 
> or directory
> {code}
>  





[jira] [Closed] (ARROW-4143) [Python] Skip rows while reading parquet file

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-4143.
---
Resolution: Duplicate

dup of ARROW-3705

> [Python] Skip rows while reading parquet file
> -
>
> Key: ARROW-4143
> URL: https://issues.apache.org/jira/browse/ARROW-4143
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Sanchit
>Priority: Minor
>  Labels: newbie
>
> Is there any functionality in pyarrow that allows reading a file partially, 
> e.g. if I wish to read only the first 10 rows from the parquet file? 
> I ran into this situation while doing this:
> `df = pd.read_parquet(path='filepath', nrows=10)`  # gave me an error
> I wanted to read just the 10 rows into a pandas dataframe using 
> read_parquet (read_parquet uses pyarrow as one of its engines to read 
> parquet files). As the parquet file is considerably huge in size, if one wants 
> to read only the first n rows, is there any functionality we can add in the 
> engine to do so?
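A batch-wise early-stop pattern is the usual way to honor an `nrows` limit without reading the whole file, sketched here with plain lists standing in for row groups or record batches; pyarrow's actual reader API may expose this differently.

```python
# Sketch: stream batches and stop as soon as n rows are collected,
# so later row groups are never read.

def take_first_rows(batches, n):
    out = []
    for batch in batches:
        remaining = n - len(out)
        if remaining <= 0:
            break
        out.extend(batch[:remaining])
    return out

print(take_first_rows([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 5))  # -> [1, 2, 3, 4, 5]
```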





[jira] [Updated] (ARROW-4506) [Ruby] Add Arrow::RecordBatch#raw_records

2019-02-07 Thread Kenta Murata (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenta Murata updated ARROW-4506:

External issue URL: https://github.com/apache/arrow/pull/3587

> [Ruby] Add Arrow::RecordBatch#raw_records
> -
>
> Key: ARROW-4506
> URL: https://issues.apache.org/jira/browse/ARROW-4506
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Ruby
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I want to add an Arrow::RecordBatch#raw_records method to convert a record batch 
> object to a nested array.





[jira] [Updated] (ARROW-4146) [C++] Extend visitor functions to include ArrayBuilder and allow callable visitors

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4146:

Fix Version/s: 0.14.0

> [C++] Extend visitor functions to include ArrayBuilder and allow callable 
> visitors
> --
>
> Key: ARROW-4146
> URL: https://issues.apache.org/jira/browse/ARROW-4146
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Priority: Minor
> Fix For: 0.14.0
>
>
> In addition to accepting objects with Visit methods for the visited type, 
> {{Visit(Array|Type)}} and {{Visit(Array|Type)Inline}} should accept objects 
> with overloaded call operators.
> In addition, for inline visitation, if a visitor can only visit one of the 
> potential unboxings, this can be detected at compile time and the full 
> type_id switch can be avoided (if the unboxed object cannot be visited, 
> do nothing). For example:
> {code}
> VisitTypeInline(some_type, [](const StructType& s) {
>   // only execute this if some_type.id() == Type::STRUCT
> });
> {code}
> Finally, visit functions should be added for visiting ArrayBuilders





[jira] [Commented] (ARROW-4267) [Python/C++] Segfault when reading rowgroups with duplicated columns

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763316#comment-16763316
 ] 

Wes McKinney commented on ARROW-4267:
-

can you submit a PR?

> [Python/C++] Segfault when reading rowgroups with duplicated columns
> 
>
> Key: ARROW-4267
> URL: https://issues.apache.org/jira/browse/ARROW-4267
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Florian Jetter
>Priority: Minor
> Fix For: 0.14.0
>
>
> When reading a row group using duplicated columns I receive a segfault.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> df = pd.DataFrame({
> "col": ["A", "B"]
> })
> table = pa.Table.from_pandas(df)
> buf = pa.BufferOutputStream()
> pq.write_table(table, buf)
> parquet_file = pq.ParquetFile(buf.getvalue())
> parquet_file.read_row_group(0)
> parquet_file.read_row_group(0, columns=["col"])
> # boom
> parquet_file.read_row_group(0, columns=["col", "col"])
> {code}





[jira] [Closed] (ARROW-4272) [Python] Illegal hardware instruction on pyarrow import

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-4272.
---
Resolution: Won't Fix

> [Python] Illegal hardware instruction on pyarrow import
> ---
>
> Key: ARROW-4272
> URL: https://issues.apache.org/jira/browse/ARROW-4272
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.11.1
> Environment: Python 3.6.7
> PySpark 2.4.0
> PyArrow: 0.11.1
> Pandas: 0.23.4
> NumPy: 1.15.4
> OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 
> x86_64 x86_64 GNU/Linux
>Reporter: Elchin
>Priority: Critical
> Attachments: core
>
>
> I can't import pyarrow, it crashes:
> {code:java}
> >>> import pyarrow as pa
> [1]    31441 illegal hardware instruction (core dumped)  python3{code}
> A core dump is attached to the issue; it may help you understand the 
> problem.
> The environment is:
> Python 3.6.7
>  PySpark 2.4.0
>  PyArrow: 0.11.1
>  Pandas: 0.23.4
>  NumPy: 1.15.4
>  OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 
> x86_64 x86_64 x86_64 GNU/Linux





[jira] [Commented] (ARROW-4264) [C++] Convert DCHECKs in that check compute/* input parameters to error statuses

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763315#comment-16763315
 ] 

Wes McKinney commented on ARROW-4264:
-

I don't think Status should be used for type checking

> [C++] Convert DCHECKs in that check compute/* input parameters to error 
> statuses
> 
>
> Key: ARROW-4264
> URL: https://issues.apache.org/jira/browse/ARROW-4264
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Minor
>
> DCHECKs seem to be used where Status::Invalid is more appropriate (so 
> programs don't crash).  See conversation on 
> https://github.com/apache/arrow/pull/3287/files





[jira] [Commented] (ARROW-4259) [Plasma] CI failure in test_plasma_tf_op

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763314#comment-16763314
 ] 

Wes McKinney commented on ARROW-4259:
-

Are these tests disabled now?

> [Plasma] CI failure in test_plasma_tf_op
> 
>
> Key: ARROW-4259
> URL: https://issues.apache.org/jira/browse/ARROW-4259
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma, Continuous Integration, Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> Recently-appeared failure on master:
> https://travis-ci.org/apache/arrow/jobs/479378188#L7108





[jira] [Updated] (ARROW-4232) [C++] Follow conda-forge compiler ABI migration

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4232:

Fix Version/s: 0.13.0

> [C++] Follow conda-forge compiler ABI migration
> ---
>
> Key: ARROW-4232
> URL: https://issues.apache.org/jira/browse/ARROW-4232
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Continuous Integration, Documentation
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.13.0
>
>
> conda-forge packages will soon (on Jan 15th) be completely migrated to the 
> gcc5 C++ ABI. We will need to update developer documentation, and perhaps fix 
> broken CI builds.
> Reference: https://twitter.com/condaforge/status/108337935784078





[jira] [Updated] (ARROW-4220) [Python] Add buffered input and output stream ASV benchmarks with simulated high latency IO

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4220:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] Add buffered input and output stream ASV benchmarks with simulated 
> high latency IO
> ---
>
> Key: ARROW-4220
> URL: https://issues.apache.org/jira/browse/ARROW-4220
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> Follow up to ARROW-3126





[jira] [Updated] (ARROW-4208) [CI/Python] Have automatized tests for S3

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4208:

Fix Version/s: 0.13.0

> [CI/Python] Have automatized tests for S3
> -
>
> Key: ARROW-4208
> URL: https://issues.apache.org/jira/browse/ARROW-4208
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: s3
> Fix For: 0.13.0
>
>
> Currently we don't run S3 integration tests regularly. 
> Possible solutions:
> - mock it within python/pytest
> - simply run the s3 tests with an S3 credential provided
> - create a hdfs-integration like docker-compose setup and run an S3 mock 
> server (e.g.: https://github.com/adobe/S3Mock, 
> https://github.com/jubos/fake-s3, https://github.com/gaul/s3proxy, 
> https://github.com/jserver/mock-s3)
> For more see discussion https://github.com/apache/arrow/pull/3286





[jira] [Updated] (ARROW-4201) [C++][Gandiva] integrate test utils with arrow

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4201:

Summary: [C++][Gandiva] integrate test utils with arrow  (was: [Gandiva] 
integrate test utils with arrow)

> [C++][Gandiva] integrate test utils with arrow
> --
>
> Key: ARROW-4201
> URL: https://issues.apache.org/jira/browse/ARROW-4201
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Pindikura Ravindra
>Priority: Major
> Fix For: 0.13.0
>
>
> The following tasks are to be addressed as part of this Jira:
>  # move (or consolidate) data generators in generate_data.h to arrow
>  # move convenience fns in gandiva/tests/test_util.h to arrow
>  # move (or consolidate) EXPECT_ARROW_* fns to arrow





[jira] [Commented] (ARROW-4192) "./dev/run_docker_compose.sh" is out of date

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763311#comment-16763311
 ] 

Wes McKinney commented on ARROW-4192:
-

[~kszucs] is this still an issue?

> "./dev/run_docker_compose.sh" is out of date
> 
>
> Key: ARROW-4192
> URL: https://issues.apache.org/jira/browse/ARROW-4192
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Affects Versions: 0.11.1
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.13.0
>
>
> The Parquet repo shouldn't be required anymore.
> {code:bash}
> $ ./dev/run_docker_compose.sh iwyu
> Please clone the Parquet repo next to the Arrow repo
> {code}
> Also, there's another error when trying to run {{docker-compose}} directly:
> {code:bash}
> $ docker-compose -f arrow/dev/docker-compose.yml build iwyu
> ERROR: build path /home/antoine/arrow/dev/dask_integration either does not 
> exist, is not accessible, or is not a valid URL.
> {code}





[jira] [Updated] (ARROW-4202) [Gandiva] use ArrayFromJson in tests

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4202:

Fix Version/s: 0.14.0

> [Gandiva] use ArrayFromJson in tests
> 
>
> Key: ARROW-4202
> URL: https://issues.apache.org/jira/browse/ARROW-4202
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Pindikura Ravindra
>Priority: Major
> Fix For: 0.14.0
>
>
> Most of the gandiva tests use wrappers over ArrowFromVector. These will 
> become a lot more readable if we switch to ArrayFromJSON.





[jira] [Updated] (ARROW-4192) "./dev/run_docker_compose.sh" is out of date

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4192:

Fix Version/s: 0.13.0

> "./dev/run_docker_compose.sh" is out of date
> 
>
> Key: ARROW-4192
> URL: https://issues.apache.org/jira/browse/ARROW-4192
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Affects Versions: 0.11.1
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.13.0
>
>
> The Parquet repo shouldn't be required anymore.
> {code:bash}
> $ ./dev/run_docker_compose.sh iwyu
> Please clone the Parquet repo next to the Arrow repo
> {code}
> Also, there's another error when trying to run {{docker-compose}} directly:
> {code:bash}
> $ docker-compose -f arrow/dev/docker-compose.yml build iwyu
> ERROR: build path /home/antoine/arrow/dev/dask_integration either does not 
> exist, is not accessible, or is not a valid URL.
> {code}





[jira] [Updated] (ARROW-4159) [C++] Check for -Wdocumentation issues

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4159:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Check for -Wdocumentation issues 
> ---
>
> Key: ARROW-4159
> URL: https://issues.apache.org/jira/browse/ARROW-4159
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> I fixed some -Wdocumentation issues in ARROW-4157 that showed up on one Linux 
> distribution but not another, both with clang-6.0. I am not sure why that is, 
> but it would be good to try to reproduce it and see whether our CI can be 
> improved to catch these; in the worst case we could do it in one of our 
> docker-compose builds.





[jira] [Updated] (ARROW-4176) [C++/Python] Human readable arrow schema comparison

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4176:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++/Python] Human readable arrow schema comparison
> ---
>
> Key: ARROW-4176
> URL: https://issues.apache.org/jira/browse/ARROW-4176
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Florian Jetter
>Priority: Minor
> Fix For: 0.14.0
>
>
> When working with arrow schemas it would be helpful to have a human readable 
> representation of the diff between two schemas.
> This could be either exposed as a function returning a string/diff object or 
> via a function raising an Exception with this information.
> For instance:
> {code}
> schema_diff = get_schema_diff(schema1, schema2)
> expected_diff = """
> - col_changed: int8
> + col_changed: double
> + col_additional: int8
> """
> assert schema_diff == expected_diff
> {code}
>  
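A rough sketch of what such a diff could produce, in plain Python (difflib-based; `get_schema_diff` is the hypothetical name from the issue text, and the list-of-pairs schema model is illustrative, not pyarrow API):

```python
import difflib

def get_schema_diff(schema1, schema2):
    """Return the added/removed lines between two schemas.

    Schemas are modeled here as lists of (name, type) pairs; a real
    implementation would walk pyarrow.Schema fields instead.
    """
    def fmt(schema):
        return ["%s: %s" % (name, typ) for name, typ in schema]
    lines = difflib.ndiff(fmt(schema1), fmt(schema2))
    # Keep only additions and removals, matching the format in the example.
    return "\n".join(line for line in lines if line.startswith(("- ", "+ ")))

print(get_schema_diff([("col_changed", "int8")],
                      [("col_changed", "double"), ("col_additional", "int8")]))
```

Returning a string keeps the simple-assertion style shown in the issue; a structured diff object would be needed for programmatic handling.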





[jira] [Updated] (ARROW-4139) [Python] Cast Parquet column statistics to unicode if UTF8 ConvertedType is set

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4139:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] Cast Parquet column statistics to unicode if UTF8 ConvertedType is 
> set
> ---
>
> Key: ARROW-4139
> URL: https://issues.apache.org/jira/browse/ARROW-4139
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Matthew Rocklin
>Priority: Minor
>  Labels: parquet, python
> Fix For: 0.14.0
>
>
> When writing Pandas data to Parquet format and reading it back again, I find 
> that the statistics of text columns are stored as byte arrays rather than as 
> unicode text. 
> I'm not sure if this is a bug in Arrow, PyArrow, or just in my understanding 
> of how best to manage statistics.  (I'd be quite happy to learn that it was 
> the latter).
> Here is a minimal example
> {code:python}
> import pandas as pd
> df = pd.DataFrame({'x': ['a']})
> df.to_parquet('df.parquet')
> import pyarrow.parquet as pq
> pf = pq.ParquetDataset('df.parquet')
> piece = pf.pieces[0]
> rg = piece.row_group(0)
> md = piece.get_metadata(pq.ParquetFile)
> rg = md.row_group(0)
> c = rg.column(0)
> >>> c
> 
>   file_offset: 63
>   file_path: 
>   physical_type: BYTE_ARRAY
>   num_values: 1
>   path_in_schema: x
>   is_stats_set: True
>   statistics:
> 
>   has_min_max: True
>   min: b'a'
>   max: b'a'
>   null_count: 0
>   distinct_count: 0
>   num_values: 1
>   physical_type: BYTE_ARRAY
>   compression: SNAPPY
>   encodings: ('PLAIN_DICTIONARY', 'PLAIN', 'RLE')
>   has_dictionary_page: True
>   dictionary_page_offset: 4
>   data_page_offset: 25
>   total_compressed_size: 59
>   total_uncompressed_size: 55
> >>> type(c.statistics.min)
> bytes
> {code}
> My guess is that we would want to store a logical type in the statistics like 
> UNICODE, though I don't have enough experience with Parquet data types to 
> know if this is a good idea or possible.
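Until statistics are decoded by the library itself, the requested behavior amounts to something like this hedged sketch (`decode_statistic` and the string-valued `converted_type` argument are hypothetical, not pyarrow API):

```python
def decode_statistic(value, converted_type):
    """Decode a raw Parquet column statistic.

    Parquet stores BYTE_ARRAY statistics as raw bytes; when the column's
    ConvertedType is UTF8 the bytes are known to be valid UTF-8 text and can
    safely be surfaced as str. Anything else is returned untouched.
    """
    if isinstance(value, bytes) and converted_type == "UTF8":
        return value.decode("utf-8")
    return value

# The b'a' min/max from the example above:
print(decode_statistic(b"a", "UTF8"))
# Non-UTF8 byte arrays stay as bytes:
print(decode_statistic(b"\x00\x01", "NONE"))
```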





[jira] [Created] (ARROW-4506) [Ruby] Add Arrow::RecordBatch#raw_records

2019-02-07 Thread Kenta Murata (JIRA)
Kenta Murata created ARROW-4506:
---

 Summary: [Ruby] Add Arrow::RecordBatch#raw_records
 Key: ARROW-4506
 URL: https://issues.apache.org/jira/browse/ARROW-4506
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Ruby
Reporter: Kenta Murata
Assignee: Kenta Murata


I want to add Arrow::RecordBatch#raw_records method to convert a record batch 
object to a nested array.





[jira] [Updated] (ARROW-4143) [Python] Skip rows while reading parquet file

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4143:

Summary: [Python] Skip rows while reading parquet file  (was: Skip rows 
while reading parquet file)

> [Python] Skip rows while reading parquet file
> -
>
> Key: ARROW-4143
> URL: https://issues.apache.org/jira/browse/ARROW-4143
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Sanchit
>Priority: Minor
>  Labels: newbie
>
> Is there any functionality in pyarrow that allows reading a file partially, 
> for example only the first 10 rows of a Parquet file?
> I ran into this situation while doing:
> `df = pd.read_parquet(path='filepath', nrows=10)`  # gave me an error
> I wanted to read just the first 10 rows into a pandas DataFrame using 
> read_parquet (which uses pyarrow as one of its engines). As the Parquet file 
> is considerably large, is there any functionality we can add to the engine to 
> read only the first n rows?
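The engine-level behavior being asked for, stopping as soon as n rows are collected, can be sketched in plain Python with lists standing in for row groups (`head_rows` is a hypothetical helper, not pyarrow API):

```python
def head_rows(batches, nrows):
    """Collect the first `nrows` rows from an iterable of row batches.

    `batches` is any iterable of row lists (standing in for Parquet row
    groups or record batches); iteration stops as soon as enough rows are
    gathered, so later row groups are never read.
    """
    rows = []
    for batch in batches:
        need = nrows - len(rows)
        if need <= 0:
            break
        rows.extend(batch[:need])
    return rows

# Three "row groups" of 4 rows each; only the first two are consumed.
groups = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(head_rows(groups, 6))
```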





[jira] [Updated] (ARROW-4120) [Python] Define process for testing procedures that check for no macro-level memory leaks

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4120:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] Define process for testing procedures that check for no macro-level 
> memory leaks
> -
>
> Key: ARROW-4120
> URL: https://issues.apache.org/jira/browse/ARROW-4120
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> Some kinds of memory leaks may be difficult to unit test for, and they may 
> not necessarily cause valgrind errors.
> I had written some ad hoc leak tests in 
> https://github.com/apache/arrow/blob/master/python/scripts/test_leak.py. We 
> have some more of this in ARROW-3324. 
> It would be useful to be able to create a sort of "test suite" of memory leak 
> checks. They are a bit too intensive to run in CI (since you may have to run 
> something many iterations to see whether it leaks), but we could run them in 
> a nightly build.





[jira] [Updated] (ARROW-4133) [C++/Python] ORC adapter should fail gracefully if /etc/timezone is missing instead of aborting

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4133:

Fix Version/s: 0.14.0

> [C++/Python] ORC adapter should fail gracefully if /etc/timezone is missing 
> instead of aborting
> ---
>
> Key: ARROW-4133
> URL: https://issues.apache.org/jira/browse/ARROW-4133
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: orc
> Fix For: 0.14.0
>
>
> The following core was genereted by nightly build: 
> https://travis-ci.org/kszucs/crossbow/builds/473397855
> {code}
> Core was generated by `/opt/conda/bin/python /opt/conda/bin/pytest -v 
> --pyargs pyarrow'.
> Program terminated with signal SIGABRT, Aborted.
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> 51  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> [Current thread is 1 (Thread 0x7fea61f9e740 (LWP 179))]
> (gdb) bt
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x7fea608c8801 in __GI_abort () at abort.c:79
> #2  0x7fea4b3483df in __gnu_cxx::__verbose_terminate_handler ()
> at 
> /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
> #3  0x7fea4b346b16 in __cxxabiv1::__terminate (handler=)
> at 
> /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
> #4  0x7fea4b346b4c in std::terminate ()
> at 
> /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57
> #5  0x7fea4b346d28 in __cxxabiv1::__cxa_throw (obj=0x2039220,
> tinfo=0x7fea494803d0 ,
> dest=0x7fea49087e52 )
> at 
> /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95
> #6  0x7fea49086824 in orc::getTimezoneByFilename (filename=...)
> at /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:704
> #7  0x7fea490868d2 in orc::getLocalTimezone () at 
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:713   
>   
> #8  0x7fea49063e59 in 
> orc::RowReaderImpl::RowReaderImpl (this=0x204fe30, _contents=..., opts=...)
> at /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Reader.cc:185
> #9  0x7fea4906651e in orc::ReaderImpl::createRowReader (this=0x1fb41b0, 
> opts=...)
> at /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Reader.cc:630
> #10 0x7fea48c2d904 in 
> arrow::adapters::orc::ORCFileReader::Impl::ReadSchema (this=0x1270600, 
> opts=..., 
>
> out=0x7ffe0ccae7b0) at /arrow/cpp/src/arrow/adapters/orc/adapter.cc:264
> #11 0x7fea48c2e18d in arrow::adapters::orc::ORCFileReader::Impl::Read 
> (this=0x1270600, out=0x7ffe0ccaea00)
> at /arrow/cpp/src/arrow/adapters/orc/adapter.cc:302
> #12 0x7fea48c2a8b9 in arrow::adapters::orc::ORCFileReader::Read 
> (this=0x1e14d10, out=0x7ffe0ccaea00)
> at /arrow/cpp/src/arrow/adapters/orc/adapter.cc:697   
>   
>   
> #13 0x7fea48218c9d in __pyx_pf_7pyarrow_4_orc_9ORCReader_12read 
> (__pyx_v_self=0x7fea43de8688,
> __pyx_v_include_indices=0x7fea61d07b70 <_Py_NoneStruct>) at _orc.cpp:3865
> #14 0x7fea48218b31 in __pyx_pw_7pyarrow_4_orc_9ORCReader_13read 
> (__pyx_v_self=0x7fea43de8688,
> __pyx_args=0x7fea61f5e048, __pyx_kwds=0x7fea444f78b8) at _orc.cpp:3813
> #15 0x7fea61910cbd in _PyCFunction_FastCallDict 
> (func_obj=func_obj@entry=0x7fea444b9558,
> args=args@entry=0x7fea44a40fa8, nargs=nargs@entry=0, 
> kwargs=kwargs@entry=0x7fea444f78b8)
> at Objects/methodobject.c:231
> #16 0x7fea61910f16 in _PyCFunction_FastCallKeywords 
> (func=func@entry=0x7fea444b9558,
> stack=stack@entry=0x7fea44a40fa8, nargs=0, 
> kwnames=kwnames@entry=0x7fea47d81d30) at Objects/methodobject.c:294
> #17 0x7fea619aa0da in call_function 
> (pp_stack=pp_stack@entry=0x7ffe0ccaecf0, oparg=,
> kwnames=kwnames@entry=0x7fea47d81d30) at Python/ceval.c:4837
> #18 0x7fea619abb46 in _PyEval_EvalFrameDefault (f=, 
> throwflag=)
> at Python/ceval.c:3351
> #19 0x7fea619a9cde in _PyEval_EvalCodeWithName 

[jira] [Updated] (ARROW-4131) [Python] Coerce mixed columns to String

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4131:

Fix Version/s: 0.14.0

> [Python] Coerce mixed columns to String
> ---
>
> Key: ARROW-4131
> URL: https://issues.apache.org/jira/browse/ARROW-4131
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Leo Meyerovich
>Priority: Major
> Fix For: 0.14.0
>
>
> Continuing [https://github.com/apache/arrow/issues/3280] 
>  
> ===
>  
> I'm seeing variants of this elsewhere (e.g., 
> [wesm/feather#349|https://github.com/wesm/feather/issues/349] ) --
> Not all Pandas tables coerce to Arrow tables, and when they fail, not in a 
> way that is conducive to automation:
> Sample:
> {code:python}
> mixed_df = pd.DataFrame({'mixed': [1, 'b']})
> pa.Table.from_pandas(mixed_df)
> # => ArrowInvalid: ('Could not convert b with type str: tried to convert to
> # double', 'Conversion failed for column mixed with type object')
> {code}
> I would have expected behaviors more like the following:
>  * Coerce {{toString}} by default, with a default-off option to disallow 
> toString coercions
>  * Provide a default-off option to {{from_pandas}} to auto-coerce
>  * Name the exception so it is clear that this is a column coercion failure, 
> and include the column name(s), making this predictable and clearly 
> handleable by both library writers & users
> I lean towards:
>  * Defaults auto-coerce, improving life of early users, 
> `coerce_mixed_columns_to_strings=True`
>  * For less frequent yet more advanced library implementors, allow them to 
> override to `False`
>  * In their case, create a predictable & machine-readable exception, 
> `MixedColumnException(mixed_columns=['a', 'b', ...], msg="")`
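The proposed default-on coercion could behave roughly like this plain-Python sketch (the flag name is taken from the issue text; the function itself is hypothetical, not pandas or pyarrow API):

```python
def coerce_column(values, coerce_mixed_columns_to_strings=True):
    """Coerce a mixed-type column to strings, mirroring the proposal above.

    Homogeneous columns pass through untouched. With the flag off, a mixed
    column raises a named, machine-readable error instead of coercing.
    """
    types = {type(v) for v in values if v is not None}
    if len(types) <= 1:
        return values  # already homogeneous, nothing to do
    if not coerce_mixed_columns_to_strings:
        raise TypeError(
            "mixed column with types: %s" % sorted(t.__name__ for t in types))
    return [None if v is None else str(v) for v in values]

print(coerce_column([1, "b"]))  # mixed -> coerced to strings
print(coerce_column([1, 2]))    # homogeneous -> untouched
```

The issue's `MixedColumnException` would replace the plain `TypeError` here so library writers can catch it and inspect the offending column names.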





[jira] [Updated] (ARROW-4097) [C++] Add function to "conform" a dictionary array to a target new dictionary

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4097:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Add function to "conform" a dictionary array to a target new dictionary
> -
>
> Key: ARROW-4097
> URL: https://issues.apache.org/jira/browse/ARROW-4097
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> Follow up work to ARROW-554. 
> Unifying multiple dictionary-encoded arrays is one use case. Another is 
> rewriting a DictionaryArray to be based on another dictionary. For example, 
> this would be used to implement Cast from one dictionary type to another.
> This will need to be able to insert nulls where there are values that are not 
> found in the target dictionary
> see also discussion at 
> https://github.com/apache/arrow/pull/3165#discussion_r243025730
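The index rewriting at the heart of this, including nulls for values absent from the target dictionary, can be sketched in plain Python (`conform_dictionary` is a hypothetical name; the real implementation would live in C++):

```python
def conform_dictionary(indices, old_dict, target_dict):
    """Rewrite dictionary indices against a new target dictionary.

    Values missing from the target become None (null), as described above.
    None entries in the input indices stay null.
    """
    pos = {value: i for i, value in enumerate(target_dict)}
    # Map each old dictionary slot to its slot in the target (or None).
    remap = [pos.get(value) for value in old_dict]
    return [None if i is None else remap[i] for i in indices]

old_dict = ["a", "b", "c"]
target = ["b", "c", "d"]
# "a" is absent from the target, so its occurrences become null.
print(conform_dictionary([0, 1, 2, 1], old_dict, target))
```

Casting between two dictionary types then reduces to building `remap` once and applying it to the index buffer.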





[jira] [Updated] (ARROW-4108) [Python/Java] Spark integration tests do not work

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4108:

Fix Version/s: 0.13.0

> [Python/Java] Spark integration tests do not work
> -
>
> Key: ARROW-4108
> URL: https://issues.apache.org/jira/browse/ARROW-4108
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Integration
>Affects Versions: 0.12.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Because some commands in spark_integration.sh fail, the Spark integration 
> test on the Docker container does not work.





[jira] [Updated] (ARROW-4099) [Python] Pretty printing very large ChunkedArray objects can use unbounded memory

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4099:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] Pretty printing very large ChunkedArray objects can use unbounded 
> memory
> -
>
> Key: ARROW-4099
> URL: https://issues.apache.org/jira/browse/ARROW-4099
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> In working on ARROW-2970, I have the following dataset:
> {code}
> values = [b'x'] + [
> b'x' * (1 << 20)
> ] * 2 * (1 << 10)
> arr = np.array(values)
> arrow_arr = pa.array(arr)
> {code}
> The object {{arrow_arr}} has 129 chunks, each element of which is 1MB of 
> binary. The repr for this object is over 600MB:
> {code}
> In [10]: rep = repr(arrow_arr)
> In [11]: len(rep)
> Out[11]: 637536258
> {code}
> There are probably a number of failsafes we can implement to avoid badness in 
> these pathological cases (which may not happen often, but given the kinds of 
> bug reports we are seeing, people do have datasets that look like this).
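One such failsafe is to bound the repr by previewing only a few chunks and eliding the rest; a plain-Python sketch (not pyarrow's actual printer; the limits are arbitrary):

```python
def bounded_repr(chunks, max_chunks=2, max_chars=40):
    """Render at most a few chunks and summarize the rest.

    Keeps repr cost proportional to the preview size instead of the full
    data, so a 600MB repr like the one above becomes a short summary.
    """
    shown = []
    for chunk in chunks[:max_chunks]:
        text = repr(chunk)
        if len(text) > max_chars:
            text = text[:max_chars] + "..."
        shown.append(text)
    if len(chunks) > max_chunks:
        shown.append("... %d more chunks ..." % (len(chunks) - max_chunks))
    return "[\n  " + ",\n  ".join(shown) + "\n]"

# 129 chunks of 1MB each, as in the dataset above.
out = bounded_repr(["x" * (1 << 20)] * 129)
print(len(out))
```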





[jira] [Updated] (ARROW-4081) [Go] Sum methods on Mac OS X panic when the array is empty

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4081:

Summary: [Go] Sum methods on Mac OS X panic when the array is empty  (was: 
Sum methods on Mac OS X panic when the array is empty)

> [Go] Sum methods on Mac OS X panic when the array is empty
> --
>
> Key: ARROW-4081
> URL: https://issues.apache.org/jira/browse/ARROW-4081
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Go
>Reporter: Jonathan A Sternberg
>Priority: Major
>
> If you create an empty array and use the `Sum` methods in the math package 
> for the Go version, they will panic with an `index out of range` error.
> The reproducers can be found in this file: 
> [https://github.com/influxdata/flux/blob/6ddfc0f235f91fa29562e7163ab9858f3d08cc62/arrow/arrow_test.go]
>  
> If you remove the `Skip`, the tests will cause a panic. I added them to our 
> repository so we could track when this was fixed and remove our own length 
> checks.





[jira] [Updated] (ARROW-4036) [C++] Make status codes pluggable

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4036:

Fix Version/s: 0.14.0

> [C++] Make status codes pluggable
> -
>
> Key: ARROW-4036
> URL: https://issues.apache.org/jira/browse/ARROW-4036
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.11.1
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently we're defining all Status codes in the Arrow base library, even 
> those pertaining to sub-libraries such as arrow_python, plasma, gandiva, etc. 
> We should try to devise some kind of simple registry system to avoid 
> centralizing those.
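The registry idea can be sketched in Python, keying codes by (library, code) so sub-libraries never collide (illustrative only; the actual proposal targets the arrow::Status machinery in C++):

```python
class StatusRegistry:
    """Minimal registry letting sub-libraries plug in their own status codes.

    Each library registers its codes under its own namespace, so plasma,
    gandiva, etc. don't need entries in the Arrow base library.
    """
    def __init__(self):
        self._codes = {}

    def register(self, library, code, name):
        key = (library, code)
        if key in self._codes:
            raise ValueError("duplicate status code %r" % (key,))
        self._codes[key] = name

    def name(self, library, code):
        return self._codes[(library, code)]

registry = StatusRegistry()
registry.register("arrow", 1, "OutOfMemory")
registry.register("plasma", 1, "ObjectExists")  # same numeric code, no clash
print(registry.name("plasma", 1))
```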





[jira] [Updated] (ARROW-4067) [C++] RFC: standardize ArrayBuilder subclasses

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4067:

Fix Version/s: 0.14.0

> [C++] RFC: standardize ArrayBuilder subclasses
> --
>
> Key: ARROW-4067
> URL: https://issues.apache.org/jira/browse/ARROW-4067
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Priority: Minor
>  Labels: usability
> Fix For: 0.14.0
>
>
> Each builder supports different and frequently differently named methods for 
> appending. It should be possible to establish a more consistent convention, 
> which would alleviate dev confusion and simplify generics.
> For example, let all Builders be required to define at minimum:
>  * {{Reserve(int64_t)}}
>  * a nested type named {{Scalar}}, which is the canonical scalar appended to 
> this builder. Appending other types may be supported for convenience.
>  * {{UnsafeAppend(Scalar)}}
>  * {{UnsafeAppendNull()}}
> The other methods described below can be overridden if an optimization is 
> available, or left defaulted (a CRTP helper can contain the default 
> implementations; for example, {{Append(Scalar)}} would simply be a call to 
> Reserve then UnsafeAppend).
> In addition to their unsafe equivalents, {{Append(Scalar)}} and 
> {{AppendNull()}} should be available for appending without manual capacity 
> maintenance.
> It is not necessary for the rest of this RFC, but it would simplify builders 
> further if scalar append methods always had a single argument. For example, 
> this would mean abolishing {{BinaryBuilder::Append(const uint8_t*, int32_t)}} 
> in favor of {{BinaryBuilder::Append(basic_string_view)}}. There's no 
> runtime overhead involved in this change, and developers who have a pointer 
> and a length instead of a view can just construct one without boilerplate 
> using brace initialization: {code}b->Append({pointer, length});{code}
> Unsafe and safe methods should be provided for appending multiple values as 
> well. The default implementation will be a trivial loop but if optimizations 
> are available then this could be overridden (for example instead of copying 
> bits one by one into a BooleanBuilder, bytes could be memcpy'd). Append 
> methods for multiple values should accept two arguments, the first of which 
> contains values and the second of which defines validity. The canonical 
> multiple append method has signature {{Status(array_view<Scalar> values, const uint8_t* valid_bytes)}}, but other overloads and helpers could be 
> provided as well:
> {code}
> b->Append({{1, 3, 4}}, all_valid);   // append values with no nulls
> b->Append({{1, 3, 4}}, bool_vector); // use the elements of a vector<bool> for validity
> b->Append({{1, 3, 4}}, bits(ptr));   // interpret ptr as a buffer of valid bits, rather than valid bytes
> {code}
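The default multi-value append described above (a trivial loop pairing each value with its validity) can be sketched as follows. `append_many` and its argument names are hypothetical, chosen only to mirror the proposed two-argument shape (values plus validity):

```python
# Sketch of the default multi-value append: loop over values and a
# parallel validity sequence; a builder could override this when a bulk
# optimization (e.g. memcpy of validity bits) is available.

def append_many(builder_values, builder_valid, values, valid_bytes=None):
    # valid_bytes=None means "all valid" (like the all_valid overload)
    for i, v in enumerate(values):
        if valid_bytes is None or valid_bytes[i]:
            builder_values.append(v)
            builder_valid.append(True)
        else:
            builder_values.append(None)
            builder_valid.append(False)
    return builder_values, builder_valid

vals, valid = append_many([], [], [1, 3, 4], valid_bytes=[1, 0, 1])
print(vals, valid)  # → [1, None, 4] [True, False, True]
```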
> Builders of nested types currently require developers to write boilerplate 
> wrangling the child builders. This could be alleviated by letting nested 
> builders' append methods return a helper as an output argument:
> {code}
> ListBuilder::List lst;
> RETURN_NOT_OK(list_builder.Append()); // ListBuilder::Scalar == ListBuilder::ListBase*
> RETURN_NOT_OK(lst->Append(3));
> RETURN_NOT_OK(lst->Append(4));
> StructBuilder::Struct strct;
> RETURN_NOT_OK(struct_builder.Append());
> RETURN_NOT_OK(strct.Set(0, "uuid"));
> RETURN_NOT_OK(strct.Set(2, 47));
> RETURN_NOT_OK(strct->Finish()); // appends null to unspecified fields
> {code}
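The helper idea quoted above can be illustrated with a minimal stand-in: appending to the list builder hands back a handle for the child elements, so callers never touch the child builder directly. `ListBuilder` here is a toy Python class, not the proposed C++ type:

```python
# Toy illustration of the nested-builder helper: Append() starts a new
# list and returns a handle used to append its elements.

class ListBuilder:
    def __init__(self):
        self.lists = []

    def append(self):
        # start a new list and hand back a helper for its elements
        current = []
        self.lists.append(current)
        return current  # helper: a plain list with .append()

lb = ListBuilder()
lst = lb.append()
lst.append(3)
lst.append(4)
lst = lb.append()
lst.append(5)
print(lb.lists)  # → [[3, 4], [5]]
```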



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4092) [Rust] Implement common Reader / DataSource trait for CSV and Parquet

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4092:

Summary: [Rust] Implement common Reader / DataSource trait for CSV and 
Parquet  (was: Implement common Reader / DataSource trait for CSV and Parquet)

> [Rust] Implement common Reader / DataSource trait for CSV and Parquet
> -
>
> Key: ARROW-4092
> URL: https://issues.apache.org/jira/browse/ARROW-4092
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As a developer, I would like to be able to execute queries against Arrow data 
> sources using a common trait.
>  





[jira] [Updated] (ARROW-4076) [Python] schema validation and filters

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4076:

Fix Version/s: 0.14.0

> [Python] schema validation and filters
> --
>
> Key: ARROW-4076
> URL: https://issues.apache.org/jira/browse/ARROW-4076
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: George Sakkis
>Priority: Minor
>  Labels: easyfix, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently [schema 
> validation|https://github.com/apache/arrow/blob/758bd557584107cb336cbc3422744dacd93978af/python/pyarrow/parquet.py#L900]
>  of {{ParquetDataset}} takes place before filtering. This may raise a 
> {{ValueError}} if the schema is different in some dataset pieces, even if 
> these pieces would be subsequently filtered out. I think validation should 
> happen after filtering to prevent such spurious errors:
> {noformat}
> --- a/pyarrow/parquet.py  
> +++ b/pyarrow/parquet.py  
> @@ -878,13 +878,13 @@
>  if split_row_groups:
>  raise NotImplementedError("split_row_groups not yet implemented")
>  
> -if validate_schema:
> -self.validate_schemas()
> -
>  if filters is not None:
>  filters = _check_filters(filters)
>  self._filter(filters)
>  
> +if validate_schema:
> +self.validate_schemas()
> +
>  def validate_schemas(self):
>  open_file = self._get_open_file_func()
> {noformat}





[jira] [Closed] (ARROW-4068) [Gandiva] Support building with Xcode 6.4

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-4068.
---
Resolution: Won't Fix

We've moved past Xcode 6.4

> [Gandiva] Support building with Xcode 6.4
> -
>
> Key: ARROW-4068
> URL: https://issues.apache.org/jira/browse/ARROW-4068
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, C++ - Gandiva
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> In order to package Gandiva with Python wheels and conda packages on macOS, 
> it would be useful to build and run on Xcode 6.4 if it is not too difficult. 
> I am not sure what are the plans for upgrading past Xcode 6.4 in conda-forge





[jira] [Updated] (ARROW-4090) [Python] Table.flatten() doesn't work recursively

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4090:

Fix Version/s: 0.14.0

> [Python] Table.flatten() doesn't work recursively
> -
>
> Key: ARROW-4090
> URL: https://issues.apache.org/jira/browse/ARROW-4090
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Francisco Sanchez
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It seems that the pyarrow.Table.flatten() function is not working recursively 
> nor providing a parameter to do it.
> {code}
> test1c_data = {'level1-A': 'abc',
>'level1-B': 112233,
>'level1-C': {'x': 123.111, 'y': 123.222, 'z': 123.333}
>   }
> test1c_type = pa.struct([('level1-A', pa.string()),
>  ('level1-B', pa.int32()),
>  ('level1-C', pa.struct([('x', pa.float64()),
>  ('y', pa.float64()),
>  ('z', pa.float64())
> ]))
> ])
> test1c_array = pa.array([test1c_data]*5, type=test1c_type)
> test1c_table = pa.Table.from_arrays([test1c_array], names=['msg']) 
> print('{}\n\n{}\n\n{}'.format(test1c_table.schema,
>   test1c_table.flatten().schema,
>   test1c_table.flatten().flatten().schema))
> {code}
> output:
> {quote}msg: struct<level1-A: string, level1-B: int32, level1-C: struct<x: double, y: double, z: double>>
>  child 0, level1-A: string
>  child 1, level1-B: int32
>  child 2, level1-C: struct<x: double, y: double, z: double>
>  child 0, x: double
>  child 1, y: double
>  child 2, z: double
> msg.level1-A: string
>  msg.level1-B: int32
>  msg.level1-C: struct<x: double, y: double, z: double>
>  child 0, x: double
>  child 1, y: double
>  child 2, z: double
> msg.level1-A: string
>  msg.level1-B: int32
>  msg.level1-C.x: double
>  msg.level1-C.y: double
>  msg.level1-C.z: double
> {quote}
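The requested behavior (flatten until no struct columns remain) amounts to iterating `flatten()` to a fixed point. A minimal stand-in sketch that avoids pyarrow entirely; `FakeTable` and `flatten_all` are hypothetical names, with nested columns modeled as dicts:

```python
# Sketch of the missing "flatten recursively" behavior, using a minimal
# stand-in for a table whose columns may be nested (not pyarrow API).

class FakeTable:
    """Columns are dotted names; a nested column is a dict of children."""
    def __init__(self, columns):
        self.columns = columns  # {name: value-or-dict}

    def flatten(self):
        # expand one level of nesting, like pyarrow.Table.flatten()
        out = {}
        for name, value in self.columns.items():
            if isinstance(value, dict):
                for child, child_value in value.items():
                    out[f"{name}.{child}"] = child_value
            else:
                out[name] = value
        return FakeTable(out)

def flatten_all(table):
    # apply flatten() until a fixed point: no nested columns remain
    while any(isinstance(v, dict) for v in table.columns.values()):
        table = table.flatten()
    return table

t = FakeTable({"msg": {"level1-A": "abc",
                       "level1-B": 112233,
                       "level1-C": {"x": 123.111, "y": 123.222, "z": 123.333}}})
print(sorted(flatten_all(t).columns))
# → ['msg.level1-A', 'msg.level1-B', 'msg.level1-C.x', 'msg.level1-C.y', 'msg.level1-C.z']
```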





[jira] [Updated] (ARROW-4091) [C++] Curate default list of CSV null spellings

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4091:

Fix Version/s: 0.14.0

> [C++] Curate default list of CSV null spellings
> ---
>
> Key: ARROW-4091
> URL: https://issues.apache.org/jira/browse/ARROW-4091
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Affects Versions: 0.11.1
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.14.0
>
>
> "NaN" is not null in Arrow-land (at least not for float columns?).





[jira] [Closed] (ARROW-4056) [C++] boost-cpp toolchain packages causing crashes on Xcode > 6.4

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-4056.
---
Resolution: Won't Fix

> [C++] boost-cpp toolchain packages causing crashes on Xcode > 6.4
> -
>
> Key: ARROW-4056
> URL: https://issues.apache.org/jira/browse/ARROW-4056
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> EDIT: the issue has been present for a large portion of 2018. I found this 
> when merging the macOS C++ builds and changed the build type to Xcode 8.3:
> https://travis-ci.org/wesm/arrow/jobs/469297420#L2856
> I reported the issue into conda-forge at 
> https://github.com/conda-forge/boost-cpp-feedstock/issues/40
> It seems that the Ray project worked around this earlier this year: 
> https://github.com/ray-project/ray/pull/1688





[jira] [Updated] (ARROW-4050) core dump on reading parquet file

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4050:

Fix Version/s: 0.13.0

> core dump on reading parquet file
> -
>
> Key: ARROW-4050
> URL: https://issues.apache.org/jira/browse/ARROW-4050
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Antonio Cavallo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: bug.parquet, working_python37_build_on_osx.sh
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hi,
> I've a crash when doing this:
> {{import pyarrow.parquet as pq}}
> {{pq.read_table('bug.parquet')}}
> [^bug.parquet]
> (this is the same generated by 
> arrow/python/pyarrow/tests/test_parquet.py(112)test_single_pylist_column_roundtrip())





[jira] [Closed] (ARROW-4000) [Python] Error running CSV test_read_options on Windows

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-4000.
---
Resolution: Cannot Reproduce

Haven't seen this issue on MSVC myself. Closing until we have a reproducible 
recipe

> [Python] Error running CSV test_read_options on Windows
> ---
>
> Key: ARROW-4000
> URL: https://issues.apache.org/jira/browse/ARROW-4000
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.11.1
>Reporter: Benjamin Kietzman
>Priority: Minor
>  Labels: csv, windows
>
> `py.test pyarrow -v` crashed at 
> `pyarrow/tests/test_csv.py::test_read_options`.
> errorlevel was -1073741819 (0xC0000005, an access violation).





[jira] [Updated] (ARROW-4050) [Python][Parquet] core dump on reading parquet file

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4050:

Summary: [Python][Parquet] core dump on reading parquet file  (was: core 
dump on reading parquet file)

> [Python][Parquet] core dump on reading parquet file
> ---
>
> Key: ARROW-4050
> URL: https://issues.apache.org/jira/browse/ARROW-4050
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Antonio Cavallo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: bug.parquet, working_python37_build_on_osx.sh
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hi,
> I've a crash when doing this:
> {{import pyarrow.parquet as pq}}
> {{pq.read_table('bug.parquet')}}
> [^bug.parquet]
> (this is the same generated by 
> arrow/python/pyarrow/tests/test_parquet.py(112)test_single_pylist_column_roundtrip())





[jira] [Updated] (ARROW-4046) [Python/CI] Run nightly large memory tests

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4046:

Fix Version/s: 0.14.0  (was: 0.13.0)

> [Python/CI] Run nightly large memory tests
> --
>
> Key: ARROW-4046
> URL: https://issues.apache.org/jira/browse/ARROW-4046
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration, Python
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.14.0
>
>
> See comment https://github.com/apache/arrow/pull/3171#issuecomment-447156646





[jira] [Commented] (ARROW-4024) [Python] Cython compilation error on cython==0.27.3

2019-02-07 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763299#comment-16763299
 ] 

Wes McKinney commented on ARROW-4024:
-

Let's bump the minimum Cython version

> [Python] Cython compilation error on cython==0.27.3
> ---
>
> Key: ARROW-4024
> URL: https://issues.apache.org/jira/browse/ARROW-4024
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
> Fix For: 0.13.0
>
>
> On the latest master, I'm getting the following error:
> {code:java}
> [ 11%] Compiling Cython CXX source for lib...
> Error compiling Cython file:
> 
> ...
>     out.init(type)
>     return out
> cdef object pyarrow_wrap_metadata(
>     ^
> 
> pyarrow/public-api.pxi:95:5: Function signature does not match previous 
> declaration
> CMakeFiles/lib_pyx.dir/build.make:57: recipe for target 'CMakeFiles/lib_pyx' 
> failed{code}
> With 0.29.0 it is working. This might have been introduced in 
> [https://github.com/apache/arrow/commit/12201841212967c78e31b2d2840b55b1707c4e7b]
>  but I'm not sure.





[jira] [Updated] (ARROW-4047) [Python] Document use of int96 timestamps and options in Parquet docs

2019-02-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4047:

Fix Version/s: 0.13.0

> [Python] Document use of int96 timestamps and options in Parquet docs
> -
>
> Key: ARROW-4047
> URL: https://issues.apache.org/jira/browse/ARROW-4047
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: documentation, parquet
> Fix For: 0.13.0
>
>
> This is not mentioned in the prose docs; it would be helpful for people who 
> are using systems requiring int96 timestamps (e.g. Impala/Redshift Spectrum) 
> to have this





  1   2   3   >