[jira] [Updated] (ARROW-1621) [JAVA] Reduce Heap Usage per Vector

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1621:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [JAVA] Reduce Heap Usage per Vector
> ---
>
> Key: ARROW-1621
> URL: https://issues.apache.org/jira/browse/ARROW-1621
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Memory, Java - Vectors
>Reporter: Siddharth Teotia
>Assignee: Siddharth Teotia
>Priority: Major
> Fix For: 0.11.0
>
>
> https://docs.google.com/document/d/1MU-ah_bBHIxXNrd7SkwewGCOOexkXJ7cgKaCis5f-PI/edit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1171) C++: Segmentation faults on Fedora 24 with pyarrow-manylinux1 and self-compiled turbodbc

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1171:

Fix Version/s: (was: 0.10.0)
   0.11.0

> C++: Segmentation faults on Fedora 24 with pyarrow-manylinux1 and 
> self-compiled turbodbc
> 
>
> Key: ARROW-1171
> URL: https://issues.apache.org/jira/browse/ARROW-1171
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.4.1
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Original issue: https://github.com/blue-yonder/turbodbc/issues/102
> When using the {{pyarrow}} {{manylinux1}} Wheels to build Turbodbc on Fedora 
> 24, the {{turbodbc_arrow}} unittests segfault. The main environment attribute 
> here is that the compiler version used for building Turbodbc is newer than 
> the one used for Arrow.





[jira] [Updated] (ARROW-1860) [C++] Add data structure to "stage" a sequence of IPC messages from in-memory data

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1860:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [C++] Add data structure to "stage" a sequence of IPC messages from in-memory 
> data
> --
>
> Key: ARROW-1860
> URL: https://issues.apache.org/jira/browse/ARROW-1860
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
> Attachments: text.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, when you need to pre-allocate space for a record batch or a stream 
> (schema + dictionaries + record batches), you must make multiple passes over 
> the data structures of interest (and use e.g. {{MockOutputStream}} to compute 
> the size of the output buffer). It would be useful to make a single pass to 
> "prepare" the IPC payload for both sizing and writing to prevent having to 
> make multiple passes





[jira] [Updated] (ARROW-2027) [C++] ipc::Message::SerializeTo does not pad the message body

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2027:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [C++] ipc::Message::SerializeTo does not pad the message body
> -
>
> Key: ARROW-2027
> URL: https://issues.apache.org/jira/browse/ARROW-2027
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Panchen Xue
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> I just want to note this here as a follow-up to ARROW-1860. I think that 
> padding is the correct behavior, but I wasn't sure enough to make the fix 
> there
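
To make the padding question concrete, here is a minimal sketch of what "padding the message body" could look like. It is written in TypeScript for illustration only and is not the C++ `ipc::Message::SerializeTo` code. The 8-byte alignment is an assumption (the thread does not state the alignment), and `paddedLength` and `padBody` are hypothetical names.

```typescript
// Round `length` up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two and `length` is below 2^31,
// because JavaScript bitwise operators work on 32-bit integers.
function paddedLength(length: number, alignment: number = 8): number {
  return (length + alignment - 1) & ~(alignment - 1);
}

// Copy `body` into a zero-filled buffer whose size is a multiple of `alignment`.
function padBody(body: Uint8Array, alignment: number = 8): Uint8Array {
  const out = new Uint8Array(paddedLength(body.length, alignment));
  out.set(body); // the trailing bytes stay zero
  return out;
}

const padded = padBody(new Uint8Array([1, 2, 3, 4, 5]));
console.log(padded.length); // 8
```

A body whose length is already a multiple of the alignment is copied unchanged in size, so padding twice is harmless.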





[jira] [Commented] (ARROW-2583) [Rust] Buffer should be typeless

2018-06-29 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527898#comment-16527898
 ] 

Chao Sun commented on ARROW-2583:
-

I just started to work on this recently and the changes are more involved than 
I expected. I think it will be difficult to make it into the 0.10.0 release.

> [Rust] Buffer should be typeless
> 
>
> Key: ARROW-2583
> URL: https://issues.apache.org/jira/browse/ARROW-2583
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Chao Sun
>Priority: Major
> Fix For: 0.11.0
>
>
> See comments in [https://github.com/apache/arrow/pull/1971] for background on 
> this but the summary is that Buffer should just deal with untyped memory e.g. 
> `* const u8` and all type-handling should be moved to the Array layer e.g. 
> `BufferArray`.
> This would be more consistent with the other implementations.
>  
>  





[jira] [Updated] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1491:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [C++] Add casting implementations from strings to numbers or boolean
> 
>
> Key: ARROW-1491
> URL: https://issues.apache.org/jira/browse/ARROW-1491
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Licht Takeuchi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>






[jira] [Updated] (ARROW-835) [Format] Add Timedelta type to describe time intervals

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-835:
---
Labels: columnar-format-1.0  (was: )

> [Format] Add Timedelta type to describe time intervals
> --
>
> Key: ARROW-835
> URL: https://issues.apache.org/jira/browse/ARROW-835
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Jeff Reback
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: columnar-format-1.0
> Fix For: 0.11.0
>
>
> xref https://github.com/apache/arrow/pull/551 and 
> https://github.com/apache/arrow/pull/551#issuecomment-294325969
> this will allow round-tripping of pandas ``Timedelta`` and numpy 
> ``timedelta64[ns]`` types. They will have a similar TimeUnit to TimestampType 
> (s, us, ms, ns). Possible implementations include making this pure 64-bit.





[jira] [Updated] (ARROW-835) [Format] Add Timedelta type to describe time intervals

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-835:
---
Fix Version/s: (was: 0.10.0)
   0.11.0

> [Format] Add Timedelta type to describe time intervals
> --
>
> Key: ARROW-835
> URL: https://issues.apache.org/jira/browse/ARROW-835
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Jeff Reback
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: columnar-format-1.0
> Fix For: 0.11.0
>
>
> xref https://github.com/apache/arrow/pull/551 and 
> https://github.com/apache/arrow/pull/551#issuecomment-294325969
> this will allow round-tripping of pandas ``Timedelta`` and numpy 
> ``timedelta64[ns]`` types. They will have a similar TimeUnit to TimestampType 
> (s, us, ms, ns). Possible implementations include making this pure 64-bit.





[jira] [Updated] (ARROW-2335) [Go] Move Go README one directory higher

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2335:

Summary: [Go] Move Go README one directory higher  (was: [Go] Remove extra 
directory nesting from go/ directory)

> [Go] Move Go README one directory higher
> 
>
> Key: ARROW-2335
> URL: https://issues.apache.org/jira/browse/ARROW-2335
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Wes McKinney
>Assignee: Stuart Carnie
>Priority: Major
> Fix For: 0.10.0
>
>
> I noticed this after merging. I am not sure we need the {{arrow/go/arrow}} 
> directory structure if simply {{arrow/go}} would suffice





[jira] [Commented] (ARROW-2352) [C++/Python] Test OSX packaging in Travis matrix

2018-06-29 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527949#comment-16527949
 ] 

Wes McKinney commented on ARROW-2352:
-

Inclined to resolve as Won't Fix given the packaging progress in the last couple 
of months

> [C++/Python] Test OSX packaging in Travis matrix
> 
>
> Key: ARROW-2352
> URL: https://issues.apache.org/jira/browse/ARROW-2352
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: beginner
> Fix For: 0.10.0
>
>
> At the moment, we only test the conda-based build in Travis but we also ship 
> binary wheels after the release. The process of building them is currently 
> part of the {{arrow-dist}} repository and uses the {{multibuild}} scripts 
> that are used for many other Python packages that also have native code.
> The code should be ported to run as a real CI job, i.e. in addition to just 
> packaging the code, we will also need to run the unit tests. Furthermore, 
> once the job is running and green, we also need to look at the runtimes as we 
> already have a quite packed CI matrix and we expect that many steps of the 
> wheel build are just to set up the environment. We should be able to cache 
> them.
> Maybe we want to do this as a nightly cron. For a first draft, it will be ok 
> to add it to the full matrix.





[jira] [Updated] (ARROW-2399) [Rust] Builder should not provide a set() method

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2399:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Rust] Builder should not provide a set() method
> ---
>
> Key: ARROW-2399
> URL: https://issues.apache.org/jira/browse/ARROW-2399
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Priority: Major
> Fix For: 0.11.0
>
>
> Arrays should be immutable, but we have a `set` method on Buffer that 
> should not be there.
> This is only used from the Bitmap struct. Perhaps Bitmap should maintain its 
> own memory instead and not use Buffer?





[jira] [Updated] (ARROW-2602) [C++/Python] Automate build of development docker container

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2602:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [C++/Python] Automate build of development docker container
> ---
>
> Key: ARROW-2602
> URL: https://issues.apache.org/jira/browse/ARROW-2602
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.11.0
>
>
> With 
> [https://github.com/apache/arrow/pull/2016|https://github.com/apache/arrow/pull/2016#pullrequestreview-121047089]
>  we provide a convenience Docker container so that one can develop Arrow 
> without running into the hassle of setting up the development toolchain on 
> their own machine.
> The current base image is not built automatically, as we are waiting for input 
> from INFRA on https://issues.apache.org/jira/browse/INFRA-16533
> Once we know how to upload continuously to Docker Hub, we should move the 
> Dockerfile appropriately.





[jira] [Updated] (ARROW-2620) [Rust] Integrate memory pool abstraction with rest of codebase

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2620:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Rust] Integrate memory pool abstraction with rest of codebase
> --
>
> Key: ARROW-2620
> URL: https://issues.apache.org/jira/browse/ARROW-2620
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Andy Grove
>Priority: Major
> Fix For: 0.11.0
>
>
> A memory pool abstraction was contributed but is not actually used by the 
> rest of the code base.
> We should either remove it or integrate it.
> If we integrate it, it should be done in a similar way to the C++ API.





[jira] [Commented] (ARROW-2681) [C++] Use source releases when building ORC instead of using GitHub tag snapshots

2018-06-29 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527951#comment-16527951
 ] 

Wes McKinney commented on ARROW-2681:
-

I looked at this but can't figure out how to convince CMake ExternalProject to 
download from the ASF mirror system. Moving this to 0.11 in the meantime

> [C++] Use source releases when building ORC instead of using GitHub tag 
> snapshots
> -
>
> Key: ARROW-2681
> URL: https://issues.apache.org/jira/browse/ARROW-2681
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
>
> See related discussion in ORC-374. It would be better to use the release 
> artifacts that have been voted on by the ORC PMC.





[jira] [Updated] (ARROW-2681) [C++] Use source releases when building ORC instead of using GitHub tag snapshots

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2681:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [C++] Use source releases when building ORC instead of using GitHub tag 
> snapshots
> -
>
> Key: ARROW-2681
> URL: https://issues.apache.org/jira/browse/ARROW-2681
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
>
> See related discussion in ORC-374. It would be better to use the release 
> artifacts that have been voted on by the ORC PMC.





[jira] [Created] (ARROW-2765) [JS] add Vector.map

2018-06-29 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-2765:


 Summary: [JS] add Vector.map
 Key: ARROW-2765
 URL: https://issues.apache.org/jira/browse/ARROW-2765
 Project: Apache Arrow
  Issue Type: New Feature
  Components: JavaScript
Reporter: Brian Hulette
 Fix For: JS-0.4.0


Add `Vector.map(f)` that returns a new vector transformed with `f`





[jira] [Commented] (ARROW-2765) [JS] add Vector.map

2018-06-29 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528181#comment-16528181
 ] 

Brian Hulette commented on ARROW-2765:
--

To be consistent with JS typed arrays, I think {{Vector.map}} should return the 
same type as the source vector, and we can add {{Vector.from(source, f)}} to 
change types. What do you think of that [~paul.e.taylor]?
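
For reference, the typed-array behaviour this comment appeals to can be seen with the built-in classes alone. The sketch below uses only `Int32Array` and `Float64Array`; it does not show the proposed `Vector.map` or `Vector.from`, which are still just a suggestion at this point.

```typescript
// Built-in typed arrays only; no Arrow classes are used here.
const source = new Int32Array([1, 2, 3]);

// map() keeps the element type: the result is another Int32Array,
// so fractional results are truncated when they are stored.
const halved = source.map((x) => x / 2);

// from(source, f) is the way to change the element type while mapping.
const halvedAsFloat = Float64Array.from(source, (x) => x / 2);

console.log(halved instanceof Int32Array); // true
console.log(Array.from(halved));           // [0, 1, 1]
console.log(Array.from(halvedAsFloat));    // [0.5, 1, 1.5]
```

The truncation in the first result is the practical reason to keep a separate type-changing entry point.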

> [JS] add Vector.map
> ---
>
> Key: ARROW-2765
> URL: https://issues.apache.org/jira/browse/ARROW-2765
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Major
> Fix For: JS-0.4.0
>
>
> Add {{Vector.map(f)}} that returns a new vector transformed with {{f}}





[jira] [Updated] (ARROW-2763) [Python] Make parquet _metadata file accessible from ParquetDataset

2018-06-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2763:
--
Labels: pull-request-available  (was: )

> [Python] Make parquet _metadata file accessible from ParquetDataset
> ---
>
> Key: ARROW-2763
> URL: https://issues.apache.org/jira/browse/ARROW-2763
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Robert Gruener
>Priority: Minor
>  Labels: pull-request-available
>
> Currently when creating a ParquetDataset it gives you access to the 
> _common_metadata file but not the _metadata file.
> We access the metadata file to get row group information of the dataset 
> without opening each footer.





[jira] [Commented] (ARROW-1875) Write 64-bit ints as strings in integration test JSON files

2018-06-29 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528200#comment-16528200
 ] 

Wes McKinney commented on ARROW-1875:
-

[~paul.e.taylor] [~bhulette] how much work is this on the JS side? The C++ and 
Java side should be relatively straightforward

> Write 64-bit ints as strings in integration test JSON files
> ---
>
> Key: ARROW-1875
> URL: https://issues.apache.org/jira/browse/ARROW-1875
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Brian Hulette
>Priority: Minor
> Fix For: 0.11.0
>
>
> JavaScript can't handle 64-bit integers natively, so writing them as strings 
> in the JSON would make implementing the integration tests a lot simpler.
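
The precision problem behind this request can be reproduced in a few lines. This is a minimal sketch using only `JSON.parse`; the `value` key is a hypothetical field name and not the layout of the integration test files.

```typescript
// 2^53 + 1 cannot be represented exactly as a JavaScript number.
const asNumber = '{"value": 9007199254740993}';
const asString = '{"value": "9007199254740993"}';

const parsedNumber: { value: number } = JSON.parse(asNumber);
const parsedString: { value: string } = JSON.parse(asString);

// The numeric form is silently rounded to the nearest double.
console.log(String(parsedNumber.value)); // "9007199254740992"

// The string form keeps every digit, so the reader can decode it into
// whatever 64-bit representation it uses.
console.log(parsedString.value); // "9007199254740993"
```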





[jira] [Created] (ARROW-2764) [JS] Easy way to add a column to a Table

2018-06-29 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-2764:


 Summary: [JS] Easy way to add a column to a Table
 Key: ARROW-2764
 URL: https://issues.apache.org/jira/browse/ARROW-2764
 Project: Apache Arrow
  Issue Type: Improvement
  Components: JavaScript
Reporter: Brian Hulette
 Fix For: JS-0.4.0


It should be easier to add a new column to a table. API could be either 
`table.addColumn(vector)` or `table.merge(..tables or vectors)`





[jira] [Updated] (ARROW-2352) [C++/Python] Test OSX packaging in Travis matrix

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2352:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [C++/Python] Test OSX packaging in Travis matrix
> 
>
> Key: ARROW-2352
> URL: https://issues.apache.org/jira/browse/ARROW-2352
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: beginner
> Fix For: 0.11.0
>
>
> At the moment, we only test the conda-based build in Travis but we also ship 
> binary wheels after the release. The process of building them is currently 
> part of the {{arrow-dist}} repository and uses the {{multibuild}} scripts 
> that are used for many other Python packages that also have native code.
> The code should be ported to run as a real CI job, i.e. in addition to just 
> packaging the code, we will also need to run the unit tests. Furthermore, 
> once the job is running and green, we also need to look at the runtimes as we 
> already have a quite packed CI matrix and we expect that many steps of the 
> wheel build are just to set up the environment. We should be able to cache 
> them.
> Maybe we want to do this as a nightly cron. For a first draft, it will be ok 
> to add it to the full matrix.





[jira] [Updated] (ARROW-2560) [Rust] The Rust README should include Rust-specific information on contributing

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2560:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Rust] The Rust README should include Rust-specific information on 
> contributing
> ---
>
> Key: ARROW-2560
> URL: https://issues.apache.org/jira/browse/ARROW-2560
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: Andy Grove
>Priority: Trivial
>  Labels: beginner
> Fix For: 0.11.0
>
>
> Every new contributor has their first build fail because they didn't know to 
> use cargo fmt.
> We should explain this in the Rust README along with any other pertinent 
> information specific to Rust contributions.





[jira] [Updated] (ARROW-2523) [Rust] Implement CAST operations for arrays

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2523:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Rust] Implement CAST operations for arrays
> ---
>
> Key: ARROW-2523
> URL: https://issues.apache.org/jira/browse/ARROW-2523
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
> Fix For: 0.11.0
>
>
> I have implemented CAST operations in DataFusion but I would like to 
> re-implement this now directly in Arrow. I will create a PR after the Rust 
> refactor is complete.





[jira] [Updated] (ARROW-2335) [Go] Move Go README one directory higher

2018-06-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2335:
--
Labels: pull-request-available  (was: )

> [Go] Move Go README one directory higher
> 
>
> Key: ARROW-2335
> URL: https://issues.apache.org/jira/browse/ARROW-2335
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Wes McKinney
>Assignee: Stuart Carnie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I noticed this after merging. I am not sure we need the {{arrow/go/arrow}} 
> directory structure if simply {{arrow/go}} would suffice





[jira] [Resolved] (ARROW-2596) [GLib] Use the default value of GTK-Doc

2018-06-29 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-2596.
-
Resolution: Fixed

> [GLib] Use the default value of GTK-Doc
> ---
>
> Key: ARROW-2596
> URL: https://issues.apache.org/jira/browse/ARROW-2596
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
> Fix For: 0.10.0
>
>






[jira] [Commented] (ARROW-2596) [GLib] Use the default value of GTK-Doc

2018-06-29 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528330#comment-16528330
 ] 

Kouhei Sutou commented on ARROW-2596:
-

Done by https://github.com/apache/arrow/pull/2058 .
Sorry, I forgot to add the "ARROW-2596: " prefix to the pull request.

> [GLib] Use the default value of GTK-Doc
> ---
>
> Key: ARROW-2596
> URL: https://issues.apache.org/jira/browse/ARROW-2596
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
> Fix For: 0.10.0
>
>






[jira] [Updated] (ARROW-2768) [Packaging] Support Ubuntu 18.04

2018-06-29 Thread Yasuo Honda (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yasuo Honda updated ARROW-2768:
---
External issue URL: https://github.com/apache/arrow-dist/pull/29

> [Packaging] Support Ubuntu 18.04
> 
>
> Key: ARROW-2768
> URL: https://issues.apache.org/jira/browse/ARROW-2768
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Yasuo Honda
>Priority: Major
>  Labels: pull-request-available
>
> I'd like to propose supporting Ubuntu 18.04, which is the latest LTS release of 
> Ubuntu.
> I'm going to open a pull request.





[jira] [Updated] (ARROW-2457) garrow_array_builder_append_values() won't work for large arrays

2018-06-29 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-2457:

Component/s: (was: C)

> garrow_array_builder_append_values() won't work for large arrays
> 
>
> Key: ARROW-2457
> URL: https://issues.apache.org/jira/browse/ARROW-2457
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, GLib
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Haralampos Gavriilidis
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I am using garrow_array_builder_append_values() to transform a native C array 
> to an Arrow array, without calling arrow_array_builder_append multiple times. 
> When calling garrow_array_builder_append_values() in array-builder.cpp with 
> the following signature:
> {code:java}
> garrow_array_builder_append_values(GArrowArrayBuilder *builder,
> const VALUE *values,
> gint64 values_length,
> const gboolean *is_valids,
> gint64 is_valids_length,
> GError **error,
> const gchar *context)
> {code}
> it will fail for large arrays. This is probably happening because the 
> is_valids array is copied to the valid_bytes array (of a different type), for 
> which the memory is allocated on the stack rather than on the heap, as shown 
> in the snippet below:
> {code:java}
> uint8_t valid_bytes[is_valids_length];
> for (gint64 i = 0; i < is_valids_length; ++i){ 
>   valid_bytes[i] = is_valids[i]; 
> }
> {code}
>  A way to avoid this problem would be to allocate memory for the valid_bytes 
> array using malloc() or something similar. Is this behavior intended, maybe 
> because no large arrays should be handed over to that function, or is it 
> rather a bug?





[jira] [Commented] (ARROW-2767) [JS] Add generic to Table for column names

2018-06-29 Thread Dominik Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528408#comment-16528408
 ] 

Dominik Moritz commented on ARROW-2767:
---

One thing I realized later is that the `getColumn(name)` function can then be 
guaranteed to return a column rather than returning `Column | null`. 
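
A rough sketch of how a column-name generic could give that guarantee is below. `Column` and `Table` here are hypothetical stand-ins written for this example, not the Arrow JS classes, and columns are plain `number[]` to keep it short.

```typescript
// Hypothetical stand-ins; these are not the Arrow JS classes.
class Column {
  constructor(readonly name: string, readonly values: number[]) {}
}

class Table<ColName extends string> {
  private readonly columns: Map<ColName, Column>;

  constructor(columns: Record<ColName, number[]>) {
    this.columns = new Map<ColName, Column>();
    for (const name of Object.keys(columns) as ColName[]) {
      this.columns.set(name, new Column(name, columns[name]));
    }
  }

  // `name` must be one of the known column names, so the lookup cannot
  // miss and the return type is `Column` rather than `Column | null`.
  getColumn(name: ColName): Column {
    return this.columns.get(name)!;
  }
}

// ColName is inferred as "price" | "qty" from the object keys.
const table = new Table({ price: [1.5, 2.5], qty: [3, 4] });
const price = table.getColumn("price");
// table.getColumn("cost"); // rejected at compile time: not a column name
console.log(price.values);
```

The guarantee only holds when the column names are known to the compiler; a table read from a file at runtime would still need a nullable lookup or a wider `string` parameter.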

> [JS] Add generic to Table for column names
> --
>
> Key: ARROW-2767
> URL: https://issues.apache.org/jira/browse/ARROW-2767
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Major
>
> Requested by [~domoritz]
> Something like:
> {code:javascript}
> class Table {
> ...
> getColumn(name: ColName): Vector {
> }
> ...
> }
> {code}
> It would be even better if we could find a way to map the column names to the 
> actual vector data types, but one thing at a time.





[jira] [Updated] (ARROW-2764) [JS] Easy way to create a new Table with an additional column

2018-06-29 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2764:
-
Summary: [JS] Easy way to create a new Table with an additional column  
(was: [JS] Easy way to add a column to a Table)

> [JS] Easy way to create a new Table with an additional column
> -
>
> Key: ARROW-2764
> URL: https://issues.apache.org/jira/browse/ARROW-2764
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Major
> Fix For: JS-0.4.0
>
>
> It should be easier to add a new column to a table. API could be either 
> `table.addColumn(vector)` or `table.merge(..tables or vectors)`
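
In line with the new summary, one possible shape for `addColumn` is a method that returns a new table and leaves the original untouched. The sketch below is only an illustration: `SimpleTable` is a hypothetical class, and it takes a name plus a plain `number[]` instead of an Arrow vector.

```typescript
// Hypothetical minimal table; not the Arrow JS Table class.
interface NamedColumn {
  name: string;
  values: number[];
}

class SimpleTable {
  constructor(readonly columns: ReadonlyArray<NamedColumn>) {}

  get numRows(): number {
    const first = this.columns[0];
    return first ? first.values.length : 0;
  }

  // Returns a new table; the receiver is not modified.
  addColumn(name: string, values: number[]): SimpleTable {
    if (this.columns.length > 0 && values.length !== this.numRows) {
      throw new Error(`expected ${this.numRows} values, got ${values.length}`);
    }
    return new SimpleTable(this.columns.concat([{ name, values }]));
  }
}

const base = new SimpleTable([{ name: "a", values: [1, 2, 3] }]);
const extended = base.addColumn("b", [4, 5, 6]);
console.log(base.columns.length, extended.columns.length); // 1 2
```

The column data is shared between the two tables and only the list of columns is copied, which keeps the operation cheap.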





[jira] [Commented] (ARROW-2713) [Packaging] Fix linux package builds

2018-06-29 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528375#comment-16528375
 ] 

Kouhei Sutou commented on ARROW-2713:
-

Yay!

I want us to consider how to unify arrow-dist/cpp-linux/ and 
arrow/dev/tasks/linux-packages/ as the next step.
Should we move arrow-dist/cpp-linux/ to arrow/dev/tasks/linux-packages/?

> [Packaging] Fix linux package builds
> 
>
> Key: ARROW-2713
> URL: https://issues.apache.org/jira/browse/ARROW-2713
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.10.0
>
>
> Build configuration: 
> https://github.com/kszucs/arrow/tree/0d9d89b7bff32823ab68e6ec1dc7ade52511f7ee/dev/tasks/linux-packages
> Failing build: 
> https://travis-ci.org/kszucs/crossbow/builds/391894564?utm_source=github_status&utm_medium=notification
> Looks like it’s waiting for user input? There might be some hardcoded 
> version too, because the expected version is 0.9.1 instead of 0.9.0.
> ping [~kou] 





[jira] [Updated] (ARROW-2766) [JS] Add ability to construct a Table from a list of Arrays/TypedArrays

2018-06-29 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2766:
-
Description: 
Something like 
{code:javascript}
Table.from({'col1': [...], 'col2': [...], 'col3': [...]})
{code}


  was:Something like {{Table.from({'col1': [...], 'col2': [...], 'col3': 
[...]})}}


> [JS] Add ability to construct a Table from a list of Arrays/TypedArrays
> ---
>
> Key: ARROW-2766
> URL: https://issues.apache.org/jira/browse/ARROW-2766
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Major
>
> Something like 
> {code:javascript}
> Table.from({'col1': [...], 'col2': [...], 'col3': [...]})
> {code}
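
As a sketch of what such a constructor would have to do, the helper below accepts an object of plain arrays or typed arrays and checks that all columns have the same length. `fromColumns`, `ColumnData` and `SketchTable` are hypothetical names made up for this example and are not part of Arrow JS.

```typescript
// Hypothetical helper; none of these names exist in Arrow JS.
type ColumnData = number[] | Float64Array | Int32Array;

interface SketchTable {
  names: string[];
  columns: ColumnData[];
  numRows: number;
}

function fromColumns(input: { [name: string]: ColumnData }): SketchTable {
  const names = Object.keys(input);
  const columns = names.map((name) => input[name]);
  const numRows = columns.length > 0 ? columns[0].length : 0;
  // Every column must describe the same number of rows.
  for (let i = 0; i < columns.length; i++) {
    if (columns[i].length !== numRows) {
      throw new Error(
        `column "${names[i]}" has ${columns[i].length} rows, expected ${numRows}`
      );
    }
  }
  return { names, columns, numRows };
}

const sketch = fromColumns({
  col1: [1, 2, 3],
  col2: new Float64Array([0.1, 0.2, 0.3]),
  col3: new Int32Array([7, 8, 9]),
});
console.log(sketch.names, sketch.numRows);
```

A real implementation would also have to pick an Arrow data type for each column. Typed arrays carry that information, while plain arrays would need type inference or an explicit schema.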





[jira] [Created] (ARROW-2766) [JS] Add ability to construct a Table from a list of Arrays/TypedArrays

2018-06-29 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-2766:


 Summary: [JS] Add ability to construct a Table from a list of 
Arrays/TypedArrays
 Key: ARROW-2766
 URL: https://issues.apache.org/jira/browse/ARROW-2766
 Project: Apache Arrow
  Issue Type: New Feature
  Components: JavaScript
Reporter: Brian Hulette


Something like {{Table.from({'col1': [...], 'col2': [...], 'col3': [...]})}}





[jira] [Updated] (ARROW-2767) [JS] Add generic to Table for column names

2018-06-29 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2767:
-
Component/s: JavaScript

> [JS] Add generic to Table for column names
> --
>
> Key: ARROW-2767
> URL: https://issues.apache.org/jira/browse/ARROW-2767
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Major
>
> Requested by [~domoritz]
> Something like:
> {code:javascript}
> class Table {
> ...
> getColumn(name: ColName): Vector {
> }
> ...
> }
> {code}
> It would be even better if we could find a way to map the column names to the 
> actual vector data types, but one thing at a time.





[jira] [Commented] (ARROW-2383) [C++] Debian packages need to depend on libprotobuf

2018-06-29 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528345#comment-16528345
 ] 

Kouhei Sutou commented on ARROW-2383:
-

I think that we can fix this with {{ARROW_PROTOBUF_USE_SHARED}}, but I haven't 
tried it yet because arrow-dist uses 0.9.0, not master.
I'll try to use {{ARROW_PROTOBUF_USE_SHARED}} locally for now.

> [C++] Debian packages need to depend on libprotobuf
> ---
>
> Key: ARROW-2383
> URL: https://issues.apache.org/jira/browse/ARROW-2383
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Affects Versions: 0.9.0
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.10.0
>
>
> It seems that we are currently building protobuf using the ExternalProject 
> facility in the debian packages and thus conflict with the system provided 
> protobuf libraries.





[jira] [Commented] (ARROW-2768) [Packaging] Support Ubuntu 18.04

2018-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528550#comment-16528550
 ] 

ASF GitHub Bot commented on ARROW-2768:
---

yahonda opened a new pull request #29: ARROW-2768: Support Ubuntu 18.04
URL: https://github.com/apache/arrow-dist/pull/29
 
 
   This pull request provides Ubuntu 18.04 support.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Packaging] Support Ubuntu 18.04
> 
>
> Key: ARROW-2768
> URL: https://issues.apache.org/jira/browse/ARROW-2768
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Yasuo Honda
>Priority: Major
>  Labels: pull-request-available
>
> I'd like to propose supporting Ubuntu 18.04, which is the latest LTS release 
> of Ubuntu.
> I'm going to open a pull request.





[jira] [Updated] (ARROW-2768) [Packaging] Support Ubuntu 18.04

2018-06-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2768:
--
Labels: pull-request-available  (was: )

> [Packaging] Support Ubuntu 18.04
> 
>
> Key: ARROW-2768
> URL: https://issues.apache.org/jira/browse/ARROW-2768
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Yasuo Honda
>Priority: Major
>  Labels: pull-request-available
>
> I'd like to propose supporting Ubuntu 18.04, which is the latest LTS release 
> of Ubuntu.
> I'm going to open a pull request.





[jira] [Created] (ARROW-2768) [Packaging] Support Ubuntu 18.04

2018-06-29 Thread Yasuo Honda (JIRA)
Yasuo Honda created ARROW-2768:
--

 Summary: [Packaging] Support Ubuntu 18.04
 Key: ARROW-2768
 URL: https://issues.apache.org/jira/browse/ARROW-2768
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Yasuo Honda


I'd like to propose supporting Ubuntu 18.04, which is the latest LTS release 
of Ubuntu.
I'm going to open a pull request.





[jira] [Updated] (ARROW-2765) [JS] add Vector.map

2018-06-29 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2765:
-
Description: Add {{Vector.map(f)}} that returns a new vector transformed 
with {{f}}  (was: Add `Vector.map(f)` that returns a new vector transformed 
with `f`)

> [JS] add Vector.map
> ---
>
> Key: ARROW-2765
> URL: https://issues.apache.org/jira/browse/ARROW-2765
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Major
> Fix For: JS-0.4.0
>
>
> Add {{Vector.map(f)}} that returns a new vector transformed with {{f}}





[jira] [Commented] (ARROW-2768) [Packaging] Support Ubuntu 18.04

2018-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528559#comment-16528559
 ] 

ASF GitHub Bot commented on ARROW-2768:
---

yahonda commented on issue #29: ARROW-2768: Support Ubuntu 18.04
URL: https://github.com/apache/arrow-dist/pull/29#issuecomment-401519060
 
 
   cc @kou, who suggested opening a pull request and a JIRA ticket.




> [Packaging] Support Ubuntu 18.04
> 
>
> Key: ARROW-2768
> URL: https://issues.apache.org/jira/browse/ARROW-2768
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Yasuo Honda
>Priority: Major
>  Labels: pull-request-available
>
> I'd like to propose supporting Ubuntu 18.04, which is the latest LTS release 
> of Ubuntu.
> I'm going to open a pull request.





[GitHub] yahonda commented on issue #29: ARROW-2768: Support Ubuntu 18.04

2018-06-29 Thread GitBox
yahonda commented on issue #29: ARROW-2768: Support Ubuntu 18.04
URL: https://github.com/apache/arrow-dist/pull/29#issuecomment-401519060
 
 
   cc @kou, who suggested opening a pull request and a JIRA ticket.




[jira] [Commented] (ARROW-2713) [Packaging] Fix linux package builds

2018-06-29 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527307#comment-16527307
 ] 

Kouhei Sutou commented on ARROW-2713:
-

Great!

It seems that the build timed out.

Can you apply the following patch?

{noformat}
diff --git a/dev/tasks/linux-packages/travis.linux.yml b/dev/tasks/linux-packages/travis.linux.yml
index 8cc63772..3e2bcc79 100644
--- a/dev/tasks/linux-packages/travis.linux.yml
+++ b/dev/tasks/linux-packages/travis.linux.yml
@@ -30,9 +30,9 @@ env:
 matrix:
   include:
 - script:
-- (cd arrow/dev/tasks/linux-packages && travis_wait 40 && rake version:update && rake apt:build APT_TARGETS=debian-stretch,ubuntu-trusty,ubuntu-xenial PARALLEL=yes DEBUG=no)
+- (cd arrow/dev/tasks/linux-packages && rake version:update && travis_wait 40 rake apt:build APT_TARGETS=debian-stretch,ubuntu-trusty PARALLEL=yes DEBUG=no)
 - script:
-- (cd arrow/dev/tasks/linux-packages && travis_wait 40 && rake version:update && rake apt:build APT_TARGETS=ubuntu-artful PARALLEL=yes DEBUG=no)
+- (cd arrow/dev/tasks/linux-packages && rake version:update && travis_wait 40 rake apt:build APT_TARGETS=ubuntu-xenial,ubuntu-artful PARALLEL=yes DEBUG=no)
 - script:
 - (cd arrow/dev/tasks/linux-packages && rake version:update && rake yum:build PARALLEL=yes DEBUG=no)
 {noformat}

> [Packaging] Fix linux package builds
> 
>
> Key: ARROW-2713
> URL: https://issues.apache.org/jira/browse/ARROW-2713
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.10.0
>
>
> Build configuration: 
> https://github.com/kszucs/arrow/tree/0d9d89b7bff32823ab68e6ec1dc7ade52511f7ee/dev/tasks/linux-packages
> Failing build: 
> https://travis-ci.org/kszucs/crossbow/builds/391894564?utm_source=github_status&utm_medium=notification
> Looks like it's waiting for user input? There might be a hardcoded 
> version too, because the expected version is 0.9.1 instead of 0.9.0.
> ping [~kou] 





[jira] [Updated] (ARROW-2374) [Rust] Add support for array of List

2018-06-29 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2374:
--
Fix Version/s: (was: 0.10.0)
   0.11.0

> [Rust] Add support for array of List
> ---
>
> Key: ARROW-2374
> URL: https://issues.apache.org/jira/browse/ARROW-2374
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add support for List in Array types. Look at Utf8 which wraps List to 
> see how this works.





[jira] [Commented] (ARROW-2399) [Rust] Builder should not provide a set() method

2018-06-29 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527785#comment-16527785
 ] 

Antoine Pitrou commented on ARROW-2399:
---

Chiming in again: while arrays are immutable, buffers need not be immutable. In 
C++, we have both mutable and immutable buffers (and both kinds are allowed to 
back arrays).

> [Rust] Builder should not provide a set() method
> ---
>
> Key: ARROW-2399
> URL: https://issues.apache.org/jira/browse/ARROW-2399
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Priority: Major
> Fix For: 0.10.0
>
>
> Arrays should be immutable, but we have a `set` method on Buffer that 
> should not be there.
> This is only used from the Bitmap struct. Perhaps Bitmap should maintain its 
> own memory instead and not use Buffer?





[jira] [Commented] (ARROW-2374) [Rust] Add support for array of List

2018-06-29 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527786#comment-16527786
 ] 

Antoine Pitrou commented on ARROW-2374:
---

Was this actually fixed by ARROW-2521 or is there still work to do?

> [Rust] Add support for array of List
> ---
>
> Key: ARROW-2374
> URL: https://issues.apache.org/jira/browse/ARROW-2374
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add support for List in Array types. Look at Utf8 which wraps List to 
> see how this works.





[jira] [Updated] (ARROW-2560) [Rust] The Rust README should include Rust-specific information on contributing

2018-06-29 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2560:
--
Labels: beginner  (was: )

> [Rust] The Rust README should include Rust-specific information on 
> contributing
> ---
>
> Key: ARROW-2560
> URL: https://issues.apache.org/jira/browse/ARROW-2560
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: Andy Grove
>Priority: Trivial
>  Labels: beginner
> Fix For: 0.10.0
>
>
> Every new contributor has their first build fail because they didn't know to 
> use cargo fmt.
> We should explain this in the Rust README along with any other pertinent 
> information specific to Rust contributions.





[jira] [Updated] (ARROW-2583) [Rust] Buffer should be typeless

2018-06-29 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2583:
--
Fix Version/s: (was: 0.10.0)
   0.11.0

> [Rust] Buffer should be typeless
> 
>
> Key: ARROW-2583
> URL: https://issues.apache.org/jira/browse/ARROW-2583
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Chao Sun
>Priority: Major
> Fix For: 0.11.0
>
>
> See comments in [https://github.com/apache/arrow/pull/1971] for background on 
> this but the summary is that Buffer should just deal with untyped memory e.g. 
> `* const u8` and all type-handling should be moved to the Array layer e.g. 
> `BufferArray`.
> This would be more consistent with the other implementations.
>  
>  
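
The split described above (untyped bytes in the buffer, all type handling in the array layer) can be sketched roughly as follows. This is a Python stand-in for illustration only, not the Rust code under discussion; the names Buffer, Int32Array, raw and value are hypothetical, and little-endian 32-bit integers are an assumption.

```python
import struct


class Buffer:
    """Untyped memory: knows only its bytes and its length (hypothetical name)."""

    def __init__(self, data):
        self._data = bytes(data)

    def __len__(self):
        return len(self._data)

    def raw(self):
        # Expose the bytes without attaching any element type to them.
        return memoryview(self._data)


class Int32Array:
    """Typed layer: interprets a Buffer as little-endian int32 values (assumed layout)."""

    WIDTH = 4

    def __init__(self, buffer):
        if len(buffer) % self.WIDTH != 0:
            raise ValueError("buffer length is not a multiple of 4 bytes")
        self._buffer = buffer

    def __len__(self):
        return len(self._buffer) // self.WIDTH

    def value(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # All type handling lives here, not in Buffer.
        return struct.unpack_from("<i", self._buffer.raw(), i * self.WIDTH)[0]


buf = Buffer(struct.pack("<3i", 7, -1, 42))
arr = Int32Array(buf)
values = [arr.value(i) for i in range(len(arr))]
print(values)
```

With this shape, the same Buffer could back arrays of other element types without the buffer itself changing.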





[jira] [Commented] (ARROW-2583) [Rust] Buffer should be typeless

2018-06-29 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527787#comment-16527787
 ] 

Antoine Pitrou commented on ARROW-2583:
---

Since this would probably break the API, it would be nice to have this sooner 
rather than later (i.e. in 0.10.0), but this shouldn't block the release either.

> [Rust] Buffer should be typeless
> 
>
> Key: ARROW-2583
> URL: https://issues.apache.org/jira/browse/ARROW-2583
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Chao Sun
>Priority: Major
> Fix For: 0.11.0
>
>
> See comments in [https://github.com/apache/arrow/pull/1971] for background on 
> this but the summary is that Buffer should just deal with untyped memory e.g. 
> `* const u8` and all type-handling should be moved to the Array layer e.g. 
> `BufferArray`.
> This would be more consistent with the other implementations.
>  
>  





[jira] [Updated] (ARROW-2344) [Go] Run Go unit tests in Travis CI

2018-06-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2344:
--
Labels: pull-request-available  (was: )

> [Go] Run Go unit tests in Travis CI
> ---
>
> Key: ARROW-2344
> URL: https://issues.apache.org/jira/browse/ARROW-2344
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Go
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Commented] (ARROW-1722) [C++] Add linting script to look for C++/CLI issues

2018-06-29 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527842#comment-16527842
 ] 

Wes McKinney commented on ARROW-1722:
-

I'm going to see if I can hack out a solution for this

> [C++] Add linting script to look for C++/CLI issues
> ---
>
> Key: ARROW-1722
> URL: https://issues.apache.org/jira/browse/ARROW-1722
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.10.0
>
>
> This includes:
> * Using {{nullptr}} in header files (we must instead use an appropriate macro 
> to use {{__nullptr}} when the host compiler is C++/CLI)
> * Including {{}} in a public header (e.g. header files without "impl" 
> or "internal" in their name)





[jira] [Assigned] (ARROW-1722) [C++] Add linting script to look for C++/CLI issues

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1722:
---

Assignee: Wes McKinney

> [C++] Add linting script to look for C++/CLI issues
> ---
>
> Key: ARROW-1722
> URL: https://issues.apache.org/jira/browse/ARROW-1722
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.10.0
>
>
> This includes:
> * Using {{nullptr}} in header files (we must instead use an appropriate macro 
> to use {{__nullptr}} when the host compiler is C++/CLI)
> * Including {{}} in a public header (e.g. header files without "impl" 
> or "internal" in their name)





[jira] [Updated] (ARROW-1722) [C++] Add linting script to look for C++/CLI issues

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1722:

Fix Version/s: (was: 0.11.0)
   0.10.0

> [C++] Add linting script to look for C++/CLI issues
> ---
>
> Key: ARROW-1722
> URL: https://issues.apache.org/jira/browse/ARROW-1722
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.10.0
>
>
> This includes:
> * Using {{nullptr}} in header files (we must instead use an appropriate macro 
> to use {{__nullptr}} when the host compiler is C++/CLI)
> * Including {{}} in a public header (e.g. header files without "impl" 
> or "internal" in their name)





[jira] [Commented] (ARROW-1661) [Python] Python 3.7 support

2018-06-29 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527846#comment-16527846
 ] 

Wes McKinney commented on ARROW-1661:
-

It'll probably take conda-forge a long time (at least 1 month), so I don't 
think we should block on that

> [Python] Python 3.7 support
> ---
>
> Key: ARROW-1661
> URL: https://issues.apache.org/jira/browse/ARROW-1661
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> See discussion in https://github.com/apache/arrow/issues/1125





[jira] [Commented] (ARROW-1987) [Website] Enable Docker-based documentation generator to build at a specific Arrow commit

2018-06-29 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527848#comment-16527848
 ] 

Wes McKinney commented on ARROW-1987:
-

Yes, gen_apidocs. At minimum we should be testing nightly that the API doc 
build works. As far as the actual build and upload, a committer will need to do 
that

> [Website] Enable Docker-based documentation generator to build at a specific 
> Arrow commit
> -
>
> Key: ARROW-1987
> URL: https://issues.apache.org/jira/browse/ARROW-1987
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Website
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
>
> Currently both the Docker setup and the Arrow repo have to be at the same 
> commit. It would be useful to create a checkout in the Docker image and 
> enable the build version to be passed in





[jira] [Updated] (ARROW-1796) [Python] RowGroup filtering on file level

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1796:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Python] RowGroup filtering on file level
> -
>
> Key: ARROW-1796
> URL: https://issues.apache.org/jira/browse/ARROW-1796
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.11.0
>
>
> We can build upon the API defined in {{fastparquet}} for defining RowGroup 
> filters: 
> https://github.com/dask/fastparquet/blob/master/fastparquet/api.py#L296-L300 
> and translate them into the C++ enums we will define in 
> https://issues.apache.org/jira/browse/PARQUET-1158 . This should enable us to 
> provide the user with a simple predicate pushdown API that we can extend in 
> the background from RowGroup to Page level later on.
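
A rough sketch of the row-group level pushdown described above, in plain Python. It assumes filters are written as (column, operator, value) tuples, loosely modeled on the fastparquet style referenced in the description, and that each row group exposes per-column (min, max) statistics; the stats dictionaries and the function name row_group_may_match are hypothetical and not part of any pyarrow or parquet-cpp API.

```python
def row_group_may_match(stats, filters):
    """Return False only when min/max statistics prove no row can match.

    stats:   {column: (min, max)} for one row group (hypothetical shape).
    filters: list of (column, op, value) tuples, all of which must hold.
    """
    for column, op, value in filters:
        if column not in stats:
            continue  # no statistics: the row group cannot be excluded
        lo, hi = stats[column]
        if op == "==":
            possible = lo <= value <= hi
        elif op == ">":
            possible = hi > value
        elif op == ">=":
            possible = hi >= value
        elif op == "<":
            possible = lo < value
        elif op == "<=":
            possible = lo <= value
        else:
            raise ValueError("unsupported operator: %r" % (op,))
        if not possible:
            return False
    return True


# Three row groups with statistics for a single column "x".
row_groups = [
    {"x": (0, 9)},
    {"x": (10, 19)},
    {"x": (20, 29)},
]
filters = [("x", ">", 12), ("x", "<=", 20)]

selected = [i for i, stats in enumerate(row_groups)
            if row_group_may_match(stats, filters)]
print(selected)
```

Row groups that survive this check still need row-level filtering after being read; the statistics only rule out groups that cannot contain a match.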





[jira] [Commented] (ARROW-1796) [Python] RowGroup filtering on file level

2018-06-29 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527852#comment-16527852
 ] 

Wes McKinney commented on ARROW-1796:
-

If Gandiva becomes a part of Apache Arrow, then we should look at compiling 
filters and pushing them down into parquet-cpp

> [Python] RowGroup filtering on file level
> -
>
> Key: ARROW-1796
> URL: https://issues.apache.org/jira/browse/ARROW-1796
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.11.0
>
>
> We can build upon the API defined in {{fastparquet}} for defining RowGroup 
> filters: 
> https://github.com/dask/fastparquet/blob/master/fastparquet/api.py#L296-L300 
> and translate them into the C++ enums we will define in 
> https://issues.apache.org/jira/browse/PARQUET-1158 . This should enable us to 
> provide the user with a simple predicate pushdown API that we can extend in 
> the background from RowGroup to Page level later on.





[jira] [Commented] (ARROW-2061) [C++] Run ASAN builds in Travis CI

2018-06-29 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527853#comment-16527853
 ] 

Wes McKinney commented on ARROW-2061:
-

We are using clang in our builds now. Any objections to switching to ASAN from 
valgrind?

> [C++] Run ASAN builds in Travis CI
> --
>
> Key: ARROW-2061
> URL: https://issues.apache.org/jira/browse/ARROW-2061
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.10.0
>
>
> ASAN might be a better alternative to valgrind in builds where we have clang 
> available. As part of this, we should also document how users can run their 
> own local ASAN builds





[jira] [Updated] (ARROW-1923) [C++] Make easier to use const ChunkedArray& with Datum in computation context

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1923:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [C++] Make easier to use const ChunkedArray& with Datum in computation context
> --
>
> Key: ARROW-1923
> URL: https://issues.apache.org/jira/browse/ARROW-1923
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
>
> Currently this only accepts a {{shared_ptr}}





[jira] [Assigned] (ARROW-1896) [C++] Do not allocate memory for primitive outputs in CastKernel::Call implementation

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1896:
---

Assignee: (was: Wes McKinney)

> [C++] Do not allocate memory for primitive outputs in CastKernel::Call 
> implementation
> -
>
> Key: ARROW-1896
> URL: https://issues.apache.org/jira/browse/ARROW-1896
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
>
> This is some refactoring / tidying. Unless an output of cast has a 
> non-determinate size (e.g. is Binary or something else), the 
> {{CastKernel::Call}} implementation should assume that it is writing into 
> pre-allocated memory. The corresponding memory allocation can be lifted into 
> the {{arrow::compute::Cast}} API
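
The refactoring described above can be illustrated with a small Python sketch: the kernel only writes into memory it is handed, and the allocation moves up into the public entry point because the output size follows from the input size. This is not the C++ code; the function names and the int32 to int64 example are assumptions chosen for illustration.

```python
import struct


def cast_int32_to_int64_kernel(src, out):
    """Kernel: writes into caller-provided `out` and never allocates the output."""
    n = len(src) // 4
    if len(out) != n * 8:
        raise ValueError("output buffer has the wrong size")
    for i in range(n):
        (value,) = struct.unpack_from("<i", src, i * 4)
        struct.pack_into("<q", out, i * 8, value)


def cast_int32_to_int64(src):
    """API layer: the output size is determinate, so the allocation happens here."""
    out = bytearray((len(src) // 4) * 8)
    cast_int32_to_int64_kernel(src, out)
    return out


src = struct.pack("<4i", 1, -2, 3, 40)
result = struct.unpack("<4q", cast_int32_to_int64(src))
print(result)
```

For outputs of non-determinate size, such as binary data, the kernel would still need to manage its own memory, as the description notes.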





[jira] [Updated] (ARROW-1789) [Format] Consolidate specification documents and improve clarity for new implementation authors

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1789:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Format] Consolidate specification documents and improve clarity for new 
> implementation authors
> ---
>
> Key: ARROW-1789
> URL: https://issues.apache.org/jira/browse/ARROW-1789
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
>
> See discussion in https://github.com/apache/arrow/issues/1296
> I believe the specification documents Layout.md, Metadata.md, and IPC.md 
> would benefit from being consolidated into a single Markdown document that 
> would be sufficient (along with the Flatbuffers schemas) to create a complete 
> Arrow implementation capable of reading and writing the binary format





[jira] [Updated] (ARROW-1786) [Format] List expected on-wire buffer layouts for each kind of Arrow physical type in specification

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1786:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Format] List expected on-wire buffer layouts for each kind of Arrow physical 
> type in specification
> ---
>
> Key: ARROW-1786
> URL: https://issues.apache.org/jira/browse/ARROW-1786
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
>
> see ARROW-1693, ARROW-1785





[jira] [Updated] (ARROW-1896) [C++] Do not allocate memory for primitive outputs in CastKernel::Call implementation

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1896:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [C++] Do not allocate memory for primitive outputs in CastKernel::Call 
> implementation
> -
>
> Key: ARROW-1896
> URL: https://issues.apache.org/jira/browse/ARROW-1896
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
>
> This is some refactoring / tidying. Unless an output of cast has a 
> non-determinate size (e.g. is Binary or something else), the 
> {{CastKernel::Call}} implementation should assume that it is writing into 
> pre-allocated memory. The corresponding memory allocation can be lifted into 
> the {{arrow::compute::Cast}} API





[jira] [Updated] (ARROW-1807) [JAVA] Reduce Heap Usage (Phase 3): consolidate buffers

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1807:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [JAVA] Reduce Heap Usage (Phase 3): consolidate buffers
> ---
>
> Key: ARROW-1807
> URL: https://issues.apache.org/jira/browse/ARROW-1807
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Siddharth Teotia
>Assignee: Siddharth Teotia
>Priority: Major
> Fix For: 0.11.0
>
>
> Consolidate buffers for reducing the volume of objects and heap usage
>  => single buffer for fixed width
> < validity + offsets> = single buffer for var width, list vector





[jira] [Updated] (ARROW-2113) [Python] Incomplete CLASSPATH with "hadoop" contained in it can fool the classpath setting HDFS logic

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2113:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Python] Incomplete CLASSPATH with "hadoop" contained in it can fool the 
> classpath setting HDFS logic
> -
>
> Key: ARROW-2113
> URL: https://issues.apache.org/jira/browse/ARROW-2113
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
> Environment: Linux Redhat 7.4, Anaconda 4.4.7, Python 2.7.12, CDH 
> 5.13.1
>Reporter: Michal Danko
>Priority: Major
> Fix For: 0.11.0
>
>
> Steps to replicate the issue:
> mkdir /tmp/test
>  cd /tmp/test
>  mkdir jars
>  cd jars
>  touch test1.jar
>  mkdir -p ../lib/zookeeper
>  cd ../lib/zookeeper
>  ln -s ../../jars/test1.jar ./test1.jar
>  ln -s test1.jar test.jar
>  mkdir -p ../hadoop/lib
>  cd ../hadoop/lib
>  ln -s ../../../lib/zookeeper/test.jar ./test.jar
> (this part depends on your configuration; you need these values for 
> pyarrow.hdfs to work:)
> (path to libjvm: )
> (export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera)
> (path to libhdfs: )
> (export 
> LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib64/)
> export CLASSPATH="/tmp/test/lib/hadoop/lib/test.jar"
> python
>  import pyarrow.hdfs as hdfs;
>  fs = hdfs.connect(user="hdfs")
>  
> Ends with error:
> 
>  loadFileSystems error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  hdfsBuilderConnect(forceNewInstance=0, nn=default, port=0, 
> kerbTicketCachePath=(NULL), userName=pa) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  Traceback (most recent call last): (
>  File "", line 1, in 
>  File "/opt/pa/anaconda2/lib/python2.7/site-packages/pyarrow/hdfs.py", line 
> 170, in connect
>  kerb_ticket=kerb_ticket, driver=driver)
>  File "/opt/pa/anaconda2/lib/python2.7/site-packages/pyarrow/hdfs.py", line 
> 37, in __init__
>  self._connect(host, port, user, kerb_ticket, driver)
>  File "pyarrow/io-hdfs.pxi", line 87, in 
> pyarrow.lib.HadoopFileSystem._connect 
> (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:61673)
>  File "pyarrow/error.pxi", line 79, in pyarrow.lib.check_status 
> (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:8345)
>  pyarrow.lib.ArrowIOError: HDFS connection failed
>  -
>  
> export CLASSPATH="/tmp/test/lib/zookeeper/test.jar"
>  python
>  import pyarrow.hdfs as hdfs;
>  fs = hdfs.connect(user="hdfs")
>  
> Works properly.
>  
> I can't find a reason why the first CLASSPATH doesn't work and the second one 
> does, because both are paths to the same .jar; the first just goes through an 
> extra symlink. To me, it looks like pyarrow.lib.check has a problem with 
> symlinks defined with many ../../.. components.
> I would expect pyarrow to work with any definition of the path to the .jar.
> Please note that the paths are not generated at random; they are copied from 
> the Cloudera distribution of Hadoop (the original file was zookeeper.jar).
> Because of this issue, our customer currently can't use the pyarrow library 
> for Oozie workflows.
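
If the failure really is tied to how the symlinked CLASSPATH entry is resolved (the report suspects this but does not confirm it), one untested workaround would be to resolve every CLASSPATH entry to its real path before connecting. The sketch below uses only the standard library; resolve_classpath is a hypothetical helper name, not part of pyarrow.

```python
import os


def resolve_classpath(classpath):
    """Return classpath with every entry replaced by its symlink-free real path.

    Hypothetical helper for illustration; it is not part of pyarrow.
    """
    # Split on the platform separator (":" on Linux) and drop empty entries.
    entries = [entry for entry in classpath.split(os.pathsep) if entry]
    # os.path.realpath follows chains of symlinks, including relative ones.
    return os.pathsep.join(os.path.realpath(entry) for entry in entries)


# Print the resolved form of whatever CLASSPATH is currently set.
print(resolve_classpath(os.environ.get("CLASSPATH", "")))
```

The resolved value could then be exported as CLASSPATH before Python starts. Whether that avoids the NoClassDefFoundError above is an assumption that would need to be checked on the affected cluster.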





[jira] [Updated] (ARROW-2041) [Python] pyarrow.serialize has high overhead for list of NumPy arrays

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2041:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Python] pyarrow.serialize has high overhead for list of NumPy arrays
> -
>
> Key: ARROW-2041
> URL: https://issues.apache.org/jira/browse/ARROW-2041
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Richard Shin
>Priority: Major
> Fix For: 0.11.0
>
>
> {{Python 2.7.12 (default, Nov 20 2017, 18:23:56)}}
> {{[GCC 5.4.0 20160609] on linux2}}
> {{Type "help", "copyright", "credits" or "license" for more information.}}
> {{>>> import pyarrow as pa, numpy as np}}
> {{>>> arrays = [np.arange(100, dtype=np.int32) for _ in range(1)]}}
> {{>>> with open('test.pyarrow', 'w') as f:}}
> {{...     f.write(pa.serialize(arrays).to_buffer().to_pybytes())}}
> {{...}}
> {{>>> import cPickle as pickle}}
> {{>>> pickle.dump(arrays, open('test.pkl', 'w'), pickle.HIGHEST_PROTOCOL)}}
> test.pyarrow is 6.2 MB, while test.pkl is only 4.2 MB.





[jira] [Updated] (ARROW-2059) [Python] Possible performance regression in Feather read/write path

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2059:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Python] Possible performance regression in Feather read/write path
> ---
>
> Key: ARROW-2059
> URL: https://issues.apache.org/jira/browse/ARROW-2059
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Jingyuan Wang
>Priority: Major
> Fix For: 0.11.0
>
>
> See discussion in https://github.com/wesm/feather/issues/329. Needs to be 
> investigated





[jira] [Updated] (ARROW-1989) [Python] Better UX on timestamp conversion to Pandas

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1989:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Python] Better UX on timestamp conversion to Pandas
> 
>
> Key: ARROW-1989
> URL: https://issues.apache.org/jira/browse/ARROW-1989
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.11.0
>
>
> When converting timestamp columns to Pandas, users often hit the problem that 
> their dates fall outside the range Pandas can represent with its nanosecond 
> representation. Currently they simply see an Arrow exception and think that 
> the problem is caused by Arrow. We should try to change the error 
> from
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: XX
> {code}
> to something along the lines of 
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: 
> XX. This conversion is needed as Pandas only supports nanosecond 
> timestamps. Your data is likely out of the range that can be represented with 
> nanosecond resolution.
> {code}
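
To make the range limit concrete: a signed 64-bit count of nanoseconds since the Unix epoch only reaches from late 1677 to early 2262. The following is a minimal standard-library sketch of such a range check; to_epoch_ns and fits_in_int64_ns are illustrative names, not Arrow or Pandas APIs.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)  # pandas reserves this value for NaT, so it is excluded


def to_epoch_ns(dt):
    """Exact integer nanoseconds between the Unix epoch and a naive datetime."""
    delta = dt - EPOCH
    return (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000


def fits_in_int64_ns(dt):
    """True if dt is representable as signed 64-bit nanoseconds since the epoch."""
    return INT64_MIN < to_epoch_ns(dt) <= INT64_MAX


# Upper bound of the representable range, truncated to microsecond precision.
upper = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
print(upper)
print(fits_in_int64_ns(datetime(2018, 1, 1)), fits_in_int64_ns(datetime(2300, 1, 1)))
```

A check of this kind, run before conversion, is one way an error message could tell the user which values are out of range; how Arrow would actually implement it is not specified in the issue.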





[jira] [Commented] (ARROW-2338) [Scripts] Windows release verification script should create a conda environment

2018-06-29 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527866#comment-16527866
 ] 

Wes McKinney commented on ARROW-2338:
-

[~cpcloud] would you have time to get to this before 0.10.0?

> [Scripts] Windows release verification script should create a conda 
> environment
> ---
>
> Key: ARROW-2338
> URL: https://issues.apache.org/jira/browse/ARROW-2338
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
> Fix For: 0.10.0
>
>
> It should also download the source.





[jira] [Updated] (ARROW-2337) [Scripts] Windows release verification script should use boost DSOs instead of static linkage

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2337:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Scripts] Windows release verification script should use boost DSOs instead 
> of static linkage
> -
>
> Key: ARROW-2337
> URL: https://issues.apache.org/jira/browse/ARROW-2337
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
> Fix For: 0.11.0
>
>
> Fix up shortly





[jira] [Commented] (ARROW-2300) [Python] python/testing/test_hdfs.sh no longer works

2018-06-29 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527863#comment-16527863
 ] 

Wes McKinney commented on ARROW-2300:
-

I made this a blocker; developers need a reliable way to validate that the HDFS 
integration works

> [Python] python/testing/test_hdfs.sh no longer works
> 
>
> Key: ARROW-2300
> URL: https://issues.apache.org/jira/browse/ARROW-2300
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Tried this on a fresh Ubuntu 16.04 install:
> {code}
> $ ./test_hdfs.sh 
> + docker build -t arrow-hdfs-test -f hdfs/Dockerfile .
> Sending build context to Docker daemon  36.86kB
> Step 1/6 : FROM cpcloud86/impala:metastore
> manifest for cpcloud86/impala:metastore not found
> {code}





[jira] [Updated] (ARROW-2248) [Python] Nightly or on-demand HDFS test builds

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2248:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Python] Nightly or on-demand HDFS test builds
> --
>
> Key: ARROW-2248
> URL: https://issues.apache.org/jira/browse/ARROW-2248
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
>
> We continue to acquire more functionality related to HDFS and Parquet. 
> Testing this, including tests that involve interoperability with other 
> systems, like Spark, will require some work outside of our normal CI 
> infrastructure.
> I suggest we start with testing the C++/Python HDFS integration, which will 
> help with validating patches like ARROW-1643 
> https://github.com/apache/arrow/pull/1668





[jira] [Updated] (ARROW-2300) [Python] python/testing/test_hdfs.sh no longer works

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2300:

Priority: Blocker  (was: Major)

> [Python] python/testing/test_hdfs.sh no longer works
> 
>
> Key: ARROW-2300
> URL: https://issues.apache.org/jira/browse/ARROW-2300
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Tried this on a fresh Ubuntu 16.04 install:
> {code}
> $ ./test_hdfs.sh 
> + docker build -t arrow-hdfs-test -f hdfs/Dockerfile .
> Sending build context to Docker daemon  36.86kB
> Step 1/6 : FROM cpcloud86/impala:metastore
> manifest for cpcloud86/impala:metastore not found
> {code}





[jira] [Commented] (ARROW-2383) [C++] Debian packages need to depend on libprotobuf

2018-06-29 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527867#comment-16527867
 ] 

Wes McKinney commented on ARROW-2383:
-

[~kou] now that we have {{ARROW_PROTOBUF_USE_SHARED}} this should be easy to 
fix, or maybe already fixed?

> [C++] Debian packages need to depend on libprotobuf
> ---
>
> Key: ARROW-2383
> URL: https://issues.apache.org/jira/browse/ARROW-2383
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Affects Versions: 0.9.0
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.10.0
>
>
> It seems that we are currently building protobuf using the ExternalProject 
> facility in the Debian packages, and the result thus conflicts with the 
> system-provided protobuf libraries.





[jira] [Updated] (ARROW-2520) [Rust] CI should also build against nightly Rust

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2520:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Rust] CI should also build against nightly Rust
> 
>
> Key: ARROW-2520
> URL: https://issues.apache.org/jira/browse/ARROW-2520
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Priority: Minor
> Fix For: 0.11.0
>
>
> We should build Arrow against Rust nightly, but allow failures.





[jira] [Closed] (ARROW-2439) [Rust] Run license header checks also in Rust CI entry

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-2439.
---

> [Rust] Run license header checks also in Rust CI entry
> --
>
> Key: ARROW-2439
> URL: https://issues.apache.org/jira/browse/ARROW-2439
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.10.0
>
>
> Currently we only audit license headers in the C++ builds. We should also do 
> this in the Rust Travis entry. The overhead for them is so minimal that we 
> can do it twice.





[jira] [Resolved] (ARROW-2439) [Rust] Run license header checks also in Rust CI entry

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-2439.
-
Resolution: Not A Problem

The license audit is always run, even if the C++ build does not proceed 
https://github.com/apache/arrow/blob/master/.travis.yml#L64

> [Rust] Run license header checks also in Rust CI entry
> --
>
> Key: ARROW-2439
> URL: https://issues.apache.org/jira/browse/ARROW-2439
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.10.0
>
>
> Currently we only audit license headers in the C++ builds. We should also do 
> this in the Rust Travis entry. The overhead for them is so minimal that we 
> can do it twice.





[jira] [Updated] (ARROW-2734) [Python] Cython api example doesn't work by default on macOS

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2734:

Summary: [Python] Cython api example doesn't work by default on macOS  
(was: Cython api example doesn't work by default on macOS)

> [Python] Cython api example doesn't work by default on macOS
> 
>
> Key: ARROW-2734
> URL: https://issues.apache.org/jira/browse/ARROW-2734
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Python
>Affects Versions: 0.9.0
> Environment: macOS 10.13
>Reporter: Jonathan Chambers
>Priority: Minor
> Fix For: 0.10.0
>
>
> The setup.py + example.pyx given in the docs:
> [https://arrow.apache.org/docs/python/extending.html#example]
> doesn't work on macOS.
>  
> The first issue is the error:
> *example.cpp:603:10:* *fatal error:* *'unordered_map' file not found*
> because (AFAIU) macOS clang doesn't include the required C++11 lib by default.
> This can be solved by adding: 
> {code:java}
> os.environ['CFLAGS'] = '-std=c++11 -stdlib=libc++'
> {code}
> to setup.py
>  
> The second issue is that the line
> {code:java}
> ext.library_dirs.append(pa.get_library_dirs())
> {code}
> should be  
> {code:java}
> ext.library_dirs.extend(pa.get_library_dirs())
> {code}
>  
> otherwise this causes a (completely uninformative) TypeError during the build 
> because library_dirs ends up being a list of lists instead of a list of 
> strings.
>  
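
The difference between append and extend described above can be reproduced without pyarrow. In this sketch get_library_dirs is a hypothetical stand-in that, like pa.get_library_dirs() according to the report, returns a list of directory strings; the paths are made up.

```python
def get_library_dirs():
    """Hypothetical stand-in for pa.get_library_dirs(); returns a list of paths."""
    return ["/opt/arrow/lib", "/usr/local/lib"]


appended = ["/usr/lib"]
appended.append(get_library_dirs())  # nests the whole list as a single element

extended = ["/usr/lib"]
extended.extend(get_library_dirs())  # adds each path as its own element

print(appended)  # ['/usr/lib', ['/opt/arrow/lib', '/usr/local/lib']]
print(extended)  # ['/usr/lib', '/opt/arrow/lib', '/usr/local/lib']
```

The nested list in the first case is what the build later trips over, since it expects every entry of library_dirs to be a string.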





[jira] [Updated] (ARROW-2619) [Rust] Move JSON serde code to separate file/module

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2619:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Rust] Move JSON serde code to separate file/module
> ---
>
> Key: ARROW-2619
> URL: https://issues.apache.org/jira/browse/ARROW-2619
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
> Fix For: 0.11.0
>
>






[jira] [Updated] (ARROW-2460) [Rust] Schema and DataType::Struct should use Vec<Rc<Field>>

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2460:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Rust] Schema and DataType::Struct should use Vec<Rc<Field>>
> 
>
> Key: ARROW-2460
> URL: https://issues.apache.org/jira/browse/ARROW-2460
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Andy Grove
>Priority: Minor
> Fix For: 0.11.0
>
>
> Currently we use Vec<Field> instead of Vec<Rc<Field>> which is resulting in 
> having to clone fields in some use cases, which could be expensive for 
> structs.





[jira] [Created] (ARROW-2763) [Python] Make parquet _metadata file accessible from ParquetDataset

2018-06-29 Thread Robert Gruener (JIRA)
Robert Gruener created ARROW-2763:
-

 Summary: [Python] Make parquet _metadata file accessible from 
ParquetDataset
 Key: ARROW-2763
 URL: https://issues.apache.org/jira/browse/ARROW-2763
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Robert Gruener


Currently when creating a ParquetDataset it gives you access to the 
_common_metadata file but not the _metadata file.

We access the metadata file to get row group information of the dataset without 
opening each footer.





[jira] [Updated] (ARROW-549) [C++] Add function to concatenate like-typed arrays

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-549:
---
Fix Version/s: (was: 0.10.0)
   0.11.0

> [C++] Add function to concatenate like-typed arrays
> ---
>
> Key: ARROW-549
> URL: https://issues.apache.org/jira/browse/ARROW-549
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Panchen Xue
>Priority: Major
>  Labels: Analytics
> Fix For: 0.11.0
>
>
> A la 
> {{Status arrow::Concatenate(const std::vector<std::shared_ptr<Array>>& 
> arrays, MemoryPool* pool, std::shared_ptr<Array>* out)}}





[jira] [Updated] (ARROW-352) [Format] Interval(DAY_TIME) has no unit

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-352:
---
Fix Version/s: (was: 0.10.0)
   0.11.0

> [Format] Interval(DAY_TIME) has no unit
> ---
>
> Key: ARROW-352
> URL: https://issues.apache.org/jira/browse/ARROW-352
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>Reporter: Julien Le Dem
>Assignee: Wes McKinney
>Priority: Major
>  Labels: columnar-format-1.0
> Fix For: 0.11.0
>
>
> Interval(DAY_TIME) assumes milliseconds.
> We should have a time unit, as timestamp does.





[jira] [Updated] (ARROW-352) [Format] Interval(DAY_TIME) has no unit

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-352:
---
Labels: columnar-format-1.0  (was: )

> [Format] Interval(DAY_TIME) has no unit
> ---
>
> Key: ARROW-352
> URL: https://issues.apache.org/jira/browse/ARROW-352
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>Reporter: Julien Le Dem
>Assignee: Wes McKinney
>Priority: Major
>  Labels: columnar-format-1.0
> Fix For: 0.11.0
>
>
> Interval(DAY_TIME) assumes milliseconds.
> We should have a time unit, as timestamp does.





[jira] [Commented] (ARROW-2722) [Python] ndarray to arrow conversion fails when downcasted from pandas to_numeric

2018-06-29 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527683#comment-16527683
 ] 

Antoine Pitrou commented on ARROW-2722:
---

Augusto, is it possible for you to try on git master? It may (or should) have 
been fixed by ARROW-2135.

> [Python] ndarray to arrow conversion fails when downcasted from pandas 
> to_numeric
> -
>
> Key: ARROW-2722
> URL: https://issues.apache.org/jira/browse/ARROW-2722
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
> Environment: Windows 10 64-bit
>Reporter: Augusto Radtke
>Priority: Major
> Fix For: 0.10.0
>
>
> The following snippet:
> {code:java}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
> pa.array(pd.to_numeric(pd.Series(np.array([65536,2,3], dtype=np.uint64)), 
> downcast='unsigned'), 
> from_pandas=True, type='uint32')
> {code}
> fails to convert with message:
> {noformat}
> ArrowNotImplementedError Traceback (most recent call last)
>  in ()
> 4 
> 5 pa.array(pd.to_numeric(pd.Series(np.array([65536,2,3], dtype=np.uint64)), 
> downcast='unsigned'), 
> > 6 from_pandas=True, type='uint32')
> array.pxi in pyarrow.lib.array()
> array.pxi in pyarrow.lib._ndarray_to_array()
> error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: Unsupported numpy type 6{noformat}
>  
> This is a Windows 64-bit machine, running Python 3.6.5, pyarrow 0.9.0, pandas 
> 0.23.1 and numpy 1.14.5.
> Seems to be fine for uint16 or uint8 downcasting. Unfortunately I didn't have 
> the time to dig deeper or try on a Linux machine, but it feels like it's 
> related to the LLP64 model.
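
For context on the LLP64 remark: on 64-bit Windows the C long type stays 32 bits wide, whereas 64-bit Linux and macOS (LP64) make it 64 bits, so the same fixed-width integer can map to a different C type on each platform. Whether that is the cause here is the reporter's guess. The sketch below only shows how to tell the models apart from Python; data_model is an illustrative helper name.

```python
import ctypes


def data_model(int_bytes, long_bytes, pointer_bytes):
    """Name the C data model implied by the byte sizes of int, long and void*."""
    sizes = (int_bytes, long_bytes, pointer_bytes)
    if sizes == (4, 4, 8):
        return "LLP64"  # e.g. 64-bit Windows: long stays 32-bit
    if sizes == (4, 8, 8):
        return "LP64"  # e.g. 64-bit Linux and macOS
    if sizes == (4, 4, 4):
        return "ILP32"  # typical 32-bit platforms
    return "other"


# Classify the machine this script is running on.
current = data_model(
    ctypes.sizeof(ctypes.c_int),
    ctypes.sizeof(ctypes.c_long),
    ctypes.sizeof(ctypes.c_void_p),
)
print(current)
```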





[jira] [Commented] (ARROW-2654) [Python] Error with errno 22 when loading 3.6 GB Parquet file

2018-06-29 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527687#comment-16527687
 ] 

Antoine Pitrou commented on ARROW-2654:
---

Does it work if you make the file smaller than 2GB? This might be a 64-bitness 
issue (which may actually be fixed in git master).

> [Python] Error with errno 22 when loading 3.6 GB Parquet file
> -
>
> Key: ARROW-2654
> URL: https://issues.apache.org/jira/browse/ARROW-2654
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Andy Reagan
>Priority: Major
> Fix For: 0.10.0
>
>
> I saved a file using the pandas to_parquet method, but can't read it back in. 
> Here's the full stack trace:
>  
> {code:java}
> Traceback (most recent call last):
> File "src/data/CLXP_pull.py", line 214, in <module>
>  main()
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py",
>  line 722, in __call__
>  return self.main(*args, **kwargs)
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py",
>  line 697, in main
>  rv = self.invoke(ctx)
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py",
>  line 895, in invoke
>  return ctx.invoke(self.callback, **ctx.params)
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py",
>  line 535, in invoke
>  return callback(*args, **kwargs)
>  File "src/data/CLXP_pull.py", line 188, in main
>  results[fullname] = pd.read_parquet(os.path.join(project_dir, "data", "raw", 
> fullname+".parquet"), engine="pyarrow")
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py",
>  line 257, in read_parquet
>  return impl.read(path, columns=columns, **kwargs)
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py",
>  line 130, in read
>  **kwargs).to_pandas()
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py",
>  line 939, in read_table
>  pf = ParquetFile(source, metadata=metadata)
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py",
>  line 64, in __init__
>  self.reader.open(source, metadata=metadata)
>  File "_parquet.pyx", line 651, in pyarrow._parquet.ParquetReader.open
>  File "error.pxi", line 79, in pyarrow.lib.check_status
>  pyarrow.lib.ArrowIOError: Arrow error: IOError: [Errno 22] Invalid argument
> {code}
> Any ideas what could cause this? The file itself is 3.6GB.
> I'm running pandas==0.22.0.





[jira] [Commented] (ARROW-2646) [Python] Pandas roundtrip for date objects

2018-06-29 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527728#comment-16527728
 ] 

Antoine Pitrou commented on ARROW-2646:
---

The standard {{csv}} module has both a notion of "dialect" and additional 
{{**kwargs}} to each function so that you can override individual options. 
Intuitively, it allows accepting individual option arguments without listing 
and documenting them explicitly for each method.

> [Python] Pandas roundtrip for date objects
> --
>
> Key: ARROW-2646
> URL: https://issues.apache.org/jira/browse/ARROW-2646
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Florian Jetter
>Priority: Minor
> Fix For: 0.10.0
>
>
> Arrow currently casts date objects to nanosecond precision datetime objects. 
> I'd like to have a way to preserve the type during a roundtrip
> {code}
> >>> import pandas as pd
> >>> import pyarrow as pa
> >>> import datetime
> >>> pa.date32().to_pandas_dtype()
> dtype('<M8[ns]')
> >>> df = pd.DataFrame({'date': [datetime.date(2018, 1, 1)]})
> >>> df.dtypes
> date    object
> dtype: object
> >>> df_rountrip = pa.Table.from_pandas(df).to_pandas()
> >>> df_rountrip.dtypes
> date    datetime64[ns]
> dtype: object
> {code}
> I'd expect something like this to work:
> {code}
> >>> import pandas.testing as pdt
> >>> df_rountrip = pa.Table.from_pandas(df).to_pandas(date_as_object=True)
> >>> pdt.assert_frame_equal(df_rountrip, df)
> {code}





[jira] [Updated] (ARROW-2646) [Python] Pandas roundtrip for date objects

2018-06-29 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2646:
--
Affects Version/s: 0.9.0

> [Python] Pandas roundtrip for date objects
> --
>
> Key: ARROW-2646
> URL: https://issues.apache.org/jira/browse/ARROW-2646
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Florian Jetter
>Priority: Minor
> Fix For: 0.10.0
>
>
> Arrow currently casts date objects to nanosecond precision datetime objects. 
> I'd like to have a way to preserve the type during a roundtrip
> {code}
> >>> import pandas as pd
> >>> import pyarrow as pa
> >>> import datetime
> >>> pa.date32().to_pandas_dtype()
> dtype('<M8[ns]')
> >>> df = pd.DataFrame({'date': [datetime.date(2018, 1, 1)]})
> >>> df.dtypes
> date    object
> dtype: object
> >>> df_rountrip = pa.Table.from_pandas(df).to_pandas()
> >>> df_rountrip.dtypes
> date    datetime64[ns]
> dtype: object
> {code}
> I'd expect something like this to work:
> {code}
> >>> import pandas.testing as pdt
> >>> df_rountrip = pa.Table.from_pandas(df).to_pandas(date_as_object=True)
> >>> pdt.assert_frame_equal(df_rountrip, df)
> {code}





[jira] [Comment Edited] (ARROW-2646) [Python] Pandas roundtrip for date objects

2018-06-29 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527728#comment-16527728
 ] 

Antoine Pitrou edited comment on ARROW-2646 at 6/29/18 2:36 PM:


The standard {{csv}} module has both a notion of "dialect" and additional 
{{**kwargs}} to each function so that you can override individual options. 
Intuitively, it allows accepting individual option arguments without listing 
and documenting them explicitly for each method.

I tend to prefer the options object / dialect approach myself, but it's true 
I'm more in the library developer camp :-)
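
As a small illustration of the two styles in the standard {{csv}} module (the sample data and the Semicolon class name are made up for this sketch): a dialect bundles options into one reusable object, while keyword arguments override individual options on a single call.

```python
import csv
import io

data = "a;b;c\n1;2;3\n"


class Semicolon(csv.excel):
    """Options-object style: a reusable dialect with one setting changed."""
    delimiter = ";"


# Pass the bundled options as a dialect.
rows_dialect = list(csv.reader(io.StringIO(data), dialect=Semicolon))

# Keep the stock "excel" dialect but override one option via a keyword argument.
rows_kwargs = list(csv.reader(io.StringIO(data), dialect="excel", delimiter=";"))

# With neither, the default comma delimiter leaves each line as a single field.
rows_default = list(csv.reader(io.StringIO(data)))

print(rows_dialect)
print(rows_kwargs)
print(rows_default)
```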


was (Author: pitrou):
The standard {{csv}} module has both a notion of "dialect" and addition 
{{**kwargs}} to each function to that you can override individual options. 
Intuitively, it allows accepting individual option arguments without listing 
and documenting them explicitly for each method.

> [Python] Pandas roundtrip for date objects
> --
>
> Key: ARROW-2646
> URL: https://issues.apache.org/jira/browse/ARROW-2646
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Florian Jetter
>Priority: Minor
> Fix For: 0.10.0
>
>
> Arrow currently casts date objects to nanosecond precision datetime objects. 
> I'd like to have a way to preserve the type during a roundtrip
> {code}
> >>> import pandas as pd
> >>> import pyarrow as pa
> >>> import datetime
> >>> pa.date32().to_pandas_dtype()
> dtype('<M8[ns]')
> >>> df = pd.DataFrame({'date': [datetime.date(2018, 1, 1)]})
> >>> df.dtypes
> date    object
> dtype: object
> >>> df_rountrip = pa.Table.from_pandas(df).to_pandas()
> >>> df_rountrip.dtypes
> date    datetime64[ns]
> dtype: object
> {code}
> I'd expect something like this to work:
> {code}
> >>> import pandas.testing as pdt
> >>> df_rountrip = pa.Table.from_pandas(df).to_pandas(date_as_object=True)
> >>> pdt.assert_frame_equal(df_rountrip, df)
> {code}





[jira] [Updated] (ARROW-2646) [Python] Pandas roundtrip for date objects

2018-06-29 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2646:
--
Fix Version/s: (was: 0.10.0)
   0.11.0

> [Python] Pandas roundtrip for date objects
> --
>
> Key: ARROW-2646
> URL: https://issues.apache.org/jira/browse/ARROW-2646
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Florian Jetter
>Priority: Minor
> Fix For: 0.11.0
>
>
> Arrow currently casts date objects to nanosecond precision datetime objects. 
> I'd like to have a way to preserve the type during a roundtrip
> {code}
> >>> import pandas as pd
> >>> import pyarrow as pa
> >>> import datetime
> >>> pa.date32().to_pandas_dtype()
> dtype('<M8[ns]')
> >>> df = pd.DataFrame({'date': [datetime.date(2018, 1, 1)]})
> >>> df.dtypes
> date    object
> dtype: object
> >>> df_rountrip = pa.Table.from_pandas(df).to_pandas()
> >>> df_rountrip.dtypes
> date    datetime64[ns]
> dtype: object
> {code}
> I'd expect something like this to work:
> {code}
> >>> import pandas.testing as pdt
> >>> df_rountrip = pa.Table.from_pandas(df).to_pandas(date_as_object=True)
> >>> pdt.assert_frame_equal(df_rountrip, df)
> {code}





[jira] [Updated] (ARROW-2607) [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm

2018-06-29 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2607:
--
Fix Version/s: (was: 0.10.0)
   0.11.0

> [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm
> ---
>
> Key: ARROW-2607
> URL: https://issues.apache.org/jira/browse/ARROW-2607
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.11.0
>
>
> Follow-up after https://issues.apache.org/jira/browse/ARROW-2249: Currently 
> only primitive arrays are supported in {{pyarrow.Array.from_jvm}} as it uses 
> {{pyarrow.Array.from_buffers}} underneath. We should extend one of the two 
> functions to be able to deal with string arrays. There is a currently failing 
> unit test {{test_jvm_string_array}} in {{pyarrow/tests/test_jvm.py}} to 
> verify the implementation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2609) [Java/Python] Complex type conversion in pyarrow.Field.from_jvm

2018-06-29 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2609:
--
Fix Version/s: (was: 0.10.0)
   0.11.0

> [Java/Python] Complex type conversion in pyarrow.Field.from_jvm
> ---
>
> Key: ARROW-2609
> URL: https://issues.apache.org/jira/browse/ARROW-2609
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.11.0
>
>
> The converter {{pyarrow.Field.from_jvm}} currently only works for primitive 
> types. Types like List, Struct or Union that have children in their 
> definition are not supported. We should add the needed recursion for these 
> types and enable the respective tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2610) [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm

2018-06-29 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2610:
--
Fix Version/s: (was: 0.10.0)
   0.11.0

> [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm
> ---
>
> Key: ARROW-2610
> URL: https://issues.apache.org/jira/browse/ARROW-2610
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Uwe L. Korn
> Priority: Major
> Fix For: 0.11.0
>
>
> The DictionaryType is a bit more complex, as it also references the
> dictionary values themselves. It also needs to be integrated into
> {{pyarrow.Field.from_jvm}}, but getting DictionaryType to work may depend on
> {{pyarrow.Array.from_jvm}} first supporting non-primitive arrays.





[jira] [Updated] (ARROW-2535) [C++/Python] Provide pre-commit hooks that check flake8 et al.

2018-06-29 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2535:
--
Fix Version/s: (was: 0.10.0)
   0.11.0

> [C++/Python] Provide pre-commit hooks that check flake8 et al.
> --
>
> Key: ARROW-2535
> URL: https://issues.apache.org/jira/browse/ARROW-2535
> Project: Apache Arrow
> Issue Type: Task
> Components: C++, Python
> Reporter: Uwe L. Korn
> Priority: Major
> Fix For: 0.11.0
>
>
> We should provide pre-commit hooks that users can install (optionally) that 
> check e.g. flake8 and clang-format.
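
One possible shape for such an optional hook, sketched as a Python script that could be installed as `.git/hooks/pre-commit`. The function names are hypothetical, the file extensions are a guess, and the `--dry-run --Werror` flags assume a reasonably recent clang-format; older releases would need a different check.

```python
import subprocess
import sys


def staged_files():
    """Return the paths staged for commit (added, copied or modified)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def lint_commands(paths):
    """Build the linter command lines for the given paths."""
    py_files = [p for p in paths if p.endswith(".py")]
    cpp_files = [p for p in paths if p.endswith((".cc", ".h"))]
    commands = []
    if py_files:
        commands.append(["flake8", *py_files])
    if cpp_files:
        # Assumption: clang-format supports --dry-run/--Werror (newer releases).
        commands.append(["clang-format", "--dry-run", "--Werror", *cpp_files])
    return commands


def main():
    """Run each linter and fail the commit if any of them reports a problem."""
    failed = False
    for command in lint_commands(staged_files()):
        if subprocess.run(command).returncode != 0:
            failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main())
```

Because the hook lives under `.git/hooks`, it stays opt-in: contributors who do not install it are unaffected.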





[jira] [Updated] (ARROW-2366) [Python] Support reading Parquet files having a permutation of column order

2018-06-29 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2366:
--
Fix Version/s: (was: 0.10.0)
   0.11.0

> [Python] Support reading Parquet files having a permutation of column order
> ---
>
> Key: ARROW-2366
> URL: https://issues.apache.org/jira/browse/ARROW-2366
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Assignee: Uwe L. Korn
> Priority: Major
> Fix For: 0.11.0
>
>
> See discussion in https://github.com/dask/fastparquet/issues/320





[jira] [Updated] (ARROW-2339) [Python] Add a fast path for int hashing

2018-06-29 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2339:
--
Priority: Minor  (was: Major)

> [Python] Add a fast path for int hashing
> 
>
> Key: ARROW-2339
> URL: https://issues.apache.org/jira/browse/ARROW-2339
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Alex Hagerman
> Assignee: Alex Hagerman
> Priority: Minor
> Fix For: 1.0.0
>
>
> Create a __hash__ fast path for Int scalars that avoids using as_py().
>  
> https://issues.apache.org/jira/browse/ARROW-640
> [https://github.com/apache/arrow/pull/1765/files/4497b69db8039cfeaa7a25f593f3a3e6c7984604]
>  
>  





[jira] [Updated] (ARROW-2339) [Python] Add a fast path for int hashing

2018-06-29 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2339:
--
Fix Version/s: (was: 0.10.0)
   1.0.0

> [Python] Add a fast path for int hashing
> 
>
> Key: ARROW-2339
> URL: https://issues.apache.org/jira/browse/ARROW-2339
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Alex Hagerman
> Assignee: Alex Hagerman
> Priority: Major
> Fix For: 1.0.0
>
>
> Create a __hash__ fast path for Int scalars that avoids using as_py().
>  
> https://issues.apache.org/jira/browse/ARROW-640
> [https://github.com/apache/arrow/pull/1765/files/4497b69db8039cfeaa7a25f593f3a3e6c7984604]
>  
>  





[jira] [Updated] (ARROW-2237) [Python] Huge tables test failure

2018-06-29 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2237:
--
Fix Version/s: (was: 0.10.0)
   0.11.0

> [Python] Huge tables test failure
> -
>
> Key: ARROW-2237
> URL: https://issues.apache.org/jira/browse/ARROW-2237
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Antoine Pitrou
> Priority: Major
> Fix For: 0.11.0
>
>
> This is a new failure here (Ubuntu 16.04, x86-64):
> {code}
> _ test_use_huge_pages _
> Traceback (most recent call last):
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 779, in test_use_huge_pages
>     create_object(plasma_client, 1)
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 80, in create_object
>     seal=seal)
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 69, in create_object_with_id
>     memory_buffer = client.create(object_id, data_size, metadata)
>   File "plasma.pyx", line 302, in pyarrow.plasma.PlasmaClient.create
>   File "error.pxi", line 79, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: /home/antoine/arrow/cpp/src/plasma/client.cc:192 code: PlasmaReceive(store_conn_, MessageType_PlasmaCreateReply, )
> /home/antoine/arrow/cpp/src/plasma/protocol.cc:46 code: ReadMessage(sock, , buffer)
> Encountered unexpected EOF
> - Captured stderr call -
> Allowing the Plasma store to use up to 0.1GB of memory.
> Starting object store with directory /mnt/hugepages and huge page support enabled
> create_buffer failed to open file /mnt/hugepages/plasmapSNc0X
> {code}





[jira] [Updated] (ARROW-976) [Python] Provide API for defining and reading Parquet datasets with more ad hoc partition schemes

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-976:
---
Fix Version/s: (was: 0.10.0)
   0.11.0

> [Python] Provide API for defining and reading Parquet datasets with more ad 
> hoc partition schemes
> -
>
> Key: ARROW-976
> URL: https://issues.apache.org/jira/browse/ARROW-976
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 0.11.0
>
>






[jira] [Commented] (ARROW-2553) [C++] Set MACOSX_DEPLOYMENT_TARGET in wheel build

2018-06-29 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527745#comment-16527745
 ] 

Wes McKinney commented on ARROW-2553:
-

Should setting it to 10.9 be sufficient?

> [C++] Set MACOSX_DEPLOYMENT_TARGET in wheel build
> -
>
> Key: ARROW-2553
> URL: https://issues.apache.org/jira/browse/ARROW-2553
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Packaging
> Reporter: Uwe L. Korn
> Priority: Blocker
> Fix For: 0.10.0
>
>
> The current `pyarrow` wheels are not usable on older OSX releases due to a 
> problem in the newest Xcode SDK. We need to set {{MACOSX_DEPLOYMENT_TARGET}} 
> to an older OSX release to avoid getting {{Symbol not found: 
> _os_unfair_lock_lock}}.
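
A minimal sketch of pinning the deployment target for a build step, assuming the build is launched from Python. `wheel_build_env` is a hypothetical helper, and the default of 10.9 is only the value floated in the comment above, not a confirmed setting.

```python
import os


def wheel_build_env(deployment_target="10.9", base_env=None):
    """Return a copy of the environment with MACOSX_DEPLOYMENT_TARGET set.

    Hypothetical helper; the default target is an assumption taken from
    the question in this thread.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MACOSX_DEPLOYMENT_TARGET"] = deployment_target
    return env


example_env = wheel_build_env(base_env={"PATH": "/usr/bin"})
```

The returned mapping can be passed as `env=` to `subprocess.run` so that the compiler and linker invoked by the wheel build see the older target.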





[jira] [Commented] (ARROW-1638) [Java] IPC roundtrip for null type

2018-06-29 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527752#comment-16527752
 ] 

Wes McKinney commented on ARROW-1638:
-

I added a new label, "columnar-format-1.0", to this issue.

> [Java] IPC roundtrip for null type
> --
>
> Key: ARROW-1638
> URL: https://issues.apache.org/jira/browse/ARROW-1638
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Java - Vectors
> Reporter: Wes McKinney
> Assignee: Siddharth Teotia
> Priority: Major
> Labels: columnar-format-1.0
> Fix For: 0.11.0
>
>





