[jira] [Updated] (ARROW-2612) [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY

2018-05-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2612:
--
Labels: pull-request-available  (was: )

> [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY
> 
>
> Key: ARROW-2612
> URL: https://issues.apache.org/jira/browse/ARROW-2612
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
>
> The deprecated PLASMA_DEFAULT_RELEASE_DELAY is currently broken, since it 
> refers to kDeprecatedPlasmaDefaultReleaseDelay without the plasma:: namespace 
> qualifier.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2612) [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY

2018-05-17 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-2612:
-

 Summary: [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY
 Key: ARROW-2612
 URL: https://issues.apache.org/jira/browse/ARROW-2612
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


The deprecated PLASMA_DEFAULT_RELEASE_DELAY is currently broken, since it 
refers to kDeprecatedPlasmaDefaultReleaseDelay without the plasma:: namespace 
qualifier.





[jira] [Resolved] (ARROW-2611) [Python] Python 2 integer serialization

2018-05-17 Thread Philipp Moritz (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-2611.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 2055
[https://github.com/apache/arrow/pull/2055]

> [Python] Python 2 integer serialization
> ---
>
> Key: ARROW-2611
> URL: https://issues.apache.org/jira/browse/ARROW-2611
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In Python 2, serializing a Python int with pyarrow.serialize and then 
> deserializing it returns a long instead of an integer. Note that this is not 
> an issue in Python 3, where the long type does not exist.





[jira] [Updated] (ARROW-2611) [Python] Python 2 integer serialization

2018-05-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2611:
--
Labels: pull-request-available  (was: )

> [Python] Python 2 integer serialization
> ---
>
> Key: ARROW-2611
> URL: https://issues.apache.org/jira/browse/ARROW-2611
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> In Python 2, serializing a Python int with pyarrow.serialize and then 
> deserializing it returns a long instead of an integer. Note that this is not 
> an issue in Python 3, where the long type does not exist.





[jira] [Created] (ARROW-2611) [Python] Python 2 integer serialization

2018-05-17 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-2611:
-

 Summary: [Python] Python 2 integer serialization
 Key: ARROW-2611
 URL: https://issues.apache.org/jira/browse/ARROW-2611
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.9.0
Reporter: Philipp Moritz


In Python 2, serializing a Python int with pyarrow.serialize and then 
deserializing it returns a {{long}} instead of an integer. Note that this is 
not an issue in Python 3, where the long type does not exist.





[jira] [Updated] (ARROW-2611) [Python] Python 2 integer serialization

2018-05-17 Thread Philipp Moritz (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz updated ARROW-2611:
--
Description: In Python 2, serializing a Python int with pyarrow.serialize 
and then deserializing it returns a long instead of an integer. Note that this 
is not an issue in python 3 where the long type does not exist.  (was: In 
Python 2, serializing a Python int with pyarrow.serialize and then 
deserializing it returns a {{long }}instead of an integer. Note that this is 
not an issue in python 3 where the long type does not exist.)

> [Python] Python 2 integer serialization
> ---
>
> Key: ARROW-2611
> URL: https://issues.apache.org/jira/browse/ARROW-2611
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Philipp Moritz
>Priority: Major
>
> In Python 2, serializing a Python int with pyarrow.serialize and then 
> deserializing it returns a long instead of an integer. Note that this is not 
> an issue in Python 3, where the long type does not exist.





[jira] [Commented] (ARROW-2592) [Python] AssertionError in to_pandas()

2018-05-17 Thread Dima Ryazanov (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479886#comment-16479886
 ] 

Dima Ryazanov commented on ARROW-2592:
--

Looks like writing with version 0.6 causes the problem - while 0.7 and later 
are fine.

Though even if the parquet file is broken, it should be some sort of a parse 
error, rather than an assert, right?

> [Python] AssertionError in to_pandas()
> --
>
> Key: ARROW-2592
> URL: https://issues.apache.org/jira/browse/ARROW-2592
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Dima Ryazanov
>Priority: Major
>
> Pyarrow 0.8 and 0.9 raise an AssertionError for one of the datasets I have 
> (created using an older version of pyarrow). Repro steps:
> {{In [1]: from pyarrow.parquet import ParquetDataset}}
> {{In [2]: d = ParquetDataset(['bug.parq'])}}
> {{In [3]: t = d.read()}}
> {{In [4]: t.to_pandas()}}
> {{---}}
> {{AssertionError    Traceback (most recent call last)}}
> {{ in ()}}
> {{----> 1 t.to_pandas()}}
> {{table.pxi in pyarrow.lib.Table.to_pandas()}}
> {{~/envs/cli3/lib/python3.6/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, memory_pool, nthreads, categories)}}
> {{    529 # There must be the same number of field names and physical names}}
> {{    530 # (fields in the arrow Table)}}
> {{--> 531 assert len(logical_index_names) == len(index_columns_set)}}
> {{    532 }}
> {{    533 # It can never be the case in a released version of pyarrow that}}
> {{AssertionError: }}
>  
> Here's the file: [https://www.dropbox.com/s/oja3khjsc5tycfh/bug.parq]
> (I was not able to attach it here due to a "missing token", whatever that 
> means.)





[jira] [Updated] (ARROW-2608) [Java/Python] Add pyarrow.{Array,Field}.from_jvm / jvm_buffer

2018-05-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2608:
--
Labels: pull-request-available  (was: )

> [Java/Python] Add pyarrow.{Array,Field}.from_jvm / jvm_buffer
> -
>
> Key: ARROW-2608
> URL: https://issues.apache.org/jira/browse/ARROW-2608
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> As a first iteration of the Java->Python in-memory vector sharing, add a 
> minimal set of functions to access primitive arrays from Java in Python.





[jira] [Created] (ARROW-2610) [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm

2018-05-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2610:
--

 Summary: [Java/Python] Add support for dictionary type to 
pyarrow.Field.from_jvm
 Key: ARROW-2610
 URL: https://issues.apache.org/jira/browse/ARROW-2610
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Reporter: Uwe L. Korn


The DictionaryType is a bit more complex as it also references the dictionary 
values themselves. This also needs to be integrated into {{pyarrow.Field.from_jvm}}, 
but the work to make DictionaryType work may also depend on 
{{pyarrow.Array.from_jvm}} first supporting non-primitive arrays.





[jira] [Created] (ARROW-2609) [Java/Python] Complex type conversion in pyarrow.Field.from_jvm

2018-05-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2609:
--

 Summary: [Java/Python] Complex type conversion in 
pyarrow.Field.from_jvm
 Key: ARROW-2609
 URL: https://issues.apache.org/jira/browse/ARROW-2609
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Reporter: Uwe L. Korn
 Fix For: 0.10.0


The converter {{pyarrow.Field.from_jvm}} currently only works for primitive 
types. Types like List, Struct or Union that have children in their definition 
are not supported. We should add the needed recursion for these types and 
enable the respective tests.
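
The recursion asked for here could look roughly like the sketch below. It is 
only an illustration of the traversal: plain dictionaries stand in for the JVM 
field objects, and the keys name, type and children as well as the function 
convert_field are hypothetical, not part of the Java or pyarrow API.

```python
# Hypothetical stand-in: each "JVM field" is a dict with a name, a type
# string and, for nested types, a list of child fields.
PRIMITIVE_TYPES = {"int32", "int64", "float64", "bool"}
NESTED_TYPES = {"list", "struct", "union"}


def convert_field(jvm_field):
    """Recursively convert a field description into a (name, type) tuple."""
    type_name = jvm_field["type"]
    if type_name in PRIMITIVE_TYPES:
        # Primitive types have no children, so the recursion stops here.
        return (jvm_field["name"], type_name)
    if type_name in NESTED_TYPES:
        # Nested types convert each child with the same function.
        children = [convert_field(c) for c in jvm_field.get("children", [])]
        return (jvm_field["name"], (type_name, children))
    raise NotImplementedError("Unsupported type: " + type_name)


list_of_structs = {
    "name": "points",
    "type": "list",
    "children": [
        {
            "name": "item",
            "type": "struct",
            "children": [
                {"name": "x", "type": "float64"},
                {"name": "y", "type": "float64"},
            ],
        }
    ],
}
converted = convert_field(list_of_structs)
```

Because every nested type delegates to the same function, arbitrarily deep 
combinations of List, Struct and Union are handled without extra cases.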





[jira] [Created] (ARROW-2607) [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm

2018-05-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2607:
--

 Summary: [Java/Python] Support VarCharVector / StringArray in 
pyarrow.Array.from_jvm
 Key: ARROW-2607
 URL: https://issues.apache.org/jira/browse/ARROW-2607
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java - Vectors, Python
Reporter: Uwe L. Korn
 Fix For: 0.10.0


Follow-up after https://issues.apache.org/jira/browse/ARROW-2249: Currently 
only primitive arrays are supported in {{pyarrow.Array.from_jvm}} as it uses 
{{pyarrow.Array.from_buffers}} underneath. We should extend one of the two 
functions to be able to deal with string arrays. There is a currently failing 
unit test {{test_jvm_string_array}} in {{pyarrow/tests/test_jvm.py}} to verify 
the implementation.





[jira] [Created] (ARROW-2606) [Java/Python]  Add unit test for pyarrow.decimal128 in Array.from_jvm

2018-05-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2606:
--

 Summary: [Java/Python]  Add unit test for pyarrow.decimal128 in 
Array.from_jvm
 Key: ARROW-2606
 URL: https://issues.apache.org/jira/browse/ARROW-2606
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java - Vectors, Python
Reporter: Uwe L. Korn
 Fix For: 0.10.0


Follow-up after https://issues.apache.org/jira/browse/ARROW-2249. We need to 
find the correct code to construct Java decimals and fill them into a 
{{DecimalVector}}. Afterwards, we should activate the decimal128 type on 
{{test_jvm_array}} and ensure that we load them correctly from Java into Python.





[jira] [Created] (ARROW-2605) [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm

2018-05-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2605:
--

 Summary: [Java/Python] Add unit test for pyarrow.timeX types in 
Array.from_jvm
 Key: ARROW-2605
 URL: https://issues.apache.org/jira/browse/ARROW-2605
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java - Vectors, Python
Reporter: Uwe L. Korn
 Fix For: 0.10.0


Follow-up after https://issues.apache.org/jira/browse/ARROW-2249 as we are 
missing the necessary methods to construct these arrays conveniently on the 
Python side.

Once there is a path to construct {{pyarrow.Array}} instances from a Python 
list of {{datetime.time}} for the various time types, we should activate the 
time types on {{test_jvm_array}} and ensure that we load them correctly from 
Java into Python.
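
For context, the Arrow time types store a time of day as an integer offset from 
midnight in a fixed unit (seconds or milliseconds for time32, microseconds or 
nanoseconds for time64). A minimal sketch of the conversion from 
{{datetime.time}} that such a construction path would need is shown below; the 
helper name time_to_int is an assumption for illustration and not a pyarrow 
function.

```python
import datetime

# Number of units per second for each Arrow time resolution.
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def time_to_int(value, unit="us"):
    """Return the offset of `value` from midnight, counted in `unit`."""
    seconds = (value.hour * 60 + value.minute) * 60 + value.second
    micros = seconds * 10**6 + value.microsecond
    # Integer arithmetic only; sub-unit precision is truncated, not rounded.
    return micros * UNITS_PER_SECOND[unit] // 10**6


example = time_to_int(datetime.time(1, 2, 3, 500000), unit="ms")
```

A list of {{datetime.time}} values converted this way yields the plain integers 
that the time32 and time64 layouts hold.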





[jira] [Updated] (ARROW-2604) [Java] Add method overload for VarCharVector.set(int,String)

2018-05-17 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2604:
---
Description: 
I would have expected that this is a very typical use case but at the moment I 
only see code that first needs to fill a {{VarCharHolder}} before passing it to 
{{VarCharVector.set()}}. We could also provide this as a convenience overload.

Correct me please if I missed a convenience feature. I'm still new to the Java 
side.

  was:
I would have expected that this is a very typical use case but at the moment I 
only see code that first fills a {{VarCharHolder}}. We could also provide this 
as a convenience overload.

Correct me please if I missed a convenience feature. I'm still new to the Java 
side.


> [Java] Add method overload for VarCharVector.set(int,String)
> 
>
> Key: ARROW-2604
> URL: https://issues.apache.org/jira/browse/ARROW-2604
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.10.0
>
>
> I would have expected that this is a very typical use case but at the moment 
> I only see code that first needs to fill a {{VarCharHolder}} before passing it 
> to {{VarCharVector.set()}}. We could also provide this as a convenience 
> overload.
> Correct me please if I missed a convenience feature. I'm still new to the 
> Java side.





[jira] [Created] (ARROW-2604) [Java] Add method overload for VarCharVector.set(int,String)

2018-05-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2604:
--

 Summary: [Java] Add method overload for 
VarCharVector.set(int,String)
 Key: ARROW-2604
 URL: https://issues.apache.org/jira/browse/ARROW-2604
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java - Vectors
Reporter: Uwe L. Korn
 Fix For: 0.10.0


I would have expected that this is a very typical use case but at the moment I 
only see code that first fills a {{VarCharHolder}}. We could also provide this 
as a convenience overload.

Correct me please if I missed a convenience feature. I'm still new to the Java 
side.





[jira] [Resolved] (ARROW-2594) [Java] Vector reallocation does not properly clear reused buffers

2018-05-17 Thread Bryan Cutler (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-2594.
-
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 2054
[https://github.com/apache/arrow/pull/2054]

> [Java] Vector reallocation does not properly clear reused buffers
> -
>
> Key: ARROW-2594
> URL: https://issues.apache.org/jira/browse/ARROW-2594
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When reallocating a vector buffer, the code assumes that the first half of the 
> new buffer is clean or populated from the previous buffer, and only zeros out 
> the second half.  This is not the case if the vector has released the buffer 
> and the current capacity is 0 (empty).  If the new buffer has values set, then 
> they will show up as bogus values when used in the vector.
> I came across this when looking into SPARK-23030, due to the comment here 
> https://github.com/apache/spark/pull/21312#issuecomment-389035697
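
The failure mode described above can be reproduced with a toy model. The sketch 
below is not the Arrow Java code: DirtyAllocator, realloc_buggy and 
realloc_fixed are hypothetical names, and the allocator deliberately hands out 
buffers filled with 0xff to mimic reused memory.

```python
class DirtyAllocator:
    """Toy allocator whose buffers come back filled with non-zero bytes."""

    def allocate(self, size):
        return bytearray(b"\xff" * size)


def realloc_buggy(old, allocator, initial_size=4):
    """Double the capacity, zeroing only the second half of the new buffer."""
    new_size = max(len(old) * 2, initial_size)
    new = allocator.allocate(new_size)
    new[: len(old)] = old  # copy the existing values
    half = new_size // 2
    new[half:] = bytes(new_size - half)  # zero the second half only
    return new


def realloc_fixed(old, allocator, initial_size=4):
    """Double the capacity, zeroing everything past the copied values."""
    new_size = max(len(old) * 2, initial_size)
    new = allocator.allocate(new_size)
    new[: len(old)] = old
    new[len(old):] = bytes(new_size - len(old))
    return new


empty = bytearray()  # models a vector that has released its buffer
buggy = realloc_buggy(empty, DirtyAllocator())
fixed = realloc_fixed(empty, DirtyAllocator())
```

With a non-empty old buffer both variants agree, because the copy overwrites 
the first half. Only the capacity-0 case leaves the stale 0xff bytes in place, 
which matches the scenario in the description.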





[jira] [Updated] (ARROW-2594) [Java] Vector reallocation does not properly clear reused buffers

2018-05-17 Thread Bryan Cutler (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-2594:

Summary: [Java] Vector reallocation does not properly clear reused buffers  
(was: Vector reallocation does not properly clear reused buffers)

> [Java] Vector reallocation does not properly clear reused buffers
> -
>
> Key: ARROW-2594
> URL: https://issues.apache.org/jira/browse/ARROW-2594
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When reallocating a vector buffer, the code assumes that the first half of the 
> new buffer is clean or populated from the previous buffer, and only zeros out 
> the second half.  This is not the case if the vector has released the buffer 
> and the current capacity is 0 (empty).  If the new buffer has values set, then 
> they will show up as bogus values when used in the vector.
> I came across this when looking into SPARK-23030, due to the comment here 
> https://github.com/apache/spark/pull/21312#issuecomment-389035697





[jira] [Resolved] (ARROW-2521) [Rust] Refactor Rust API to use traits and generics

2018-05-17 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2521.

Resolution: Fixed

Issue resolved by pull request 1971
[https://github.com/apache/arrow/pull/1971]

> [Rust] Refactor Rust API to use traits and generics
> ---
>
> Key: ARROW-2521
> URL: https://issues.apache.org/jira/browse/ARROW-2521
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Early on, [~kszucs] and I worked on two different designs for how to 
> represent Arrow arrays in Rust, each with their pros and cons.
> Krisztian started out with a generics approach e.g. Array which was great 
> until we tried to implement structs, which can contain mixed types so we 
> ended up using enum to represent arrays, which was great until I got to the 
> list types ... I don't think I can implement nested lists with this approach.
> I am reviewing this again now that I am more familiar with Arrow and also my 
> Rust skills have improved greatly since I started working on all of this.
> I will be prototyping in a separate repo, and will update this Jira once I 
> have something concrete to share, but I feel it is important to address this 
> before the first official release of the Rust version. Also, if we are going 
> to consider a refactor like this, it is better to do it now while the 
> codebase is tiny.





[jira] [Assigned] (ARROW-2568) [Python] Expose thread pool size setting to Python, and deprecate "nthreads"

2018-05-17 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2568:
-

Assignee: Antoine Pitrou

> [Python] Expose thread pool size setting to Python, and deprecate "nthreads"
> 
>
> Key: ARROW-2568
> URL: https://issues.apache.org/jira/browse/ARROW-2568
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>
> Now that we have a global thread pool, we should:
>  * use it in places where we currently require an explicit number of threads 
> (with an additional {{use_threads}} argument to enable parallelism)
>  * deprecate the now pointless {{nthreads}} argument
>  * expose the thread pool capacity setting in Python





[jira] [Resolved] (ARROW-2574) [CI] Collect and publish Python coverage

2018-05-17 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2574.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 2050
[https://github.com/apache/arrow/pull/2050]

> [CI] Collect and publish Python coverage
> 
>
> Key: ARROW-2574
> URL: https://issues.apache.org/jira/browse/ARROW-2574
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Now that our Travis-CI setup is able to collect and publish C++ and Rust 
> coverage, we should do the same for Python and Cython modules in pyarrow.





[jira] [Updated] (ARROW-2603) [Python] from pandas raises ArrowInvalid for date(time) subclasses

2018-05-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2603:
--
Labels: pull-request-available  (was: )

> [Python] from pandas raises ArrowInvalid for date(time) subclasses
> --
>
> Key: ARROW-2603
> URL: https://issues.apache.org/jira/browse/ARROW-2603
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Florian Jetter
>Assignee: Florian Jetter
>Priority: Minor
>  Labels: pull-request-available
>
> When converting a pandas dataframe holding subclasses of date/datetime 
> objects, arrow raises an {{ArrowInvalid}} exception
> {code:java}
> import pandas as pd
> import pyarrow as pa
> import datetime
> class MyDate(datetime.date):
>     pass
> date_array = [MyDate(2000, 1, 1)]
> df = pd.DataFrame({"date": pd.Series(date_array, dtype=object)})
> table = pa.Table.from_pandas(df){code}





[jira] [Created] (ARROW-2603) [Python] from pandas raises ArrowInvalid for date(time) subclasses

2018-05-17 Thread Florian Jetter (JIRA)
Florian Jetter created ARROW-2603:
-

 Summary: [Python] from pandas raises ArrowInvalid for date(time) 
subclasses
 Key: ARROW-2603
 URL: https://issues.apache.org/jira/browse/ARROW-2603
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Florian Jetter
Assignee: Florian Jetter


When converting a pandas dataframe holding subclasses of date/datetime objects, 
arrow raises an {{ArrowInvalid}} exception
{code:java}
import pandas as pd
import pyarrow as pa
import datetime

class MyDate(datetime.date):
    pass

date_array = [MyDate(2000, 1, 1)]
df = pd.DataFrame({"date": pd.Series(date_array, dtype=object)})

table = pa.Table.from_pandas(df){code}
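
The issue text does not state the root cause. Purely as an illustration, and 
assuming the converter recognizes dates by exact type (an assumption, not 
something confirmed here), the difference between an exact type match and a 
subclass-aware check looks like this in plain Python. The helper names are 
hypothetical.

```python
import datetime


class MyDate(datetime.date):
    pass


def is_date_exact(obj):
    """Exact type match: rejects subclasses of datetime.date."""
    return type(obj) is datetime.date


def is_date_like(obj):
    """Subclass-aware check: accepts datetime.date and anything derived from it."""
    return isinstance(obj, datetime.date)


value = MyDate(2000, 1, 1)
exact_result = is_date_exact(value)
like_result = is_date_like(value)
```

Note that {{datetime.datetime}} is itself a subclass of {{datetime.date}}, so a 
subclass-aware check has to test for datetimes first if the two need different 
handling.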





[jira] [Commented] (ARROW-2601) [Python] MemoryPool bytes_allocated causes seg

2018-05-17 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479090#comment-16479090
 ] 

Antoine Pitrou commented on ARROW-2601:
---

Uh... Calling the {{MemoryPool}} constructor creates an invalid MemoryPool 
object. It's surprising that {{pa.array}} accepts it silently and doesn't crash.

I suppose {{MemoryPool.bytes_allocated()}} should check for the pointer being 
null and raise ValueError in that case. This is already done in other places 
(see e.g. {{Tensor._validate()}}).
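
The kind of guard suggested here can be sketched in plain Python. PoolHandle is 
a hypothetical toy class, not pyarrow's {{MemoryPool}}: None models the null 
pointer, and a dict models the native pool.

```python
class PoolHandle:
    """Toy wrapper around a native pool pointer (None models a null pointer)."""

    def __init__(self, pool=None):
        self._pool = pool

    def _validate(self):
        # Fail with a Python exception instead of dereferencing a null pointer.
        if self._pool is None:
            raise ValueError("MemoryPool is not initialized")

    def bytes_allocated(self):
        self._validate()
        return self._pool["bytes_allocated"]


valid = PoolHandle({"bytes_allocated": 64})
```

Calling {{bytes_allocated()}} on a handle created without a pool then raises 
ValueError rather than crashing the interpreter.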

> [Python] MemoryPool bytes_allocated causes seg
> --
>
> Key: ARROW-2601
> URL: https://issues.apache.org/jira/browse/ARROW-2601
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alex Hagerman
>Priority: Minor
> Fix For: 0.10.0
>
>
> Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 18:21:58) 
> [GCC 7.2.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow as pa
> >>> mp = pa.MemoryPool()
> >>> arr = pa.array([1,2,3], memory_pool=mp)
> >>> mp.bytes_allocated()
> Segmentation fault (core dumped)
> I'll dig into this further, but should bytes_allocated be returning anything 
> when called like this? Or should it raise NotImplemented?





[jira] [Commented] (ARROW-2602) [C++/Python] Automate build of development docker container

2018-05-17 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479086#comment-16479086
 ] 

Uwe L. Korn commented on ARROW-2602:


Actually, as far as I understood, there should be infrastructure available so 
that you only need a separate git repository with a Dockerfile, and commits 
to master of that repo will push a built Docker image to 
[https://hub.docker.com/]. Users should then be able to pull it with 
{{docker pull apache/arrow-dev}}. You will only get the apache prefix, though, 
if it's an automated build through the Docker Hub infrastructure. 

For the {{manylinux1}} image, we use the one at 
[https://quay.io/repository/xhochy/arrow_manylinux1_x86_64_base], which is built 
via my Arrow fork (I gave various PMC members write access to it, but that's 
still suboptimal).

> [C++/Python] Automate build of development docker container
> ---
>
> Key: ARROW-2602
> URL: https://issues.apache.org/jira/browse/ARROW-2602
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.10.0
>
>
> With 
> [https://github.com/apache/arrow/pull/2016|https://github.com/apache/arrow/pull/2016#pullrequestreview-121047089]
>  we provide a convenience Docker container so that one can develop Arrow 
> without directly running into the hassles of setting up the development 
> toolchain on their machine.
> The current base image is not built automatically as we are waiting for input 
> from INFRA on https://issues.apache.org/jira/browse/INFRA-16533
> Once we know how to upload continuously to Docker Hub, we should move the 
> Dockerfile appropriately.





[jira] [Commented] (ARROW-2602) [C++/Python] Automate build of development docker container

2018-05-17 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479080#comment-16479080
 ] 

Krisztian Szucs commented on ARROW-2602:


[~xhochy] We could use the packaging automation for that too.

> [C++/Python] Automate build of development docker container
> ---
>
> Key: ARROW-2602
> URL: https://issues.apache.org/jira/browse/ARROW-2602
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.10.0
>
>
> With 
> [https://github.com/apache/arrow/pull/2016|https://github.com/apache/arrow/pull/2016#pullrequestreview-121047089]
>  we provide a convenience Docker container so that one can develop Arrow 
> without directly running into the hassles of setting up the development 
> toolchain on their machine.
> The current base image is not built automatically as we are waiting for input 
> from INFRA on https://issues.apache.org/jira/browse/INFRA-16533
> Once we know how to upload continuously to Docker Hub, we should move the 
> Dockerfile appropriately.





[jira] [Resolved] (ARROW-2486) [C++/Python] Provide a Docker image that contains all dependencies for development

2018-05-17 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2486.

   Resolution: Fixed
Fix Version/s: (was: 0.11.0)
   0.10.0

Issue resolved by pull request 2016
[https://github.com/apache/arrow/pull/2016]

> [C++/Python] Provide a Docker image that contains all dependencies for 
> development
> --
>
> Key: ARROW-2486
> URL: https://issues.apache.org/jira/browse/ARROW-2486
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Assignee: Aneesh
>Priority: Major
>  Labels: hackathon, pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> We should provide a Docker image and a Dockerfile that contains all necessary 
> dependencies that one needs for development. In addition there should be a 
> Dockerfile that can be used for development where the sources are 
> (bind-)mounted into the container. A typical workflow should consist of a 
> wrapper script that starts the container, takes care of the bind mounts and 
> runs cmake if necessary.
> People that want to get started with Arrow development on e.g. OS X will 
> spend a long time setting up the environment. I hope this lowers the barrier 
> for a first contribution a bit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2602) [C++/Python] Automate build of development docker container

2018-05-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2602:
--

 Summary: [C++/Python] Automate build of development docker 
container
 Key: ARROW-2602
 URL: https://issues.apache.org/jira/browse/ARROW-2602
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++, Python
Reporter: Uwe L. Korn
 Fix For: 0.10.0


With 
[https://github.com/apache/arrow/pull/2016|https://github.com/apache/arrow/pull/2016#pullrequestreview-121047089]
 we provide a convenience Docker container so that one can develop Arrow 
without directly running into the hassles of setting up the development 
toolchain on their machine.

The current base image is not built automatically as we are waiting for input 
from INFRA on https://issues.apache.org/jira/browse/INFRA-16533

Once we know how to upload continuously to Docker Hub, we should move the 
Dockerfile appropriately.





[jira] [Commented] (ARROW-2600) [Python] Add additional LocalFileSystem filesystem methods

2018-05-17 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479027#comment-16479027
 ] 

Uwe L. Korn commented on ARROW-2600:


Also have a look at the recent discussion on the ML about the unification of 
filesystem interfaces: 
https://lists.apache.org/thread.html/10d3ea7fb3b9360fd5abcb7b68453621213d7e4ed9b351175e61b6a0@%3Cdev.arrow.apache.org%3E

> [Python] Add additional LocalFileSystem filesystem methods
> --
>
> Key: ARROW-2600
> URL: https://issues.apache.org/jira/browse/ARROW-2600
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Alex Hagerman
>Assignee: Alex Hagerman
>Priority: Minor
> Fix For: 0.10.0
>
>
> Related to https://issues.apache.org/jira/browse/ARROW-1319 I noticed the 
> methods Martin listed are also not part of the LocalFileSystem class.





[jira] [Updated] (ARROW-2599) [Python] pip install is not working without Arrow C++ being installed

2018-05-17 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2599:
---
Summary: [Python] pip install is not working without Arrow C++ being 
installed  (was: [Python] pip install on ARM fails)

> [Python] pip install is not working without Arrow C++ being installed
> -
>
> Key: ARROW-2599
> URL: https://issues.apache.org/jira/browse/ARROW-2599
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Arch ARM Linux
> pip 10.0.1
> Python 3.6.5
>Reporter: Dominykas Mostauskis
>Priority: Minor
>
> Trying to install pyarrow with pip on ARM fails with `{{Could not find the 
> Arrow library. Looked for headers in , and for libs in}}`:
>  
> {{$ pip install pyarrow --no-build-isolation --user}}
> {{[omitted]}}
> {{Thread model: posix}}
> {{ gcc version 8.1.0 (GCC)}}
> {{INFOCompiler id: GNU}}
> {{ Selected compiler gcc 8.1.0}}
> {{ -- Performing Test CXX_SUPPORTS_SSE3}}
> {{ -- Performing Test CXX_SUPPORTS_SSE3 - Failed}}
> {{ -- Performing Test CXX_SUPPORTS_ALTIVEC}}
> {{ -- Performing Test CXX_SUPPORTS_ALTIVEC - Failed}}
> {{ Configured for DEBUG build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})}}
> {{ -- Build Type: DEBUG}}
> {{ -- Build output directory: /tmp/pip-install-auk894mc/pyarrow/build/temp.linux-armv7l-3.6/debug/}}
> {{ -- Found PythonInterp: /usr/bin/python (found version "3.6.5")}}
> {{ -- Searching for Python libs in /usr/lib;/usr/lib/python3.6/config-3.6m-arm-linux-gnueabihf}}
> {{ -- Looking for python3.6m}}
> {{ -- Found Python lib /usr/lib/libpython3.6m.so}}
> {{ -- Found PythonLibs: /usr/lib/libpython3.6m.so}}
> {{ -- Found NumPy: version "1.14.3" /home/domas/.local/lib/python3.6/site-packages/numpy/core/include}}
> {{ -- Searching for Python libs in /usr/lib;/usr/lib/python3.6/config-3.6m-arm-linux-gnueabihf}}
> {{ -- Looking for python3.6m}}
> {{ -- Found Python lib /usr/lib/libpython3.6m.so}}
> {{ -- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2")}}
> {{ -- Checking for module 'arrow'}}
> {{ -- No package 'arrow' found}}
> {{ CMake Error at cmake_modules/FindArrow.cmake:130 (message):}}
> {{ Could not find the Arrow library. Looked for headers in , and for libs in}}
> {{ Call Stack (most recent call first):}}
> {{ CMakeLists.txt:197 (find_package)}}





[jira] [Commented] (ARROW-2599) [Python] pip install on ARM fails

2018-05-17 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479002#comment-16479002
 ] 

Uwe L. Korn commented on ARROW-2599:


For {{pip install pyarrow}} to work when it builds from source, you need to have 
{{arrow-cpp}} installed on your system. {{pyarrow}} wheels only contain the 
Python sources.
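
The build log quoted in this issue shows CMake asking pkg-config for a module 
named 'arrow' and not finding it. As a rough pre-flight check before attempting 
a source build, the same question can be asked from Python. This is only a 
sketch: the helper name is made up for illustration, it assumes pkg-config is 
the lookup that matters on your system, and the real build may consult other 
locations as well.

```python
import shutil
import subprocess


def arrow_cpp_visible_to_pkg_config(module: str = "arrow") -> bool:
    """Return True if pkg-config can locate the given module.

    Hypothetical helper for illustration; it is not part of pyarrow.
    """
    pkg_config = shutil.which("pkg-config")
    if pkg_config is None:
        # Without pkg-config on PATH we cannot ask; report "not found".
        return False
    # `pkg-config --exists NAME` exits with status 0 only if NAME is known.
    result = subprocess.run([pkg_config, "--exists", module])
    return result.returncode == 0


if __name__ == "__main__":
    if arrow_cpp_visible_to_pkg_config():
        print("pkg-config finds 'arrow'")
    else:
        print("pkg-config does not find 'arrow'; install Arrow C++ first")
```

A False result here corresponds to the "No package 'arrow' found" line in the 
log; it does not by itself prove that the build would fail.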

> [Python] pip install on ARM fails
> -
>
> Key: ARROW-2599
> URL: https://issues.apache.org/jira/browse/ARROW-2599
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Arch ARM Linux
> pip 10.0.1
> Python 3.6.5
>Reporter: Dominykas Mostauskis
>Priority: Minor
>
> Trying to install pyarrow with pip on ARM fails with `{{Could not find the 
> Arrow library. Looked for headers in , and for libs in}}`:
>  
> {{$ pip install pyarrow --no-build-isolation --user}}
> {{[omitted]}}
> {{Thread model: posix}}
> {{ gcc version 8.1.0 (GCC)}}
> {{INFOCompiler id: GNU}}
> {{ Selected compiler gcc 8.1.0}}
> {{ -- Performing Test CXX_SUPPORTS_SSE3}}
> {{ -- Performing Test CXX_SUPPORTS_SSE3 - Failed}}
> {{ -- Performing Test CXX_SUPPORTS_ALTIVEC}}
> {{ -- Performing Test CXX_SUPPORTS_ALTIVEC - Failed}}
> {{ Configured for DEBUG build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})}}
> {{ -- Build Type: DEBUG}}
> {{ -- Build output directory: /tmp/pip-install-auk894mc/pyarrow/build/temp.linux-armv7l-3.6/debug/}}
> {{ -- Found PythonInterp: /usr/bin/python (found version "3.6.5")}}
> {{ -- Searching for Python libs in /usr/lib;/usr/lib/python3.6/config-3.6m-arm-linux-gnueabihf}}
> {{ -- Looking for python3.6m}}
> {{ -- Found Python lib /usr/lib/libpython3.6m.so}}
> {{ -- Found PythonLibs: /usr/lib/libpython3.6m.so}}
> {{ -- Found NumPy: version "1.14.3" /home/domas/.local/lib/python3.6/site-packages/numpy/core/include}}
> {{ -- Searching for Python libs in /usr/lib;/usr/lib/python3.6/config-3.6m-arm-linux-gnueabihf}}
> {{ -- Looking for python3.6m}}
> {{ -- Found Python lib /usr/lib/libpython3.6m.so}}
> {{ -- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2")}}
> {{ -- Checking for module 'arrow'}}
> {{ -- No package 'arrow' found}}
> {{ CMake Error at cmake_modules/FindArrow.cmake:130 (message):}}
> {{ Could not find the Arrow library. Looked for headers in , and for libs in}}
> {{ Call Stack (most recent call first):}}
> {{ CMakeLists.txt:197 (find_package)}}





[jira] [Updated] (ARROW-2599) [Python] pip install on ARM fails

2018-05-17 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2599:
---
Priority: Minor  (was: Blocker)

> [Python] pip install on ARM fails
> -
>
> Key: ARROW-2599
> URL: https://issues.apache.org/jira/browse/ARROW-2599
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Arch ARM Linux
> pip 10.0.1
> Python 3.6.5
>Reporter: Dominykas Mostauskis
>Priority: Minor
>
> Trying to install pyarrow with pip on ARM fails with `{{Could not find the 
> Arrow library. Looked for headers in , and for libs in}}`:
>  
> {{$ pip install pyarrow --no-build-isolation --user}}
> {{[omitted]}}
> {{Thread model: posix}}
> {{ gcc version 8.1.0 (GCC)}}
> {{INFOCompiler id: GNU}}
> {{ Selected compiler gcc 8.1.0}}
> {{ -- Performing Test CXX_SUPPORTS_SSE3}}
> {{ -- Performing Test CXX_SUPPORTS_SSE3 - Failed}}
> {{ -- Performing Test CXX_SUPPORTS_ALTIVEC}}
> {{ -- Performing Test CXX_SUPPORTS_ALTIVEC - Failed}}
> {{ Configured for DEBUG build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})}}
> {{ -- Build Type: DEBUG}}
> {{ -- Build output directory: /tmp/pip-install-auk894mc/pyarrow/build/temp.linux-armv7l-3.6/debug/}}
> {{ -- Found PythonInterp: /usr/bin/python (found version "3.6.5")}}
> {{ -- Searching for Python libs in /usr/lib;/usr/lib/python3.6/config-3.6m-arm-linux-gnueabihf}}
> {{ -- Looking for python3.6m}}
> {{ -- Found Python lib /usr/lib/libpython3.6m.so}}
> {{ -- Found PythonLibs: /usr/lib/libpython3.6m.so}}
> {{ -- Found NumPy: version "1.14.3" /home/domas/.local/lib/python3.6/site-packages/numpy/core/include}}
> {{ -- Searching for Python libs in /usr/lib;/usr/lib/python3.6/config-3.6m-arm-linux-gnueabihf}}
> {{ -- Looking for python3.6m}}
> {{ -- Found Python lib /usr/lib/libpython3.6m.so}}
> {{ -- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2")}}
> {{ -- Checking for module 'arrow'}}
> {{ -- No package 'arrow' found}}
> {{ CMake Error at cmake_modules/FindArrow.cmake:130 (message):}}
> {{ Could not find the Arrow library. Looked for headers in , and for libs in}}
> {{ Call Stack (most recent call first):}}
> {{ CMakeLists.txt:197 (find_package)}}





[jira] [Created] (ARROW-2601) [Python] MemoryPool bytes_allocated causes seg

2018-05-17 Thread Alex Hagerman (JIRA)
Alex Hagerman created ARROW-2601:


 Summary: [Python] MemoryPool bytes_allocated causes seg
 Key: ARROW-2601
 URL: https://issues.apache.org/jira/browse/ARROW-2601
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Alex Hagerman
 Fix For: 0.10.0


Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 18:21:58) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.


>>> import pyarrow as pa

>>> mp = pa.MemoryPool()
>>> arr = pa.array([1,2,3], memory_pool=mp)
>>> mp.bytes_allocated()

Segmentation fault (core dumped)

I'll dig into this further, but should bytes_allocated be returning anything 
when called like this? Or should it raise NotImplemented?





[jira] [Created] (ARROW-2600) [Python] Add additional LocalFileSystem filesystem methods

2018-05-17 Thread Alex Hagerman (JIRA)
Alex Hagerman created ARROW-2600:


 Summary: [Python] Add additional LocalFileSystem filesystem methods
 Key: ARROW-2600
 URL: https://issues.apache.org/jira/browse/ARROW-2600
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Alex Hagerman
Assignee: Alex Hagerman
 Fix For: 0.10.0


Related to https://issues.apache.org/jira/browse/ARROW-1319 I noticed the 
methods Martin listed are also not part of the LocalFileSystem class.





[jira] [Assigned] (ARROW-1644) [Python] Read and write nested Parquet data with a mix of struct and list nesting levels

2018-05-17 Thread Joshua Storck (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua Storck reassigned ARROW-1644:


Assignee: Joshua Storck

> [Python] Read and write nested Parquet data with a mix of struct and list 
> nesting levels
> 
>
> Key: ARROW-1644
> URL: https://issues.apache.org/jira/browse/ARROW-1644
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: DB Tsai
>Assignee: Joshua Storck
>Priority: Major
> Fix For: 0.10.0
>
>
> We have many nested parquet files generated from Apache Spark for ranking 
> problems, and we would like to load them in python for other programs to 
> consume. 
> The schema looks like 
> {code:java}
> root
>  |-- profile_id: long (nullable = true)
>  |-- country_iso_code: string (nullable = true)
>  |-- items: array (nullable = false)
>  ||-- element: struct (containsNull = false)
>  |||-- show_title_id: integer (nullable = true)
>  |||-- duration: double (nullable = true)
> {code}
> And when I tried to load it with a nightly build of pyarrow from Oct 4, 2017, I got 
> the following error.
> {code:python}
> Python 3.6.2 |Anaconda, Inc.| (default, Sep 30 2017, 18:42:57) 
> [GCC 7.2.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import numpy as np
> >>> import pandas as pd
> >>> import pyarrow as pa
> >>> import pyarrow.parquet as pq
> >>> table2 = pq.read_table('part-0')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/dbt/miniconda3/lib/python3.6/site-packages/pyarrow/parquet.py", 
> line 823, in read_table
> use_pandas_metadata=use_pandas_metadata)
>   File "/home/dbt/miniconda3/lib/python3.6/site-packages/pyarrow/parquet.py", 
> line 119, in read
> nthreads=nthreads)
>   File "_parquet.pyx", line 466, in pyarrow._parquet.ParquetReader.read_all
>   File "error.pxi", line 85, in pyarrow.lib.check_status
> pyarrow.lib.ArrowNotImplementedError: lists with structs are not supported.
> {code}
> I somehow get the impression that after 
> https://issues.apache.org/jira/browse/PARQUET-911 is merged, we should be 
> able to load the nested parquet in pyarrow. 
> Any insight about this? 
> Thanks.





[jira] [Assigned] (ARROW-2585) Add Decimal128::FromBigEndian

2018-05-17 Thread Joshua Storck (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua Storck reassigned ARROW-2585:


Assignee: Joshua Storck

> Add Decimal128::FromBigEndian
> -
>
> Key: ARROW-2585
> URL: https://issues.apache.org/jira/browse/ARROW-2585
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Joshua Storck
>Assignee: Joshua Storck
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This code is being moved from 
> https://github.com/apache/parquet-cpp/blob/8046481235e558344c3aa059c83ee86b9f67/src/parquet/arrow/reader.cc#L1049
>  for use in this PR: https://github.com/apache/parquet-cpp/pull/462
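
As a rough illustration of what a conversion from big-endian bytes involves, 
here is a small Python sketch. It is not the C++ code being moved. The function 
name is hypothetical, and the sketch assumes the input is a two's-complement 
big-endian integer of at most 16 bytes that should be split into a signed high 
64-bit word and an unsigned low 64-bit word.

```python
def decimal128_words_from_big_endian(data: bytes):
    """Split big-endian two's-complement bytes into (high, low) 64-bit words.

    Illustrative helper only; the name and return shape are assumptions.
    """
    if not 1 <= len(data) <= 16:
        raise ValueError("expected between 1 and 16 bytes")
    # Interpret the bytes as a signed integer; short inputs are sign-extended
    # implicitly because Python integers have arbitrary precision.
    value = int.from_bytes(data, byteorder="big", signed=True)
    low = value & 0xFFFFFFFFFFFFFFFF   # unsigned low 64 bits
    high = value >> 64                 # arithmetic shift keeps the sign
    return high, low


if __name__ == "__main__":
    print(decimal128_words_from_big_endian(b"\x01"))
    print(decimal128_words_from_big_endian(b"\xff"))
```

A single byte 0x01 yields a high word of 0 and a low word of 1, while a single 
byte 0xff is read as -1 and so yields a high word of -1 with all low bits set.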





[jira] [Assigned] (ARROW-1599) [Python] Unable to read Parquet files with list inside struct

2018-05-17 Thread Joshua Storck (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua Storck reassigned ARROW-1599:


Assignee: Joshua Storck

> [Python] Unable to read Parquet files with list inside struct
> -
>
> Key: ARROW-1599
> URL: https://issues.apache.org/jira/browse/ARROW-1599
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.0
> Environment: Ubuntu
>Reporter: Jovann Kung
>Assignee: Joshua Storck
>Priority: Major
> Fix For: 0.10.0
>
>
> Is PyArrow currently unable to read in Parquet files with a vector as a 
> column? For example, the schema of such a file is below:
> {{
> mbc: FLOAT
> deltae: FLOAT
> labels: FLOAT
> features.type: INT32 INT_8
> features.size: INT32
> features.indices.list.element: INT32
> features.values.list.element: DOUBLE}}
> Using either pq.read_table() or pq.ParquetDataset('/path/to/parquet').read() 
> yields the following error: ArrowNotImplementedError: Currently only nesting 
> with Lists is supported.
> From the error I assume that this may be implemented in future releases?





[jira] [Assigned] (ARROW-2586) Make child builders of ListBuilder and StructBuilder shared_ptr's

2018-05-17 Thread Joshua Storck (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua Storck reassigned ARROW-2586:


Assignee: Joshua Storck

> Make child builders of ListBuilder and StructBuilder shared_ptr's
> -
>
> Key: ARROW-2586
> URL: https://issues.apache.org/jira/browse/ARROW-2586
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Joshua Storck
>Assignee: Joshua Storck
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is needed for changes in this PR that make it possible to deserialize 
> arbitrary nested structures in parquet (ARROW-1644): 
> https://github.com/apache/parquet-cpp/pull/462 





[jira] [Updated] (ARROW-2599) [Python] pip install on ARM fails

2018-05-17 Thread Dominykas Mostauskis (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominykas Mostauskis updated ARROW-2599:

Summary: [Python] pip install on ARM fails  (was: pip install on ARM fails)

> [Python] pip install on ARM fails
> -
>
> Key: ARROW-2599
> URL: https://issues.apache.org/jira/browse/ARROW-2599
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Arch ARM Linux
> pip 10.0.1
> Python 3.6.5
>Reporter: Dominykas Mostauskis
>Priority: Blocker
>
> Trying to install pyarrow with pip on ARM fails with `{{Could not find the 
> Arrow library. Looked for headers in , and for libs in}}`:
>  
> {{$ pip install pyarrow --no-build-isolation --user}}
> {{[omitted]}}
> {{Thread model: posix}}
> {{ gcc version 8.1.0 (GCC)}}
> {{INFOCompiler id: GNU}}
> {{ Selected compiler gcc 8.1.0}}
> {{ -- Performing Test CXX_SUPPORTS_SSE3}}
> {{ -- Performing Test CXX_SUPPORTS_SSE3 - Failed}}
> {{ -- Performing Test CXX_SUPPORTS_ALTIVEC}}
> {{ -- Performing Test CXX_SUPPORTS_ALTIVEC - Failed}}
> {{ Configured for DEBUG build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})}}
> {{ -- Build Type: DEBUG}}
> {{ -- Build output directory: /tmp/pip-install-auk894mc/pyarrow/build/temp.linux-armv7l-3.6/debug/}}
> {{ -- Found PythonInterp: /usr/bin/python (found version "3.6.5")}}
> {{ -- Searching for Python libs in /usr/lib;/usr/lib/python3.6/config-3.6m-arm-linux-gnueabihf}}
> {{ -- Looking for python3.6m}}
> {{ -- Found Python lib /usr/lib/libpython3.6m.so}}
> {{ -- Found PythonLibs: /usr/lib/libpython3.6m.so}}
> {{ -- Found NumPy: version "1.14.3" /home/domas/.local/lib/python3.6/site-packages/numpy/core/include}}
> {{ -- Searching for Python libs in /usr/lib;/usr/lib/python3.6/config-3.6m-arm-linux-gnueabihf}}
> {{ -- Looking for python3.6m}}
> {{ -- Found Python lib /usr/lib/libpython3.6m.so}}
> {{ -- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2")}}
> {{ -- Checking for module 'arrow'}}
> {{ -- No package 'arrow' found}}
> {{ CMake Error at cmake_modules/FindArrow.cmake:130 (message):}}
> {{ Could not find the Arrow library. Looked for headers in , and for libs in}}
> {{ Call Stack (most recent call first):}}
> {{ CMakeLists.txt:197 (find_package)}}





[jira] [Created] (ARROW-2599) pip install on ARM fails

2018-05-17 Thread Dominykas Mostauskis (JIRA)
Dominykas Mostauskis created ARROW-2599:
---

 Summary: pip install on ARM fails
 Key: ARROW-2599
 URL: https://issues.apache.org/jira/browse/ARROW-2599
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
 Environment: Arch ARM Linux
pip 10.0.1
Python 3.6.5
Reporter: Dominykas Mostauskis


Trying to install pyarrow with pip on ARM fails with `{{Could not find the 
Arrow library. Looked for headers in , and for libs in}}`:

 

{{$ pip install pyarrow --no-build-isolation --user}}

{{[omitted]}}

{{Thread model: posix}}
{{ gcc version 8.1.0 (GCC)}}
{{INFOCompiler id: GNU}}
{{ Selected compiler gcc 8.1.0}}
{{ -- Performing Test CXX_SUPPORTS_SSE3}}
{{ -- Performing Test CXX_SUPPORTS_SSE3 - Failed}}
{{ -- Performing Test CXX_SUPPORTS_ALTIVEC}}
{{ -- Performing Test CXX_SUPPORTS_ALTIVEC - Failed}}
{{ Configured for DEBUG build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})}}
{{ -- Build Type: DEBUG}}
{{ -- Build output directory: /tmp/pip-install-auk894mc/pyarrow/build/temp.linux-armv7l-3.6/debug/}}
{{ -- Found PythonInterp: /usr/bin/python (found version "3.6.5")}}
{{ -- Searching for Python libs in /usr/lib;/usr/lib/python3.6/config-3.6m-arm-linux-gnueabihf}}
{{ -- Looking for python3.6m}}
{{ -- Found Python lib /usr/lib/libpython3.6m.so}}
{{ -- Found PythonLibs: /usr/lib/libpython3.6m.so}}
{{ -- Found NumPy: version "1.14.3" /home/domas/.local/lib/python3.6/site-packages/numpy/core/include}}
{{ -- Searching for Python libs in /usr/lib;/usr/lib/python3.6/config-3.6m-arm-linux-gnueabihf}}
{{ -- Looking for python3.6m}}
{{ -- Found Python lib /usr/lib/libpython3.6m.so}}
{{ -- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2")}}
{{ -- Checking for module 'arrow'}}
{{ -- No package 'arrow' found}}
{{ CMake Error at cmake_modules/FindArrow.cmake:130 (message):}}
{{ Could not find the Arrow library. Looked for headers in , and for libs in}}
{{ Call Stack (most recent call first):}}
{{ CMakeLists.txt:197 (find_package)}}





[jira] [Comment Edited] (ARROW-2551) [Plasma] Improve notification logic

2018-05-17 Thread Ovidiu Marcu (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478859#comment-16478859
 ] 

Ovidiu Marcu edited comment on ARROW-2551 at 5/17/18 10:34 AM:
---

Thanks for #2028; it fixed a bug where, after the client crashed, a second run 
would crash the plasma store (could the issue type be changed from improvement 
to bug?).

I am using plasma notifications and I observe that, although I receive all 
objects, they do not arrive in order of their creation.


was (Author: ovidiumarcu):
Thanks for #2028, it fixed a bug when the client crashed the second run will 
crash the plasma store (could change issue type from improvement to bug?).

 

I am using plasma notifications and I have some issues:

1) I do not get objects in order of their creation and

2) it seems some objects are somehow lost.

> [Plasma] Improve notification logic
> ---
>
> Key: ARROW-2551
> URL: https://issues.apache.org/jira/browse/ARROW-2551
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There are a few issues in current Plasma notification code:
>  * When a client subscribes to Plasma, the store pushes notifications about 
> existing objects to ALL subscribers, while it should only push to the new 
> subscriber.
>  * And in the above scenario, it should only push "sealed" objects to the new 
> subscriber, while currently it pushes all objects regardless of the state.
>  * When a client disconnects, it will no longer be able to receive 
> notifications, thus the NotificationQueue for the client should be removed 
> from global map.
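
The three points above can be sketched with a small toy model in Python. This 
is not Plasma code: the class and method names are invented for illustration, 
and the model assumes that sealing an object notifies every current subscriber. 
It only shows the intended behaviour, namely that a new subscriber alone 
receives the backlog, that the backlog contains sealed objects only, and that a 
disconnecting client's queue is dropped from the global map.

```python
from collections import deque


class ToyNotificationStore:
    """Toy model of the desired notification behaviour (hypothetical names)."""

    def __init__(self):
        self.objects = {}  # object_id -> True once sealed
        self.queues = {}   # client_id -> deque of pending object ids

    def create(self, object_id):
        # A freshly created object is not yet sealed, so nobody is notified.
        self.objects[object_id] = False

    def seal(self, object_id):
        # Assumption: sealing notifies every currently subscribed client.
        self.objects[object_id] = True
        for queue in self.queues.values():
            queue.append(object_id)

    def subscribe(self, client_id):
        # Push the backlog to the NEW subscriber only, and only sealed objects.
        backlog = (oid for oid, sealed in self.objects.items() if sealed)
        self.queues[client_id] = deque(backlog)

    def disconnect(self, client_id):
        # A disconnected client can no longer receive anything: drop its queue.
        self.queues.pop(client_id, None)


if __name__ == "__main__":
    store = ToyNotificationStore()
    store.create("a")
    store.seal("a")
    store.create("b")      # created but not sealed
    store.subscribe("c1")  # c1 sees only "a"
    store.subscribe("c2")  # c2 sees only "a"; c1 gets no duplicate
    print(list(store.queues["c1"]), list(store.queues["c2"]))
    store.disconnect("c1")
    print(sorted(store.queues))
```

In this model a second subscription leaves existing queues untouched, which is 
the difference from the behaviour the first bullet complains about.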





[jira] [Commented] (ARROW-2551) [Plasma] Improve notification logic

2018-05-17 Thread Ovidiu Marcu (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478859#comment-16478859
 ] 

Ovidiu Marcu commented on ARROW-2551:
-

Thanks for #2028, it fixed a bug when the client crashed the run will crash the 
plasma store (could change issue type from improvement to bug?).

 

I am using plasma notifications and I have some issues:

1) I do not get objects in order of their creation and

2) it seems some objects are somehow lost.

> [Plasma] Improve notification logic
> ---
>
> Key: ARROW-2551
> URL: https://issues.apache.org/jira/browse/ARROW-2551
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There are a few issues in current Plasma notification code:
>  * When a client subscribes to Plasma, the store pushes notifications about 
> existing objects to ALL subscribers, while it should only push to the new 
> subscriber.
>  * And in the above scenario, it should only push "sealed" objects to the new 
> subscriber, while currently it pushes all objects regardless of the state.
>  * When a client disconnects, it will no longer be able to receive 
> notifications, thus the NotificationQueue for the client should be removed 
> from global map.





[jira] [Comment Edited] (ARROW-2551) [Plasma] Improve notification logic

2018-05-17 Thread Ovidiu Marcu (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478859#comment-16478859
 ] 

Ovidiu Marcu edited comment on ARROW-2551 at 5/17/18 10:24 AM:
---

Thanks for #2028, it fixed a bug when the client crashed the second run will 
crash the plasma store (could change issue type from improvement to bug?).

 

I am using plasma notifications and I have some issues:

1) I do not get objects in order of their creation and

2) it seems some objects are somehow lost.


was (Author: ovidiumarcu):
Thanks for #2028, it fixed a bug when the client crashed the run will crash the 
plasma store (could change issue type from improvement to bug?).

 

I am using plasma notifications and I have some issues:

1) I do not get objects in order of their creation and

2) it seems some objects are somehow lost.

> [Plasma] Improve notification logic
> ---
>
> Key: ARROW-2551
> URL: https://issues.apache.org/jira/browse/ARROW-2551
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There are a few issues in current Plasma notification code:
>  * When a client subscribes to Plasma, the store pushes notifications about 
> existing objects to ALL subscribers, while it should only push to the new 
> subscriber.
>  * And in the above scenario, it should only push "sealed" objects to the new 
> subscriber, while currently it pushes all objects regardless of the state.
>  * When a client disconnects, it will no longer be able to receive 
> notifications, thus the NotificationQueue for the client should be removed 
> from global map.





[jira] [Updated] (ARROW-2591) [Python] Segmentationfault issue in pq.write_table

2018-05-17 Thread jacques (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jacques updated ARROW-2591:
---
Description: 
The context is the following: I am currently dealing with sparse column 
serialization in Parquet. In some cases many rows are empty, and I can also have 
columns containing only empty lists.
However, I get a segmentation fault when I try to write to Parquet those 
columns filled only with empty lists.

Here is a simple code snippet that reproduces the segmentation fault I had:


{noformat}

In [1]: import pyarrow as pa

In [2]: import pyarrow.parquet as pq

In [3]: pa_ar = pa.array([[],[]],pa.list_(pa.int32()))

In [4]: table = pa.Table.from_arrays([pa_ar],["test"])

In [5]: pq.write_table(
   ...: table=table,
   ...: where="test.parquet",
   ...: compression="snappy",
   ...: flavor="spark"
   ...: )
Segmentation fault

{noformat}

May I have it fixed?


Best

Jacques

  was:
Context Is the following: I am currently dealing with sparse column 
serialization in parquet. In some cases, many of this 
When trying this simple code snippet I got a segmentation fault


{noformat}

In [1]: import pyarrow as pa

In [2]: import pyarrow.parquet as pq

In [3]: pa_ar = pa.array([[],[]],pa.list_(pa.int32()))

In [4]: table = pa.Table.from_arrays([pa_ar],["test"])

In [5]: pq.write_table(
   ...: table=table,
   ...: where="test.parquet",
   ...: compression="snappy",
   ...: flavor="spark"
   ...: )
Segmentation fault

{noformat}

May I have it fixed.

 

Best

Jacques


> [Python] Segmentationfault issue in pq.write_table
> --
>
> Key: ARROW-2591
> URL: https://issues.apache.org/jira/browse/ARROW-2591
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0, 0.9.0
>Reporter: jacques
>Priority: Major
>
> The context is the following: I am currently dealing with sparse column 
> serialization in Parquet. In some cases many rows are empty, and I can also have 
> columns containing only empty lists.
> However, I get a segmentation fault when I try to write to Parquet those 
> columns filled only with empty lists.
> Here is a simple code snippet that reproduces the segmentation fault I had:
> {noformat}
> In [1]: import pyarrow as pa
> In [2]: import pyarrow.parquet as pq
> In [3]: pa_ar = pa.array([[],[]],pa.list_(pa.int32()))
> In [4]: table = pa.Table.from_arrays([pa_ar],["test"])
> In [5]: pq.write_table(
>    ...: table=table,
>    ...: where="test.parquet",
>    ...: compression="snappy",
>    ...: flavor="spark"
>    ...: )
> Segmentation fault
> {noformat}
> May I have it fixed?
> Best
> Jacques





[jira] [Updated] (ARROW-2591) [Python] Segmentationfault issue in pq.write_table

2018-05-17 Thread jacques (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jacques updated ARROW-2591:
---
Description: 
Context Is the following: I am currently dealing with sparse column 
serialization in parquet. In some cases, many of this 
When trying this simple code snippet I got a segmentation fault


{noformat}

In [1]: import pyarrow as pa

In [2]: import pyarrow.parquet as pq

In [3]: pa_ar = pa.array([[],[]],pa.list_(pa.int32()))

In [4]: table = pa.Table.from_arrays([pa_ar],["test"])

In [5]: pq.write_table(
   ...: table=table,
   ...: where="test.parquet",
   ...: compression="snappy",
   ...: flavor="spark"
   ...: )
Segmentation fault

{noformat}

May I have it fixed.

 

Best

Jacques

  was:
When trying this simple code snippet I got a segmentation fault

 

{noformat}

In [1]: import pyarrow as pa

In [2]: import pyarrow.parquet as pq

In [3]: pa_ar = pa.array([[],[]],pa.list_(pa.int32()))

In [4]: table = pa.Table.from_arrays([pa_ar],["test"])

In [5]: pq.write_table(
   ...: table=table,
   ...: where="test.parquet",
   ...: compression="snappy",
   ...: flavor="spark"
   ...: )
Segmentation fault

{noformat}

May I have it fixed.

 

Best

Jacques


> [Python] Segmentationfault issue in pq.write_table
> --
>
> Key: ARROW-2591
> URL: https://issues.apache.org/jira/browse/ARROW-2591
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0, 0.9.0
>Reporter: jacques
>Priority: Major
>
> Context Is the following: I am currently dealing with sparse column 
> serialization in parquet. In some cases, many of this 
> When trying this simple code snippet I got a segmentation fault
> {noformat}
> In [1]: import pyarrow as pa
> In [2]: import pyarrow.parquet as pq
> In [3]: pa_ar = pa.array([[],[]],pa.list_(pa.int32()))
> In [4]: table = pa.Table.from_arrays([pa_ar],["test"])
> In [5]: pq.write_table(
>    ...: table=table,
>    ...: where="test.parquet",
>    ...: compression="snappy",
>    ...: flavor="spark"
>    ...: )
> Segmentation fault
> {noformat}
> May I have it fixed.
>  
> Best
> Jacques





[jira] [Updated] (ARROW-2597) [Plasma] remove UniqueIDHasher

2018-05-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2597:
--
Labels: pull-request-available  (was: )

> [Plasma] remove UniqueIDHasher
> --
>
> Key: ARROW-2597
> URL: https://issues.apache.org/jira/browse/ARROW-2597
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Priority: Minor
>  Labels: pull-request-available
>






[jira] [Updated] (ARROW-2598) [Python] table.to_pandas segfault

2018-05-17 Thread jacques (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jacques updated ARROW-2598:
---
Description: 
Here is a small snippet which produces a segfault:
{noformat}
In [1]: import pyarrow as pa

In [2]: import pyarrow.parquet as pq

In [3]: pa_ar = pa.array([[], []])

In [4]: pq.write_table(
   ...: table=pa.Table.from_arrays([pa_ar],["test"]),
   ...: where="test5.parquet",
   ...: compression="snappy",
   ...: flavor="spark"
   ...: )

In [5]: pq.read_table("test5.parquet")
Out[5]: 
pyarrow.Table
test: list
  child 0, item: null

In [6]: pq.read_table("test5.parquet").to_pydict()
Out[6]: OrderedDict([(u'test', [None, None])])

In [7]: pq.read_table("test5.parquet").to_pandas()
Segmentation fault

{noformat}

I thank you in advance for having this fixed.

Best, 

Jacques

  was:
Here is a small snippet which produce a segfault:

{noformat}

In [1]: import pyarrow as pa

In [2]: import pyarrow.parquet as pq

In [3]: pa_ar = pa.array([[], []])

In [4]: pq.write_table(
   ...: table=pa.Table.from_arrays([pa_ar],["test"]),
   ...: where="test5.parquet",
   ...: compression="snappy",
   ...: flavor="spark"
   ...: )

In [5]: pq.read_table("test5.parquet")
Out[5]: 
pyarrow.Table
test: list
  child 0, item: null

In [6]: pq.read_table("test5.parquet").to_pydict()
Out[6]: OrderedDict([(u'test', [None, None])])

In [7]: pq.read_table("test5.parquet").to_pandas()
Segmentation fault

 

{noformat}


> [Python]  table.to_pandas segfault
> --
>
> Key: ARROW-2598
> URL: https://issues.apache.org/jira/browse/ARROW-2598
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: jacques
>Priority: Major
>
> Here is a small snippet which produces a segfault:
> {noformat}
> In [1]: import pyarrow as pa
> In [2]: import pyarrow.parquet as pq
> In [3]: pa_ar = pa.array([[], []])
> In [4]: pq.write_table(
>    ...: table=pa.Table.from_arrays([pa_ar],["test"]),
>    ...: where="test5.parquet",
>    ...: compression="snappy",
>    ...: flavor="spark"
>    ...: )
> In [5]: pq.read_table("test5.parquet")
> Out[5]: 
> pyarrow.Table
> test: list
>   child 0, item: null
> In [6]: pq.read_table("test5.parquet").to_pydict()
> Out[6]: OrderedDict([(u'test', [None, None])])
> In [7]: pq.read_table("test5.parquet").to_pandas()
> Segmentation fault
> {noformat}
> I thank you in advance for having this fixed.
> Best, 
> Jacques





[jira] [Created] (ARROW-2597) [Plasma] remove UniqueIDHasher

2018-05-17 Thread Zhijun Fu (JIRA)
Zhijun Fu created ARROW-2597:


 Summary: [Plasma] remove UniqueIDHasher
 Key: ARROW-2597
 URL: https://issues.apache.org/jira/browse/ARROW-2597
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Plasma (C++)
Reporter: Zhijun Fu








[jira] [Created] (ARROW-2596) [GLib] Use the default value of GTK-Doc

2018-05-17 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-2596:
---

 Summary: [GLib] Use the default value of GTK-Doc
 Key: ARROW-2596
 URL: https://issues.apache.org/jira/browse/ARROW-2596
 Project: Apache Arrow
  Issue Type: Improvement
  Components: GLib
Affects Versions: 0.9.0
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 0.10.0








[jira] [Updated] (ARROW-530) C++/Python: Provide subpools for better memory allocation tracking

2018-05-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-530:
-
Labels: beginner newbie pull-request-available  (was: beginner newbie)

> C++/Python: Provide subpools for better memory allocation tracking
> --
>
> Key: ARROW-530
> URL: https://issues.apache.org/jira/browse/ARROW-530
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: beginner, newbie, pull-request-available
>
> Currently we can only track the number of bytes allocated by the main memory 
> pool or the alternative jemalloc implementation. To better understand certain 
> situations, we should provide a MemoryPool proxy implementation that tracks 
> only the memory allocated through its own direct calls but 
> delegates the actual allocation to an underlying pool.
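
The proxy idea described above can be sketched roughly as follows. This is a minimal pure-Python illustration, not the Arrow MemoryPool API: BasePool, ProxyPool, allocate, free and bytes_allocated are hypothetical names chosen for the example, and a bytearray stands in for a real allocation.

```python
class BasePool:
    """Hypothetical stand-in for the underlying allocator (not the Arrow API)."""

    def __init__(self):
        self.bytes_allocated = 0

    def allocate(self, size):
        # Hand out a buffer and record its size in the pool-wide counter.
        self.bytes_allocated += size
        return bytearray(size)

    def free(self, buf):
        self.bytes_allocated -= len(buf)


class ProxyPool:
    """Counts only allocations made through this proxy; the parent does the work."""

    def __init__(self, parent):
        self._parent = parent
        self.bytes_allocated = 0

    def allocate(self, size):
        buf = self._parent.allocate(size)  # delegate the actual allocation
        self.bytes_allocated += size       # track only our own calls
        return buf

    def free(self, buf):
        self._parent.free(buf)
        self.bytes_allocated -= len(buf)


base = BasePool()
reader_pool = ProxyPool(base)
writer_pool = ProxyPool(base)

a = reader_pool.allocate(64)
b = writer_pool.allocate(256)
c = base.allocate(16)  # direct call on the parent, invisible to both proxies
```

Each proxy reports only what was requested through it, while the parent still sees the total, which is the per-component accounting the issue asks for.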





[jira] [Updated] (ARROW-2593) [Python] TypeError: data type "mixed-integer" not understood

2018-05-17 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2593:
---
Fix Version/s: 0.10.0

> [Python] TypeError: data type "mixed-integer" not understood
> 
>
> Key: ARROW-2593
> URL: https://issues.apache.org/jira/browse/ARROW-2593
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Dima Ryazanov
>Priority: Major
> Fix For: 0.10.0
>
>
> Pyarrow 0.9 raises an exception when converting some tables to pandas 
> dataframes. Earlier versions work fine. Repro steps:
> {{In [1]: import pandas as pd}}
> {{In [2]: import pyarrow as pa}}
> {{In [3]: df = pd.DataFrame(\{'foo': [], 123: []})}}
> {{In [4]: table = pa.Table.from_pandas(df)}}
> {{In [5]: table.to_pandas()}}
> {{---}}
> {{KeyError  Traceback (most recent call 
> last)}}
> {{~/envs/cli3/lib/python3.6/site-packages/pyarrow/pandas_compat.py in 
> _pandas_type_to_numpy_type(pandas_type)}}
> {{    666 try:}}
> {{--> 667 return _pandas_logical_type_map[pandas_type]}}
> {{    668 except KeyError:}}
> {{KeyError: 'mixed-integer'}}
> (I ended up with a dataframe with mixed string/integer columns by using 
> pd.read_excel(..., skiprows=[0]) - which skipped the header, and treated the 
> first line of data as column names.)
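
One possible workaround on the caller's side, sketched under assumptions: since the failure is reported for column labels that mix strings and integers, converting every label to a string before calling pa.Table.from_pandas might avoid the 'mixed-integer' case. stringify_labels is a hypothetical helper, not part of pandas or pyarrow, and whether this actually avoids the error on 0.9 has not been verified here.

```python
def stringify_labels(labels):
    """Return the labels as strings, refusing to silently merge duplicates."""
    converted = [str(label) for label in labels]
    if len(set(converted)) != len(converted):
        # e.g. the labels '1' and 1 would both become '1'
        raise ValueError("labels collide after conversion: %r" % (converted,))
    return converted


labels = ['foo', 123]  # mixed string/integer labels, as in the report
clean = stringify_labels(labels)

# With pandas this would be applied before the conversion, for example:
#   df.columns = stringify_labels(df.columns)
#   table = pa.Table.from_pandas(df)
```

The duplicate check is a design choice: renaming columns silently could make two distinct columns indistinguishable, so the sketch fails loudly instead.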





[jira] [Commented] (ARROW-2592) [Python] AssertionError in to_pandas()

2018-05-17 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478609#comment-16478609
 ] 

Uwe L. Korn commented on ARROW-2592:


Do you still know which version the file was written with? We had a small range 
of commits between 0.7 and 0.8 that produced files that were later rejected by 
0.8, but those commits were never part of a release.

> [Python] AssertionError in to_pandas()
> --
>
> Key: ARROW-2592
> URL: https://issues.apache.org/jira/browse/ARROW-2592
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Dima Ryazanov
>Priority: Major
>
> Pyarrow 0.8 and 0.9 raise an AssertionError for one of the datasets I have 
> (created using an older version of pyarrow). Repro steps:
> {{In [1]: from pyarrow.parquet import ParquetDataset}}
> {{In [2]: d = ParquetDataset(['bug.parq'])}}
> {{In [3]: t = d.read()}}
> {{In [4]: t.to_pandas()}}
> {{---}}
> {{AssertionError    Traceback (most recent call 
> last)}}
> {{ in ()}}
> {{> 1 t.to_pandas()}}
> {{table.pxi in pyarrow.lib.Table.to_pandas()}}
> {{~/envs/cli3/lib/python3.6/site-packages/pyarrow/pandas_compat.py in 
> table_to_blockmanager(options, table, memory_pool, nthreads, categories)}}
> {{    529 # There must be the same number of field names and physical 
> names}}
> {{    530 # (fields in the arrow Table)}}
> {{--> 531 assert len(logical_index_names) == len(index_columns_set)}}
> {{    532 }}
> {{    533 # It can never be the case in a released version of pyarrow 
> that}}
> {{AssertionError: }}
>  
> Here's the file: [https://www.dropbox.com/s/oja3khjsc5tycfh/bug.parq]
> (I was not able to attach it here due to a "missing token", whatever that 
> means.)


