[jira] [Commented] (ARROW-2434) [Rust] Add windows support

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431703#comment-16431703
 ] 

ASF GitHub Bot commented on ARROW-2434:
---

andygrove commented on issue #1873: ARROW-2434: [Rust] Add windows support
URL: https://github.com/apache/arrow/pull/1873#issuecomment-379973257
 
 
   Hi @paddyhoran I tried to assign to you in JIRA but couldn't find your 
username on there. I think you need to create yourself a JIRA account first and 
then you should be able to self-assign.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Rust] Add windows support
> --
>
> Key: ARROW-2434
> URL: https://issues.apache.org/jira/browse/ARROW-2434
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently `cargo test` fails on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2434) [Rust] Add windows support

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431640#comment-16431640
 ] 

ASF GitHub Bot commented on ARROW-2434:
---

paddyhoran commented on issue #1873: ARROW-2434: [Rust] Add windows support
URL: https://github.com/apache/arrow/pull/1873#issuecomment-379958446
 
 
   ARROW-2436 will add CI for Windows.




> [Rust] Add windows support
> --
>
> Key: ARROW-2434
> URL: https://issues.apache.org/jira/browse/ARROW-2434
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently `cargo test` fails on Windows.





[jira] [Created] (ARROW-2435) [Rust] Add memory pool abstraction.

2018-04-09 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-2435:
-

 Summary: [Rust] Add memory pool abstraction.
 Key: ARROW-2435
 URL: https://issues.apache.org/jira/browse/ARROW-2435
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.9.0
Reporter: Renjie Liu


Add a memory pool abstraction, mirroring the C++ API.
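For reference, the C++ arrow::MemoryPool interface is essentially allocate/free plus byte accounting. A minimal sketch of the abstraction the issue proposes, written in Python for brevity (the names are assumptions taken from the C++ API, not the eventual Rust design):

```python
import abc


class MemoryPool(abc.ABC):
    """Sketch of a memory pool abstraction modelled on the C++
    arrow::MemoryPool interface (allocation plus byte accounting)."""

    @abc.abstractmethod
    def allocate(self, size: int) -> bytearray: ...

    @abc.abstractmethod
    def free(self, buf: bytearray) -> None: ...

    @abc.abstractmethod
    def bytes_allocated(self) -> int: ...


class TrackingPool(MemoryPool):
    """Trivial pool that only tracks how many bytes are outstanding."""

    def __init__(self):
        self._allocated = 0

    def allocate(self, size):
        self._allocated += size
        return bytearray(size)

    def free(self, buf):
        self._allocated -= len(buf)

    def bytes_allocated(self):
        return self._allocated


pool = TrackingPool()
buf = pool.allocate(64)
assert pool.bytes_allocated() == 64
pool.free(buf)
assert pool.bytes_allocated() == 0
```

The abstraction lets array builders take any pool implementation, so allocation strategy and accounting can vary without touching builder code.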





[jira] [Updated] (ARROW-2434) [Rust] Add windows support

2018-04-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2434:
--
Labels: pull-request-available  (was: )

> [Rust] Add windows support
> --
>
> Key: ARROW-2434
> URL: https://issues.apache.org/jira/browse/ARROW-2434
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently `cargo test` fails on Windows.





[jira] [Commented] (ARROW-2434) [Rust] Add windows support

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431621#comment-16431621
 ] 

ASF GitHub Bot commented on ARROW-2434:
---

paddyhoran opened a new pull request #1873: ARROW-2434: [Rust] Add windows 
support
URL: https://github.com/apache/arrow/pull/1873
 
 
   




> [Rust] Add windows support
> --
>
> Key: ARROW-2434
> URL: https://issues.apache.org/jira/browse/ARROW-2434
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently `cargo test` fails on Windows.





[jira] [Created] (ARROW-2434) [Rust] Add windows support

2018-04-09 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-2434:
--

 Summary: [Rust] Add windows support
 Key: ARROW-2434
 URL: https://issues.apache.org/jira/browse/ARROW-2434
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
 Fix For: 0.10.0


Currently `cargo test` fails on Windows.





[jira] [Commented] (ARROW-2423) [Python] PyArrow datatypes raise ValueError on equality checks against non-PyArrow objects

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431613#comment-16431613
 ] 

ASF GitHub Bot commented on ARROW-2423:
---

paddyhoran commented on issue #1871: ARROW-2423: [Rust] Add 
Builder.push_slice(&[T])
URL: https://github.com/apache/arrow/pull/1871#issuecomment-379952111
 
 
   @andygrove just noticed that the JIRA for this one is 2433, not 2423




> [Python] PyArrow datatypes raise ValueError on equality checks against 
> non-PyArrow objects
> --
>
> Key: ARROW-2423
> URL: https://issues.apache.org/jira/browse/ARROW-2423
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> PyArrow 0.9.0 (py36_1)
> Python 3.6.3
>Reporter: Dave Challis
>Priority: Minor
>  Labels: pull-request-available
>
> Checking a PyArrow datatype object for equality with non-PyArrow datatypes 
> causes a `ValueError` to be raised, rather than either returning a True/False 
> value, or returning 
> [NotImplemented|https://docs.python.org/3/library/constants.html#NotImplemented]
>  if the comparison isn't implemented.
> E.g. attempting to call:
> {code:java}
> import pyarrow
> pyarrow.int32() == 'foo'
> {code}
> results in:
> {code:java}
> Traceback (most recent call last):
>   File "types.pxi", line 1221, in pyarrow.lib.type_for_alias
> KeyError: 'foo'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "t.py", line 2, in <module>
> pyarrow.int32() == 'foo'
>   File "types.pxi", line 90, in pyarrow.lib.DataType.__richcmp__
>   File "types.pxi", line 113, in pyarrow.lib.DataType.equals
>   File "types.pxi", line 1223, in pyarrow.lib.type_for_alias
> ValueError: No type alias for foo
> {code}
> The expected outcome for the above would be for the comparison to return 
> `False`, as that's the general behaviour for comparisons between objects of 
> different types (e.g. `1 == 'foo'` or `object() == 12.4` both return `False`).
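The behaviour the report asks for is Python's standard rich-comparison protocol: `__eq__` returns `NotImplemented` for foreign operand types, and Python then falls back to its default comparison, which evaluates to `False`. A minimal pure-Python sketch (this `DataType` class is a hypothetical stand-in, not pyarrow's actual implementation):

```python
class DataType:
    """Hypothetical stand-in for pyarrow.lib.DataType, showing the
    expected behaviour: unknown operand types yield False, not an error."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, DataType):
            # Returning NotImplemented (instead of raising) lets Python
            # fall back to its default comparison, which evaluates False.
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        return hash(self.name)


int32 = DataType("int32")
assert (int32 == "foo") is False   # no ValueError raised
assert int32 == DataType("int32")
```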





[jira] [Updated] (ARROW-2426) [CI] glib build failure

2018-04-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2426:
--
Labels: pull-request-available  (was: )

> [CI] glib build failure
> ---
>
> Key: ARROW-2426
> URL: https://issues.apache.org/jira/browse/ARROW-2426
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> The glib build on Travis-CI fails:
> [https://travis-ci.org/apache/arrow/jobs/364123364#L6840]
> {code}
> ==> Installing gobject-introspection
> ==> Downloading 
> https://homebrew.bintray.com/bottles/gobject-introspection-1.56.0_1.sierra.bottle.tar.gz
> ==> Pouring gobject-introspection-1.56.0_1.sierra.bottle.tar.gz
>   /usr/local/Cellar/gobject-introspection/1.56.0_1: 173 files, 9.8MB
> Installing gobject-introspection has failed!
> {code}





[jira] [Commented] (ARROW-2433) [Rust] Add Builder.push_slice(&[T])

2018-04-09 Thread Andy Grove (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431510#comment-16431510
 ] 

Andy Grove commented on ARROW-2433:
---

PR: https://github.com/apache/arrow/pull/1871

> [Rust] Add Builder.push_slice(&[T])
> ---
>
> Key: ARROW-2433
> URL: https://issues.apache.org/jira/browse/ARROW-2433
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.10.0
>
>
> When populating a Builder with Utf8 data it is more efficient to push 
> whole strings as &[u8] rather than one byte at a time.
> The same optimization works for all other types too.
>  





[jira] [Commented] (ARROW-2423) [Python] PyArrow datatypes raise ValueError on equality checks against non-PyArrow objects

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431509#comment-16431509
 ] 

ASF GitHub Bot commented on ARROW-2423:
---

andygrove opened a new pull request #1871: ARROW-2423: [Rust] Add 
Builder.push_slice(&[T])
URL: https://github.com/apache/arrow/pull/1871
 
 
   This PR also fixes another instance of memory not being released.




> [Python] PyArrow datatypes raise ValueError on equality checks against 
> non-PyArrow objects
> --
>
> Key: ARROW-2423
> URL: https://issues.apache.org/jira/browse/ARROW-2423
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> PyArrow 0.9.0 (py36_1)
> Python 3.6.3
>Reporter: Dave Challis
>Priority: Minor
>  Labels: pull-request-available
>
> Checking a PyArrow datatype object for equality with non-PyArrow datatypes 
> causes a `ValueError` to be raised, rather than either returning a True/False 
> value, or returning 
> [NotImplemented|https://docs.python.org/3/library/constants.html#NotImplemented]
>  if the comparison isn't implemented.
> E.g. attempting to call:
> {code:java}
> import pyarrow
> pyarrow.int32() == 'foo'
> {code}
> results in:
> {code:java}
> Traceback (most recent call last):
>   File "types.pxi", line 1221, in pyarrow.lib.type_for_alias
> KeyError: 'foo'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "t.py", line 2, in <module>
> pyarrow.int32() == 'foo'
>   File "types.pxi", line 90, in pyarrow.lib.DataType.__richcmp__
>   File "types.pxi", line 113, in pyarrow.lib.DataType.equals
>   File "types.pxi", line 1223, in pyarrow.lib.type_for_alias
> ValueError: No type alias for foo
> {code}
> The expected outcome for the above would be for the comparison to return 
> `False`, as that's the general behaviour for comparisons between objects of 
> different types (e.g. `1 == 'foo'` or `object() == 12.4` both return `False`).





[jira] [Updated] (ARROW-2423) [Python] PyArrow datatypes raise ValueError on equality checks against non-PyArrow objects

2018-04-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2423:
--
Labels: pull-request-available  (was: )

> [Python] PyArrow datatypes raise ValueError on equality checks against 
> non-PyArrow objects
> --
>
> Key: ARROW-2423
> URL: https://issues.apache.org/jira/browse/ARROW-2423
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> PyArrow 0.9.0 (py36_1)
> Python 3.6.3
>Reporter: Dave Challis
>Priority: Minor
>  Labels: pull-request-available
>
> Checking a PyArrow datatype object for equality with non-PyArrow datatypes 
> causes a `ValueError` to be raised, rather than either returning a True/False 
> value, or returning 
> [NotImplemented|https://docs.python.org/3/library/constants.html#NotImplemented]
>  if the comparison isn't implemented.
> E.g. attempting to call:
> {code:java}
> import pyarrow
> pyarrow.int32() == 'foo'
> {code}
> results in:
> {code:java}
> Traceback (most recent call last):
>   File "types.pxi", line 1221, in pyarrow.lib.type_for_alias
> KeyError: 'foo'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "t.py", line 2, in <module>
> pyarrow.int32() == 'foo'
>   File "types.pxi", line 90, in pyarrow.lib.DataType.__richcmp__
>   File "types.pxi", line 113, in pyarrow.lib.DataType.equals
>   File "types.pxi", line 1223, in pyarrow.lib.type_for_alias
> ValueError: No type alias for foo
> {code}
> The expected outcome for the above would be for the comparison to return 
> `False`, as that's the general behaviour for comparisons between objects of 
> different types (e.g. `1 == 'foo'` or `object() == 12.4` both return `False`).





[jira] [Created] (ARROW-2433) [Rust] Add Builder.push_slice(&[T])

2018-04-09 Thread Andy Grove (JIRA)
Andy Grove created ARROW-2433:
-

 Summary: [Rust] Add Builder.push_slice(&[T])
 Key: ARROW-2433
 URL: https://issues.apache.org/jira/browse/ARROW-2433
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.10.0


When populating a Builder with Utf8 data it is more efficient to push whole 
strings as &[u8] rather than one byte at a time.

The same optimization works for all other types too.
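The idea can be sketched with a toy builder over a bytearray (a hypothetical Python analogue; the real Rust Builder<T> differs): push_slice performs one bulk copy where repeated push calls pay per-element overhead.

```python
class Builder:
    """Toy builder over a bytearray, sketching the proposed push_slice
    API (hypothetical Python analogue of the Rust Builder in the issue)."""

    def __init__(self):
        self._data = bytearray()

    def push(self, value):
        # One element at a time: a method call and bounds check per byte.
        self._data.append(value)

    def push_slice(self, values):
        # Bulk append: a single copy of the whole slice.
        self._data.extend(values)


slow = Builder()
for byte in b"hello":
    slow.push(byte)

fast = Builder()
fast.push_slice(b"hello")

assert bytes(slow._data) == bytes(fast._data) == b"hello"
```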

 





[jira] [Commented] (ARROW-2387) negative decimal values get spurious rescaling error

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431508#comment-16431508
 ] 

ASF GitHub Bot commented on ARROW-2387:
---

cpcloud commented on issue #1832: ARROW-2387: flip test for rescale loss if 
value < 0
URL: https://github.com/apache/arrow/pull/1832#issuecomment-379927866
 
 
   @bwo Looks like this is failing for unrelated reasons; can you rebase on top 
of master and push again? Then we can merge.




> negative decimal values get spurious rescaling error
> 
>
> Key: ARROW-2387
> URL: https://issues.apache.org/jira/browse/ARROW-2387
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: ben w
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> $ python
> Python 2.7.12 (default, Nov 20 2017, 18:23:56)
> [GCC 5.4.0 20160609] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow as pa, decimal
> >>> one = decimal.Decimal('1.00')
> >>> neg_one = decimal.Decimal('-1.00')
> >>> pa.array([one], pa.decimal128(24, 12))
> 
> [
> Decimal('1.')
> ]
> >>> pa.array([neg_one], pa.decimal128(24, 12))
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "array.pxi", line 181, in pyarrow.lib.array
> File "array.pxi", line 36, in pyarrow.lib._sequence_to_array
> File "error.pxi", line 77, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Rescaling decimal value -100.00 from 
> original scale of 6 to new scale of 12 would cause data loss
> >>> pa.__version__
> '0.9.0'
> {code}
> not only is the error spurious, the decimal value has been multiplied by one 
> million (i.e. 10 ** 6, where 6 is the difference in scales; this is still 
> pretty strange to me).
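The PR title ("flip test for rescale loss if value < 0") suggests the data-loss check mishandled negative values. A pure-Python sketch of a sign-safe rescale over the unscaled integer (a hypothetical helper, not Arrow's C++ code):

```python
def rescale(unscaled, original_scale, new_scale):
    """Rescale a decimal's unscaled integer representation.

    Hypothetical sketch of the check the PR corrects: the data-loss
    test must hold for negative values as well as positive ones.
    """
    if new_scale >= original_scale:
        # Widening the scale is always lossless: multiply by a power of 10.
        return unscaled * 10 ** (new_scale - original_scale)
    divisor = 10 ** (original_scale - new_scale)
    quotient, remainder = divmod(unscaled, divisor)
    # divmod's remainder is 0 exactly when the division is exact, for
    # positive and negative inputs alike, so one sign-agnostic test works.
    if remainder != 0:
        raise ValueError("rescaling would cause data loss")
    return quotient


# -1.00 stored at scale 2 (unscaled -100) widens losslessly to scale 12.
assert rescale(-100, 2, 12) == -1_000_000_000_000
```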





[jira] [Commented] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-09 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431450#comment-16431450
 ] 

Phillip Cloud commented on ARROW-2432:
--

[~bryanc] Awesome, thanks.

> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if it encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> {code}
> The above error is raised when specifying decimal type. When no type is 
> specified, a seg fault happens.
> This previously worked in 0.8.0.
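The minimum fix amounts to treating a None entry as a null slot before the decimal.Decimal type check. A pure-Python sketch of that logic (convert_decimals is a hypothetical helper; the real conversion happens in Arrow's C++ numpy-to-arrow path):

```python
from decimal import Decimal


def convert_decimals(values):
    """None-aware decimal conversion, sketched in pure Python.

    A None entry becomes a null slot instead of tripping the
    decimal.Decimal type check.
    """
    out = []
    for value in values:
        if value is None:
            out.append(None)  # null slot, not an error
        elif isinstance(value, Decimal):
            out.append(value)
        else:
            raise TypeError(
                "can only handle decimal.Decimal, got %s" % type(value).__name__
            )
    return out


assert convert_decimals([Decimal("3.14"), None]) == [Decimal("3.14"), None]
```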





[jira] [Commented] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-09 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431438#comment-16431438
 ] 

Bryan Cutler commented on ARROW-2432:
-

It should be possible to share code paths when converting objects, right? I'd 
like to keep this to the minimum fix; let's look at possible refactoring 
after. Thanks [~cpcloud], I already made the fix and am just going to add tests.

> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if it encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> {code}
> The above error is raised when specifying decimal type. When no type is 
> specified, a seg fault happens.
> This previously worked in 0.8.0.





[jira] [Commented] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-09 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431398#comment-16431398
 ] 

Phillip Cloud commented on ARROW-2432:
--

[~pitrou] FWIW, the code conversion paths are not specific to decimal types and 
have been around since before decimals existed. [~bryanc] If you're not already 
working on this, then I can probably get it fixed up pretty quickly.

> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if it encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> {code}
> The above error is raised when specifying decimal type. When no type is 
> specified, a seg fault happens.
> This previously worked in 0.8.0.





[jira] [Commented] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-09 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431341#comment-16431341
 ] 

Bryan Cutler commented on ARROW-2432:
-

We really need to get the integration testing running regularly, or at least 
before a release.

> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if it encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> {code}
> The above error is raised when specifying decimal type. When no type is 
> specified, a seg fault happens.
> This previously worked in 0.8.0.





[jira] [Commented] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-09 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431331#comment-16431331
 ] 

Antoine Pitrou commented on ARROW-2432:
---

Ow. For some reason it seems we have various code conversion paths depending on 
which API is called :-/

{code:python}
>>> data = [decimal.Decimal('3.14'), None]
>>> pa.array(data, type=pa.decimal128(12, 4))

[
  Decimal('3.1400'),
  NA
]
>>> pa.array(data, type=pa.decimal128(12, 4), from_pandas=True)

[
  Decimal('3.1400'),
  NA
]
>>> pa.Array.from_pandas(data, type=pa.decimal128(12, 4))

[
  Decimal('3.1400'),
  NA
]
>>> pa.Array.from_pandas(pd.Series(data), type=pa.decimal128(12, 4))
Traceback (most recent call last):
  File "", line 1, in 
pa.Array.from_pandas(pd.Series(data), type=pa.decimal128(12, 4))
  File "array.pxi", line 383, in pyarrow.lib.Array.from_pandas
  File "array.pxi", line 177, in pyarrow.lib.array
  File "error.pxi", line 77, in pyarrow.lib.check_status
  File "error.pxi", line 77, in pyarrow.lib.check_status
ArrowInvalid: /home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1702 
code: converter.Convert()
Error converting from Python objects to Decimal: Got Python object of type 
NoneType but can only handle these types: decimal.Decimal

{code}

> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if it encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> {code}
> The above error is raised when specifying decimal type. When no type is 
> specified, a seg fault happens.
> This previously worked in 0.8.0.





[jira] [Updated] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-09 Thread Bryan Cutler (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-2432:

Description: 
Using from_pandas to convert decimals fails if it encounters a value of {{None}}. 
For example:
{code:java}
In [1]: import pyarrow as pa
...: import pandas as pd
...: from decimal import Decimal
...:

In [2]: s_dec = pd.Series([Decimal('3.14'), None])

In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
---
ArrowInvalid Traceback (most recent call last)
 in ()
> 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))

array.pxi in pyarrow.lib.Array.from_pandas()

array.pxi in pyarrow.lib.array()

error.pxi in pyarrow.lib.check_status()

error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
object of type NoneType but can only handle these types: decimal.Decimal
{code}
The above error is raised when specifying decimal type. When no type is 
specified, a seg fault happens.

This previously worked in 0.8.0.

  was:
Using from_pandas to convert decimals fails if it encounters a value of {{None}}. 
For example:
{code:java}
In [1]: import pyarrow as pa
...: import pandas as pd
...: from decimal import Decimal
...:

In [2]: s_dec = pd.Series([Decimal('3.14'), None])

In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
---
ArrowInvalid Traceback (most recent call last)
 in ()
> 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))

array.pxi in pyarrow.lib.Array.from_pandas()

array.pxi in pyarrow.lib.array()

error.pxi in pyarrow.lib.check_status()

error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
object of type NoneType but can only handle these types: decimal.Decimal

In [4]: s_dec
Out[4]:
0 3.14
1 None
dtype: object{code}

The above error is raised when specifying decimal type.  When no type is 
specified, a seg fault happens.

This previously worked in 0.8.0.


> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if it encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> {code}
> The above error is raised when specifying decimal type. When no type is 
> specified, a seg fault happens.
> This previously worked in 0.8.0.





[jira] [Updated] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-09 Thread Bryan Cutler (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-2432:

Summary: [Python] from_pandas fails when converting decimals if have None 
values  (was: [Python] from_pandas fails when converting decimals if contain 
None)

> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if it encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> In [4]: s_dec
> Out[4]:
> 0 3.14
> 1 None
> dtype: object{code}
> The above error is raised when specifying decimal type.  When no type is 
> specified, a seg fault happens.
> This previously worked in 0.8.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2432) [Python] from_pandas fails when converting decimals if contain None

2018-04-09 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-2432:
---

 Summary: [Python] from_pandas fails when converting decimals if 
contain None
 Key: ARROW-2432
 URL: https://issues.apache.org/jira/browse/ARROW-2432
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Bryan Cutler


Using from_pandas to convert decimals fails if it encounters a value of {{None}}. 
For example:
{code:java}
In [1]: import pyarrow as pa
...: import pandas as pd
...: from decimal import Decimal
...:

In [2]: s_dec = pd.Series([Decimal('3.14'), None])

In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
---
ArrowInvalid Traceback (most recent call last)
 in ()
> 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))

array.pxi in pyarrow.lib.Array.from_pandas()

array.pxi in pyarrow.lib.array()

error.pxi in pyarrow.lib.check_status()

error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
object of type NoneType but can only handle these types: decimal.Decimal

In [4]: s_dec
Out[4]:
0 3.14
1 None
dtype: object{code}

The above error is raised when a decimal type is specified. When no type is 
specified, a segmentation fault occurs.

This previously worked in 0.8.0.
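The fix will presumably need to treat {{None}} as a null slot rather than hand it to the decimal converter. A minimal stdlib-only sketch of that idea (the `split_nulls` helper is hypothetical, not pyarrow API; it mirrors what a validity mask gives `pa.Array.from_pandas`):

```python
from decimal import Decimal

def split_nulls(values):
    """Separate Decimal-or-None values into a cleaned list and a
    validity mask, so a converter that only understands
    decimal.Decimal never sees a raw None."""
    mask = [v is None for v in values]                    # True marks a null slot
    cleaned = [Decimal(0) if m else v for v, m in zip(values, mask)]
    return cleaned, mask

cleaned, mask = split_nulls([Decimal("3.14"), None])
print(cleaned)  # [Decimal('3.14'), Decimal('0')]
print(mask)     # [False, True]
```

The placeholder `Decimal(0)` is never surfaced to the user; the mask marks that slot as null.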



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2432) [Python] from_pandas fails when converting decimals if contain None

2018-04-09 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431276#comment-16431276
 ] 

Bryan Cutler commented on ARROW-2432:
-

I can work on this

> [Python] from_pandas fails when converting decimals if contain None
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if it encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> In [4]: s_dec
> Out[4]:
> 0 3.14
> 1 None
> dtype: object{code}
> The above error is raised when a decimal type is specified. When no type is 
> specified, a segmentation fault occurs.
> This previously worked in 0.8.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-1938) [Python] Error writing to partitioned Parquet dataset

2018-04-09 Thread Phillip Cloud (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phillip Cloud reassigned ARROW-1938:


Assignee: (was: Phillip Cloud)

> [Python] Error writing to partitioned Parquet dataset
> -
>
> Key: ARROW-1938
> URL: https://issues.apache.org/jira/browse/ARROW-1938
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
> Environment: Linux (Ubuntu 16.04)
>Reporter: Robert Dailey
>Priority: Major
> Fix For: 0.10.0
>
> Attachments: ARROW-1938-test-data.csv.gz, ARROW-1938.py, 
> pyarrow_dataset_error.png
>
>
> I receive the following error after upgrading to pyarrow 0.8.0 when writing 
> to a dataset:
> * ArrowIOError: Column 3 had 187374 while previous column had 1
> The command was:
> write_table_values = {'row_group_size': 1}
> pq.write_to_dataset(pa.Table.from_pandas(df, preserve_index=True), 
> '/logs/parsed/test', partition_cols=['Product', 'year', 'month', 'day', 
> 'hour'], **write_table_values)
> I've also tried write_table_values = {'chunk_size': 1} and received the 
> same error.
> This same command works in version 0.7.1.  I am trying to troubleshoot the 
> problem but wanted to submit a ticket.
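For context on what `partition_cols` does here: it produces a Hive-style directory layout, one subdirectory per partition value. A rough sketch of the path scheme (the `partition_path` helper is illustrative, not pyarrow code):

```python
def partition_path(base, partitions):
    """Build the Hive-style directory path that a partitioned dataset
    writer uses for one combination of partition values."""
    return base + "".join(f"/{key}={value}" for key, value in partitions)

p = partition_path('/logs/parsed/test',
                   [('Product', 'widget'), ('year', 2017),
                    ('month', 12), ('day', 30), ('hour', 8)])
print(p)  # /logs/parsed/test/Product=widget/year=2017/month=12/day=30/hour=8
```

Each unique combination of partition-column values lands in its own leaf directory, which is why dropping those columns from the written table changes per-column lengths.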



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431175#comment-16431175
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

kszucs commented on issue #1859: ARROW-2391: [C++/Python] Segmentation fault 
from PyArrow when mapping Pandas datetime column to pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#issuecomment-379883273
 
 
   My pleasure!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if Pandas `datetime64[ns]` column tries to be converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.
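The underlying constraint is that `date64` stores midnight-aligned epoch milliseconds, so a timestamp with an intraday component has no valid representation. A stdlib sketch of the check the cast has to make (an assumption-laden model of the behavior, not the C++ kernel itself):

```python
from datetime import datetime, timezone

MS_PER_DAY = 24 * 60 * 60 * 1000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_date64_ms(dt, allow_truncate=False):
    """Convert a naive datetime (treated as UTC) to date64-style epoch
    milliseconds, raising on non-midnight values instead of crashing."""
    ms = int((dt.replace(tzinfo=timezone.utc) - EPOCH).total_seconds() * 1000)
    remainder = ms % MS_PER_DAY
    if remainder and not allow_truncate:
        raise ValueError("Timestamp value had non-zero intraday milliseconds")
    return ms - remainder
```

With `allow_truncate=True` the `10:24:01` component is silently dropped; without it the caller gets an error rather than a segfault.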



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431167#comment-16431167
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on issue #1859: ARROW-2391: [C++/Python] Segmentation fault 
from PyArrow when mapping Pandas datetime column to pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#issuecomment-379882116
 
 
   Thank you @kszucs !


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if Pandas `datetime64[ns]` column tries to be converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431124#comment-16431124
 ] 

Krisztian Szucs edited comment on ARROW-2430 at 4/9/18 8:00 PM:


Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers
- setup deployments + conda deploy script
- consult about flattening the builds (remove build matrices)
- format commit message




was (Author: kszucs):
Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers
- setup deployments + conda deploy script
- consult about flattening the builds (remove build matrices)



> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431124#comment-16431124
 ] 

Krisztian Szucs edited comment on ARROW-2430 at 4/9/18 7:59 PM:


Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers
- setup deployments
- consult about flattening the builds (remove build matrices)




was (Author: kszucs):
Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers
- setup deployments
- consult about flattening the builds (remove build matrices)



> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431124#comment-16431124
 ] 

Krisztian Szucs edited comment on ARROW-2430 at 4/9/18 7:59 PM:


Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers
- setup deployments + conda deploy script
- consult about flattening the builds (remove build matrices)




was (Author: kszucs):
Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers
- setup deployments
- consult about flattening the builds (remove build matrices)



> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431128#comment-16431128
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou closed pull request #1859: ARROW-2391: [C++/Python] Segmentation fault 
from PyArrow when mapping Pandas datetime column to pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/cpp/src/arrow/compute/kernels/cast.cc 
b/cpp/src/arrow/compute/kernels/cast.cc
index eaebd7cef..bfd519d18 100644
--- a/cpp/src/arrow/compute/kernels/cast.cc
+++ b/cpp/src/arrow/compute/kernels/cast.cc
@@ -396,21 +396,34 @@ struct CastFunctor {
     ShiftTime(ctx, options, conversion.first, conversion.second, input,
               output);

-    internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
-                                      input.length);
-
     // Ensure that intraday milliseconds have been zeroed out
     auto out_data = GetMutableValues(output, 1);
-    for (int64_t i = 0; i < input.length; ++i) {
-      const int64_t remainder = out_data[i] % kMillisecondsInDay;
-      if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && bit_reader.IsSet() &&
-                              remainder > 0)) {
-        ctx->SetStatus(
-            Status::Invalid("Timestamp value had non-zero intraday milliseconds"));
-        break;
+
+    if (input.null_count != 0) {
+      internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
+                                        input.length);
+
+      for (int64_t i = 0; i < input.length; ++i) {
+        const int64_t remainder = out_data[i] % kMillisecondsInDay;
+        if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && bit_reader.IsSet() &&
+                                remainder > 0)) {
+          ctx->SetStatus(
+              Status::Invalid("Timestamp value had non-zero intraday milliseconds"));
+          break;
+        }
+        out_data[i] -= remainder;
+        bit_reader.Next();
+      }
+    } else {
+      for (int64_t i = 0; i < input.length; ++i) {
+        const int64_t remainder = out_data[i] % kMillisecondsInDay;
+        if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && remainder > 0)) {
+          ctx->SetStatus(
+              Status::Invalid("Timestamp value had non-zero intraday milliseconds"));
+          break;
+        }
+        out_data[i] -= remainder;
       }
-      out_data[i] -= remainder;
-      bit_reader.Next();
     }
   }
 };
diff --git a/python/pyarrow/tests/test_convert_pandas.py b/python/pyarrow/tests/test_convert_pandas.py
index c6e2b75be..de6120176 100644
--- a/python/pyarrow/tests/test_convert_pandas.py
+++ b/python/pyarrow/tests/test_convert_pandas.py
@@ -807,6 +807,44 @@ def test_datetime64_to_date32(self):

         assert arr2.equals(arr.cast('date32'))

+    @pytest.mark.parametrize('mask', [
+        None,
+        np.ones(3),
+        np.array([True, False, False]),
+    ])
+    def test_pandas_datetime_to_date64(self, mask):
+        s = pd.to_datetime([
+            '2018-05-10T00:00:00',
+            '2018-05-11T00:00:00',
+            '2018-05-12T00:00:00',
+        ])
+        arr = pa.Array.from_pandas(s, type=pa.date64(), mask=mask)
+
+        data = np.array([
+            date(2018, 5, 10),
+            date(2018, 5, 11),
+            date(2018, 5, 12)
+        ])
+        expected = pa.array(data, mask=mask, type=pa.date64())
+
+        assert arr.equals(expected)
+
+    @pytest.mark.parametrize('mask', [
+        None,
+        np.ones(3),
+        np.array([True, False, False])
+    ])
+    def test_pandas_datetime_to_date64_failures(self, mask):
+        s = pd.to_datetime([
+            '2018-05-10T10:24:01',
+            '2018-05-11T10:24:01',
+            '2018-05-12T10:24:01',
+        ])
+
+        expected_msg = 'Timestamp value had non-zero intraday milliseconds'
+        with pytest.raises(pa.ArrowInvalid, msg=expected_msg):
+            pa.Array.from_pandas(s, type=pa.date64(), mask=mask)
+
     def test_date_infer(self):
         df = pd.DataFrame({
             'date': [date(2000, 1, 1),


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> 
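The shape of the patch in the diff above — consulting the validity bitmap only when nulls are actually present — can be sketched in Python (a simplified model of the two loops, not the real kernel):

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

def zero_intraday_ms(values, valid=None, allow_truncate=False):
    """Zero out intraday milliseconds, mirroring the patch's two paths:
    when there are no nulls (valid is None) the per-element bitmap
    reads are skipped entirely."""
    out = []
    if valid is None:  # fast path: no validity bitmap to consult
        for v in values:
            remainder = v % MS_PER_DAY
            if remainder and not allow_truncate:
                raise ValueError("Timestamp value had non-zero intraday milliseconds")
            out.append(v - remainder)
    else:              # slow path: only validate slots marked valid
        for v, is_valid in zip(values, valid):
            remainder = v % MS_PER_DAY
            if remainder and is_valid and not allow_truncate:
                raise ValueError("Timestamp value had non-zero intraday milliseconds")
            out.append(v - remainder)
    return out
```

A null slot with a non-midnight payload no longer trips the error, which is what reading the bitmap unconditionally in the old code got wrong.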

[jira] [Resolved] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2391.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1859
[https://github.com/apache/arrow/pull/1859]

> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if Pandas `datetime64[ns]` column tries to be converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431126#comment-16431126
 ] 

ASF GitHub Bot commented on ARROW-2430:
---

kszucs opened a new pull request #1869: ARROW-2430: [Packaging] MVP for branch 
based packaging automation
URL: https://github.com/apache/arrow/pull/1869
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2430:
--
Labels: pull-request-available  (was: )

> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431124#comment-16431124
 ] 

Krisztian Szucs edited comment on ARROW-2430 at 4/9/18 7:50 PM:


Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers
- setup deployments
- consult about flattening the builds (remove build matrices)




was (Author: kszucs):
Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers



> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431124#comment-16431124
 ] 

Krisztian Szucs commented on ARROW-2430:


Additional TODOs:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers



> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431124#comment-16431124
 ] 

Krisztian Szucs edited comment on ARROW-2430 at 4/9/18 7:48 PM:


Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers




was (Author: kszucs):
Additional TODOs:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers



> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2431) [Rust] Schema fidelity

2018-04-09 Thread Maximilian Roos (JIRA)
Maximilian Roos created ARROW-2431:
--

 Summary: [Rust] Schema fidelity
 Key: ARROW-2431
 URL: https://issues.apache.org/jira/browse/ARROW-2431
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Maximilian Roos


ref [https://github.com/apache/arrow/pull/1829#discussion_r179248743]

Currently our traits are not faithful to 
[https://arrow.apache.org/docs/metadata.html].

For example, we nest `Field`s in the `DataType` (aka `type`) attribute of the 
parent `Field` (rather than having the type be `Struct` with a separate 
`children` parameter).

Is this OK, assuming that we can read and write accurate schemas? Or should we 
move towards having the `Schema` trait be consistent with the metadata spec?
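For reference, the metadata spec's JSON shape keeps children outside the type. A sketch of a spec-faithful struct field (key names follow the Arrow JSON format; the concrete values are illustrative):

```python
import json

# A struct field: the type is just {"name": "struct"}; child fields
# live in a parallel "children" list, not inside the type itself.
person = {
    "name": "person",
    "nullable": True,
    "type": {"name": "struct"},
    "children": [
        {"name": "age",
         "nullable": True,
         "type": {"name": "int", "bitWidth": 32, "isSigned": True},
         "children": []},
    ],
}
print(json.dumps(person["type"]))  # {"name": "struct"}
```

Nesting the `Field`s inside the type, as the Rust traits do today, makes round-tripping this layout lossy unless the serializer special-cases structs.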



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-2430:
--

 Summary: MVP for branch based packaging automation
 Key: ARROW-2430
 URL: https://issues.apache.org/jira/browse/ARROW-2430
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Krisztian Szucs


Described in 
https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431103#comment-16431103
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] 
JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector 
Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180205035
 
 

 ##
 File path: 
java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
 ##
 @@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.vector.*;
+import org.apache.arrow.vector.types.DateUnit;
+import org.apache.arrow.vector.types.TimeUnit;
+import org.apache.arrow.vector.types.pojo.ArrowType;
+import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Schema;
+
+import java.nio.charset.Charset;
+import java.sql.*;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
+import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;
+
+
+/**
+ * Class that does most of the work to convert JDBC ResultSet data into Arrow columnar format Vector objects.
+ *
+ * @since 0.10.0
+ */
+public class JdbcToArrowUtils {
+
+    private static final int DEFAULT_BUFFER_SIZE = 256;
+
+    /**
+     * Create Arrow {@link Schema} object for the given JDBC {@link ResultSetMetaData}.
+     *
+     * This method currently performs following type mapping for JDBC SQL data types to corresponding Arrow data types.
+     *
+     * CHAR --> ArrowType.Utf8
+     * NCHAR --> ArrowType.Utf8
+     * VARCHAR --> ArrowType.Utf8
+     * NVARCHAR --> ArrowType.Utf8
+     * LONGVARCHAR --> ArrowType.Utf8
+     * LONGNVARCHAR --> ArrowType.Utf8
+     * NUMERIC --> ArrowType.Decimal(precision, scale)
+     * DECIMAL --> ArrowType.Decimal(precision, scale)
+     * BIT --> ArrowType.Bool
+     * TINYINT --> ArrowType.Int(8, signed)
+     * SMALLINT --> ArrowType.Int(16, signed)
+     * INTEGER --> ArrowType.Int(32, signed)
+     * BIGINT --> ArrowType.Int(64, signed)
+     * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+     * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+     * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+     * BINARY --> ArrowType.Binary
+     * VARBINARY --> ArrowType.Binary
+     * LONGVARBINARY --> ArrowType.Binary
+     * DATE --> ArrowType.Date(DateUnit.MILLISECOND)
+     * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+     * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+     * CLOB --> ArrowType.Utf8
+     * BLOB --> ArrowType.Binary
+     *
+     * @param rsmd
+     * @return {@link Schema}
+     * @throws SQLException
+     */
+    public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws SQLException {
+
+        assert rsmd != null;
+
+        //ImmutableList.Builder fields = ImmutableList.builder();
+        List fields = new ArrayList<>();
+        int columnCount = rsmd.getColumnCount();
+        for (int i = 1; i <= columnCount; i++) {
+            String columnName = rsmd.getColumnName(i);
+            switch (rsmd.getColumnType(i)) {
+                case Types.BOOLEAN:
+                case Types.BIT:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Bool()), null));
+                    break;
+                case Types.TINYINT:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(8, true)), null));
+                    break;
+                case Types.SMALLINT:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(16, true)), null));
+                    break;
+                case Types.INTEGER:
+                    fields.add(new Field(columnName, FieldType.nullable(new ArrowType.Int(32, true)), null));
+                    break;
+                case 

[jira] [Updated] (ARROW-2399) Builder should not provide a set() method

2018-04-09 Thread Maximilian Roos (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Roos updated ARROW-2399:
---
Description: 
Arrays should be immutable, but we have a `set` method on Buffer that 
should not be there.

This is only used from the Bitmap struct. Perhaps Bitmap should maintain its 
own memory instead and not use Buffer?

  was:
Arrays should be immutable, but we have a `set` method on Buffer that should 
not be there.

This is only used from the Bitmap struct. Perhaps Bitmap should maintain its 
own memory instead and not use Buffer?


> Builder should not provide a set() method
> 
>
> Key: ARROW-2399
> URL: https://issues.apache.org/jira/browse/ARROW-2399
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Priority: Major
> Fix For: 0.10.0
>
>
> Arrays should be immutable, but we have a `set` method on Buffer that 
> should not be there.
> This is only used from the Bitmap struct. Perhaps Bitmap should maintain its 
> own memory instead and not use Buffer?
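A sketch of the suggested direction — a `Bitmap` that owns its own mutable bytes so the immutable `Buffer` no longer needs a `set()` method (illustrative Python, not the Rust implementation):

```python
class Bitmap:
    """Validity bitmap backed by its own mutable bytearray, so the
    builder mutates the bitmap rather than a shared immutable buffer."""

    def __init__(self, num_bits):
        self.num_bits = num_bits
        self.bits = bytearray((num_bits + 7) // 8)  # zero-initialised

    def set(self, i):
        self.bits[i // 8] |= 1 << (i % 8)

    def is_set(self, i):
        return bool(self.bits[i // 8] & (1 << (i % 8)))

bm = Bitmap(10)
bm.set(3)
print(bm.is_set(3), bm.is_set(4))  # True False
```

Once construction is finished, the bitmap's bytes can be frozen into an immutable buffer, matching the array-immutability goal of the ticket.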



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2399) Builder should not provide a set() method

2018-04-09 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431021#comment-16431021
 ] 

Antoine Pitrou commented on ARROW-2399:
---

Could you also please prefix Rust issues with "[Rust]", so that the list of 
issues gives more information? Thanks :-)

> Builder should not provide a set() method
> 
>
> Key: ARROW-2399
> URL: https://issues.apache.org/jira/browse/ARROW-2399
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Priority: Major
> Fix For: 0.10.0
>
>
> Arrays should be immutable, but we have a `set` method on Buffer that 
> should not be there.
> This is only used from the Bitmap struct. Perhaps Bitmap should maintain its 
> own memory instead and not use Buffer?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431015#comment-16431015
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] 
JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector 
Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180185358
 
 

 ##
 File path: 
java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
 ##
 @@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.vector.*;
+import org.apache.arrow.vector.types.DateUnit;
+import org.apache.arrow.vector.types.TimeUnit;
+import org.apache.arrow.vector.types.pojo.ArrowType;
+import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Schema;
+
+import java.nio.charset.Charset;
+import java.sql.*;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
+import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;
+
+
+/**
+ * Class that does most of the work to convert JDBC ResultSet data into Arrow 
columnar format Vector objects.
+ *
+ * @since 0.10.0
+ */
+public class JdbcToArrowUtils {
+
+private static final int DEFAULT_BUFFER_SIZE = 256;
+
+/**
+ * Create Arrow {@link Schema} object for the given JDBC {@link 
ResultSetMetaData}.
+ *
+ * This method currently performs the following type mapping from JDBC SQL data 
types to the corresponding Arrow data types.
+ *
+ * CHAR--> ArrowType.Utf8
+ * NCHAR   --> ArrowType.Utf8
+ * VARCHAR --> ArrowType.Utf8
+ * NVARCHAR --> ArrowType.Utf8
+ * LONGVARCHAR --> ArrowType.Utf8
+ * LONGNVARCHAR --> ArrowType.Utf8
+ * NUMERIC --> ArrowType.Decimal(precision, scale)
+ * DECIMAL --> ArrowType.Decimal(precision, scale)
+ * BIT --> ArrowType.Bool
+ * TINYINT --> ArrowType.Int(8, signed)
+ * SMALLINT --> ArrowType.Int(16, signed)
+ * INTEGER --> ArrowType.Int(32, signed)
+ * BIGINT --> ArrowType.Int(64, signed)
+ * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+ * BINARY --> ArrowType.Binary
+ * VARBINARY --> ArrowType.Binary
+ * LONGVARBINARY --> ArrowType.Binary
+ * DATE --> ArrowType.Date(DateUnit.MILLISECOND)
+ * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+ * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+ * CLOB --> ArrowType.Utf8
+ * BLOB --> ArrowType.Binary
+ *
+ * @param rsmd
+ * @return {@link Schema}
+ * @throws SQLException
+ */
+public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws 
SQLException {
+
+assert rsmd != null;
+
+//ImmutableList.Builder<Field> fields = ImmutableList.builder();
+List<Field> fields = new ArrayList<>();
+int columnCount = rsmd.getColumnCount();
+for (int i = 1; i <= columnCount; i++) {
+String columnName = rsmd.getColumnName(i);
+switch (rsmd.getColumnType(i)) {
+case Types.BOOLEAN:
+case Types.BIT:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Bool()), null));
+break;
+case Types.TINYINT:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Int(8, true)), null));
+break;
+case Types.SMALLINT:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Int(16, true)), null));
+break;
+case Types.INTEGER:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Int(32, true)), null));
+break;
+case 

[jira] [Assigned] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2328:
--

Assignee: Adrian

> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Assignee: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
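
The bug described above — a slice's rows and validity bits being read from position 0 instead of the slice offset — can be sketched in pure Python, assuming Arrow's LSB-first validity-bitmap layout (this is a conceptual illustration, not the C++ fix):

```python
def bitmap_get(bitmap, i):
    """Return validity bit i from an LSB-first packed bitmap (Arrow layout)."""
    return (bitmap[i // 8] >> (i % 8)) & 1

def slice_validity(bitmap, offset, length):
    """Validity bits for a slice must be read starting at `offset`;
    reading from bit 0 is exactly the misalignment described above."""
    return [bitmap_get(bitmap, offset + i) for i in range(length)]

bitmap = bytes([0b10101100])            # bits 0..7 -> 0,0,1,1,0,1,0,1
assert slice_validity(bitmap, 0, 3) == [0, 0, 1]
assert slice_validity(bitmap, 2, 4) == [1, 1, 0, 1]  # honors the offset
```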


[jira] [Assigned] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2328:
-

Assignee: Antoine Pitrou

> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2328:
-

Assignee: (was: Antoine Pitrou)

> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2328.
---
Resolution: Fixed

Issue resolved by pull request 1784
[https://github.com/apache/arrow/pull/1784]

> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2427) [C++] ReadAt implementations suboptimal

2018-04-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2427:
--
Labels: pull-request-available  (was: )

> [C++] ReadAt implementations suboptimal
> ---
>
> Key: ARROW-2427
> URL: https://issues.apache.org/jira/browse/ARROW-2427
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> The {{ReadAt}} implementations for at least {{OSFile}} and 
> {{MemoryMappedFile}} take the file lock and seek. They could instead read 
> directly from the given offset, allowing concurrent I/O from multiple threads.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
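
The proposal is to read at an explicit offset instead of taking the file lock and seeking. As a rough illustration of the concept (not Arrow's C++ implementation), POSIX positional reads via Python's `os.pread` leave the shared file cursor untouched, so concurrent readers need no lock:

```python
import os
import tempfile

# Write a small file, then read at an explicit offset without seeking.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"abcdefghij")
    # pread reads `count` bytes at `offset` without moving the file
    # position, so multiple threads can issue reads on the same fd.
    chunk = os.pread(fd, 4, 3)      # 4 bytes starting at offset 3
    print(chunk)                    # b'defg'
finally:
    os.close(fd)
    os.remove(path)
```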


[jira] [Commented] (ARROW-2427) [C++] ReadAt implementations suboptimal

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430953#comment-16430953
 ] 

ASF GitHub Bot commented on ARROW-2427:
---

pitrou opened a new pull request #1867: [WIP] ARROW-2427: [C++] Implement 
ReadAt properly
URL: https://github.com/apache/arrow/pull/1867
 
 
   Allow for concurrent I/O by avoiding locking and seeking.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] ReadAt implementations suboptimal
> ---
>
> Key: ARROW-2427
> URL: https://issues.apache.org/jira/browse/ARROW-2427
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> The {{ReadAt}} implementations for at least {{OSFile}} and 
> {{MemoryMappedFile}} take the file lock and seek. They could instead read 
> directly from the given offset, allowing concurrent I/O from multiple threads.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430843#comment-16430843
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

kszucs commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180156054
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -396,21 +396,34 @@ struct CastFunctor {
 ShiftTime(ctx, options, conversion.first, 
conversion.second, input,
 output);
 
-internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
-  input.length);
+if (input.null_count != 0) {
+  internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
+input.length);
 
-// Ensure that intraday milliseconds have been zeroed out
-auto out_data = GetMutableValues(output, 1);
-for (int64_t i = 0; i < input.length; ++i) {
-  const int64_t remainder = out_data[i] % kMillisecondsInDay;
-  if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && 
bit_reader.IsSet() &&
-  remainder > 0)) {
-ctx->SetStatus(
-Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
-break;
+  // Ensure that intraday milliseconds have been zeroed out
+  auto out_data = GetMutableValues(output, 1);
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && 
bit_reader.IsSet() &&
+remainder > 0)) {
+  ctx->SetStatus(
+  Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
+  break;
+}
+out_data[i] -= remainder;
+bit_reader.Next();
+  }
+} else {
+  auto out_data = GetMutableValues(output, 1);
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && remainder > 
0)) {
 
 Review comment:
   No problem :) I'm still learning arrow.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
>
> When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if Pandas `datetime64[ns]` column tries to be converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
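
The branch logic under review above can be mirrored in pure Python (a sketch of the loop in the diff, not the actual C++ kernel; `MS_PER_DAY` and the function name are illustrative):

```python
MS_PER_DAY = 86_400_000  # kMillisecondsInDay in the C++ snippet

def zero_intraday_ms(values, valid, allow_time_truncate=False):
    """For each valid millisecond timestamp, drop the intraday remainder,
    or raise if truncation is not allowed -- nulls are skipped, which is
    why the diff adds a separate branch for null_count == 0."""
    out = []
    for v, ok in zip(values, valid):
        rem = v % MS_PER_DAY
        if ok and rem > 0 and not allow_time_truncate:
            raise ValueError("Timestamp value had non-zero intraday milliseconds")
        out.append(v - rem)
    return out

# 1525910400000 ms is an exact day boundary; +1 ms is not.
assert zero_intraday_ms([1525910400000], [True]) == [1525910400000]
assert zero_intraday_ms([1525910400001], [True],
                        allow_time_truncate=True) == [1525910400000]
```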


[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430786#comment-16430786
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

pitrou commented on issue #1784: ARROW-2328: [C++] Fixed and unit tested 
feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#issuecomment-379806635
 
 
   Thank you! I will merge once the AppVeyor build passes (the Travis-CI 
failures in the Rust and glib builds are unrelated).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430775#comment-16430775
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on issue #1859: ARROW-2391: [C++/Python] Segmentation fault 
from PyArrow when mapping Pandas datetime column to pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#issuecomment-379803722
 
 
   Waiting for the AppVeyor build before merging this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
>
> When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if Pandas `datetime64[ns]` column tries to be converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2429) [Python] Timestamp unit in schema changes when writing to Parquet file then reading back

2018-04-09 Thread Dave Challis (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Challis updated ARROW-2429:

Description: 
When creating an Arrow table from a Pandas DataFrame, the table schema contains 
a field of type `timestamp[ns]`.

When serialising that table to a parquet file and then immediately reading it 
back, the schema of the table read instead contains a field with type 
`timestamp[us]`.

Minimal example:
 
{code:python}
#!/usr/bin/env python

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# create DataFrame with a datetime column
df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
df['created'] = pd.to_datetime(df['created'])

# create Arrow table from DataFrame
table = pa.Table.from_pandas(df, preserve_index=False)

# write the table as a parquet file, then read it back again
pq.write_table(table, 'foo.parquet')
table2 = pq.read_table('foo.parquet')

print(table.schema[0])  # pyarrow.Field<created: timestamp[ns]> (nanosecond units)
print(table2.schema[0]) # pyarrow.Field<created: timestamp[us]> (microsecond units)
{code}

  was:
When creating an Arrow table from a Pandas DataFrame, the table schema contains 
a field of type `timestamp[ns]`.

When serialising that table to a parquet file and then immediately reading it 
back, the schema of the table read instead contains a field with type 
`timestamp[us]`.

 
{code:python}
#!/usr/bin/env python

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# create DataFrame with a datetime column
df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
df['created'] = pd.to_datetime(df['created'])

# create Arrow table from DataFrame
table = pa.Table.from_pandas(df, preserve_index=False)

# write the table as a parquet file, then read it back again
pq.write_table(table, 'foo.parquet')
table2 = pq.read_table('foo.parquet')

print(table.schema[0])  # pyarrow.Field<created: timestamp[ns]> (nanosecond units)
print(table2.schema[0]) # pyarrow.Field<created: timestamp[us]> (microsecond units)
{code}


> [Python] Timestamp unit in schema changes when writing to Parquet file then 
> reading back
> 
>
> Key: ARROW-2429
> URL: https://issues.apache.org/jira/browse/ARROW-2429
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> PyArrow 0.9.0 (py36_1)
> Python
>Reporter: Dave Challis
>Priority: Minor
>
> When creating an Arrow table from a Pandas DataFrame, the table schema 
> contains a field of type `timestamp[ns]`.
> When serialising that table to a parquet file and then immediately reading it 
> back, the schema of the table read instead contains a field with type 
> `timestamp[us]`.
> Minimal example:
>  
> {code:python}
> #!/usr/bin/env python
> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
> # create DataFrame with a datetime column
> df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
> df['created'] = pd.to_datetime(df['created'])
> # create Arrow table from DataFrame
> table = pa.Table.from_pandas(df, preserve_index=False)
> # write the table as a parquet file, then read it back again
> pq.write_table(table, 'foo.parquet')
> table2 = pq.read_table('foo.parquet')
> print(table.schema[0])  # pyarrow.Field<created: timestamp[ns]> (nanosecond units)
> print(table2.schema[0]) # pyarrow.Field<created: timestamp[us]> (microsecond units)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
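
The behaviour comes from the Parquet writer coercing nanosecond timestamps to microseconds (Parquet at this point has no nanosecond unit), so the round trip is lossy. A minimal sketch of that unit coercion — not pyarrow's code, names are illustrative:

```python
NS_PER_US = 1_000

def ns_to_us(ts_ns):
    """Coercing a timestamp[ns] value to timestamp[us] floors away
    sub-microsecond detail, which is why the schema changes on read."""
    return ts_ns // NS_PER_US

ts_ns = 1522836854123456789                  # nanosecond-resolution value
assert ns_to_us(ts_ns) == 1522836854123456   # last three digits dropped
assert ns_to_us(ts_ns) * NS_PER_US != ts_ns  # round-trip is lossy
```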


[jira] [Created] (ARROW-2429) [Python] Timestamp unit in schema changes when writing to Parquet file then reading back

2018-04-09 Thread Dave Challis (JIRA)
Dave Challis created ARROW-2429:
---

 Summary: [Python] Timestamp unit in schema changes when writing to 
Parquet file then reading back
 Key: ARROW-2429
 URL: https://issues.apache.org/jira/browse/ARROW-2429
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
 Environment: Mac OS High Sierra
PyArrow 0.9.0 (py36_1)
Python
Reporter: Dave Challis


When creating an Arrow table from a Pandas DataFrame, the table schema contains 
a field of type `timestamp[ns]`.

When serialising that table to a parquet file and then immediately reading it 
back, the schema of the table read instead contains a field with type 
`timestamp[us]`.

 
{code:python}
#!/usr/bin/env python

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# create DataFrame with a datetime column
df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
df['created'] = pd.to_datetime(df['created'])

# create Arrow table from DataFrame
table = pa.Table.from_pandas(df, preserve_index=False)

# write the table as a parquet file, then read it back again
pq.write_table(table, 'foo.parquet')
table2 = pq.read_table('foo.parquet')



print(table.schema[0])  # pyarrow.Field<created: timestamp[ns]> (nanosecond units)
print(table2.schema[0]) # pyarrow.Field<created: timestamp[us]> (microsecond units)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2429) [Python] Timestamp unit in schema changes when writing to Parquet file then reading back

2018-04-09 Thread Dave Challis (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Challis updated ARROW-2429:

Description: 
When creating an Arrow table from a Pandas DataFrame, the table schema contains 
a field of type `timestamp[ns]`.

When serialising that table to a parquet file and then immediately reading it 
back, the schema of the table read instead contains a field with type 
`timestamp[us]`.

 
{code:python}
#!/usr/bin/env python

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# create DataFrame with a datetime column
df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
df['created'] = pd.to_datetime(df['created'])

# create Arrow table from DataFrame
table = pa.Table.from_pandas(df, preserve_index=False)

# write the table as a parquet file, then read it back again
pq.write_table(table, 'foo.parquet')
table2 = pq.read_table('foo.parquet')

print(table.schema[0])  # pyarrow.Field<created: timestamp[ns]> (nanosecond units)
print(table2.schema[0]) # pyarrow.Field<created: timestamp[us]> (microsecond units)
{code}

  was:
When creating an Arrow table from a Pandas DataFrame, the table schema contains 
a field of type `timestamp[ns]`.

When serialising that table to a parquet file and then immediately reading it 
back, the schema of the table read instead contains a field with type 
`timestamp[us]`.

 
{code:python}
#!/usr/bin/env python

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# create DataFrame with a datetime column
df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
df['created'] = pd.to_datetime(df['created'])

# create Arrow table from DataFrame
table = pa.Table.from_pandas(df, preserve_index=False)

# write the table as a parquet file, then read it back again
pq.write_table(table, 'foo.parquet')
table2 = pq.read_table('foo.parquet')



print(table.schema[0])  # pyarrow.Field<created: timestamp[ns]> (nanosecond units)
print(table2.schema[0]) # pyarrow.Field<created: timestamp[us]> (microsecond units)
{code}


> [Python] Timestamp unit in schema changes when writing to Parquet file then 
> reading back
> 
>
> Key: ARROW-2429
> URL: https://issues.apache.org/jira/browse/ARROW-2429
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> PyArrow 0.9.0 (py36_1)
> Python
>Reporter: Dave Challis
>Priority: Minor
>
> When creating an Arrow table from a Pandas DataFrame, the table schema 
> contains a field of type `timestamp[ns]`.
> When serialising that table to a parquet file and then immediately reading it 
> back, the schema of the table read instead contains a field with type 
> `timestamp[us]`.
>  
> {code:python}
> #!/usr/bin/env python
> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
> # create DataFrame with a datetime column
> df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
> df['created'] = pd.to_datetime(df['created'])
> # create Arrow table from DataFrame
> table = pa.Table.from_pandas(df, preserve_index=False)
> # write the table as a parquet file, then read it back again
> pq.write_table(table, 'foo.parquet')
> table2 = pq.read_table('foo.parquet')
> print(table.schema[0])  # pyarrow.Field<created: timestamp[ns]> (nanosecond units)
> print(table2.schema[0]) # pyarrow.Field<created: timestamp[us]> (microsecond units)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2100) [Python] Drop Python 3.4 support

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2100:
-

Assignee: Antoine Pitrou

> [Python] Drop Python 3.4 support
> 
>
> Key: ARROW-2100
> URL: https://issues.apache.org/jira/browse/ARROW-2100
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> conda-forge has already dropped it, Pandas dropped it in 0.21, we should also 
> think of dropping support for it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2305) [Python] Cython 0.25.2 compilation failure

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430753#comment-16430753
 ] 

ASF GitHub Bot commented on ARROW-2305:
---

pitrou closed pull request #1863: ARROW-2305: [Python] Bump Cython requirement 
to 0.27+
URL: https://github.com/apache/arrow/pull/1863
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/ci/msvc-build.bat b/ci/msvc-build.bat
index 678e29d58..d3f540b2d 100644
--- a/ci/msvc-build.bat
+++ b/ci/msvc-build.bat
@@ -68,10 +68,8 @@ if "%JOB%" == "Build_Debug" (
   exit /B 0
 )
 
-@rem Note: avoid Cython 0.28.0 due to 
https://github.com/cython/cython/issues/2148
 conda create -n arrow -q -y python=%PYTHON% ^
-  six pytest setuptools numpy pandas ^
-  cython=0.27.3 ^
+  six pytest setuptools numpy pandas cython ^
   thrift-cpp=0.11.0
 
 call activate arrow
diff --git a/ci/travis_script_python.sh b/ci/travis_script_python.sh
index aa3c3154c..a776c4263 100755
--- a/ci/travis_script_python.sh
+++ b/ci/travis_script_python.sh
@@ -36,13 +36,12 @@ source activate $CONDA_ENV_DIR
 python --version
 which python
 
-# Note: avoid Cython 0.28.0 due to https://github.com/cython/cython/issues/2148
 conda install -y -q pip \
   nomkl \
   cloudpickle \
   numpy=1.13.1 \
   pandas \
-  cython=0.27.3
+  cython
 
 # ARROW-2093: PyTorch increases the size of our conda dependency stack
 # significantly, and so we have disabled these tests in Travis CI for now
diff --git a/dev/release/verify-release-candidate.sh 
b/dev/release/verify-release-candidate.sh
index 34aff209a..ef058d172 100755
--- a/dev/release/verify-release-candidate.sh
+++ b/dev/release/verify-release-candidate.sh
@@ -104,7 +104,7 @@ setup_miniconda() {
 numpy \
 pandas \
 six \
-cython=0.27.3 -c conda-forge
+cython -c conda-forge
   source activate arrow-test
 }
 
diff --git a/python/manylinux1/scripts/build_virtualenvs.sh 
b/python/manylinux1/scripts/build_virtualenvs.sh
index 7e0d80cc7..a983721e9 100755
--- a/python/manylinux1/scripts/build_virtualenvs.sh
+++ b/python/manylinux1/scripts/build_virtualenvs.sh
@@ -34,7 +34,7 @@ for PYTHON_TUPLE in ${PYTHON_VERSIONS}; do
 
 echo "=== (${PYTHON}, ${U_WIDTH}) Installing build dependencies ==="
 $PIP install "numpy==1.10.4"
-$PIP install "cython==0.27.3"
+$PIP install "cython==0.28.1"
 $PIP install "pandas==0.20.3"
 $PIP install "virtualenv==15.1.0"
 
diff --git a/python/setup.py b/python/setup.py
index 7b0f17544..dd042c956 100644
--- a/python/setup.py
+++ b/python/setup.py
@@ -42,8 +42,8 @@
 # Check if we're running 64-bit Python
 is_64_bit = sys.maxsize > 2**32
 
-if Cython.__version__ < '0.19.1':
-raise Exception('Please upgrade to Cython 0.19.1 or newer')
+if Cython.__version__ < '0.27':
+raise Exception('Please upgrade to Cython 0.27 or newer')
 
 setup_dir = os.path.abspath(os.path.dirname(__file__))
 
@@ -491,7 +491,7 @@ def parse_version(root):
 ]
 },
 use_scm_version={"root": "..", "relative_to": __file__, "parse": 
parse_version},
-setup_requires=['setuptools_scm', 'cython >= 0.23'] + setup_requires,
+setup_requires=['setuptools_scm', 'cython >= 0.27'] + setup_requires,
 install_requires=install_requires,
 tests_require=['pytest', 'pandas'],
 description="Python library for Apache Arrow",


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Cython 0.25.2 compilation failure 
> ---
>
> Key: ARROW-2305
> URL: https://issues.apache.org/jira/browse/ARROW-2305
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Observed on master branch
> {code}
> Error compiling Cython file:
> 
> ...
> if hasattr(self, 'as_py'):
> return repr(self.as_py())
> else:
> return super(Scalar, self).__repr__()
> def __eq__(self, other):
>^
> 
> /home/wesm/code/arrow/python/pyarrow/scalar.pxi:67:4: Special method __eq__ 
> must be implemented via __richcmp__
> Error compiling Cython file:
> 

[jira] [Resolved] (ARROW-2305) [Python] Cython 0.25.2 compilation failure

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2305.
---
Resolution: Fixed

Issue resolved by pull request 1863
[https://github.com/apache/arrow/pull/1863]

> [Python] Cython 0.25.2 compilation failure 
> ---
>
> Key: ARROW-2305
> URL: https://issues.apache.org/jira/browse/ARROW-2305
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Observed on master branch
> {code}
> Error compiling Cython file:
> 
> ...
> if hasattr(self, 'as_py'):
> return repr(self.as_py())
> else:
> return super(Scalar, self).__repr__()
> def __eq__(self, other):
>^
> 
> /home/wesm/code/arrow/python/pyarrow/scalar.pxi:67:4: Special method __eq__ 
> must be implemented via __richcmp__
> Error compiling Cython file:
> 
> ...
> Return true if the tensors contains exactly equal data
> """
> self._validate()
> return self.tp.Equals(deref(other.tp))
> def __eq__(self, other):
>^
> 
> /home/wesm/code/arrow/python/pyarrow/array.pxi:571:4: Special method __eq__ 
> must be implemented via __richcmp__
> Error compiling Cython file:
> 
> ...
> cdef c_bool result = False
> with nogil:
> result = self.buffer.get().Equals(deref(other.buffer.get()))
> return result
> def __eq__(self, other):
>^
> 
> /home/wesm/code/arrow/python/pyarrow/io.pxi:675:4: Special method __eq__ must 
> be implemented via __richcmp__
> {code}
> Upgrading Cython made this go away. We should probably use {{__richcmp__}} 
> though
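For reference, Cython routes all comparisons through a single `__richcmp__(self, other, int op)` entry point rather than separate `__eq__`/`__lt__` methods. Below is a minimal pure-Python sketch of that dispatch; the `Scalar` class here is illustrative, not Arrow's, and the op codes follow CPython's `Py_LT`..`Py_GE` convention:

```python
# CPython rich-comparison op codes, as passed to Cython's __richcmp__.
Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT, Py_GE = 0, 1, 2, 3, 4, 5

class Scalar:
    """Illustrative stand-in for a Cython cdef class (not Arrow's Scalar)."""
    def __init__(self, value):
        self.value = value

    def richcmp(self, other, op):
        # In Cython this would be declared: def __richcmp__(self, other, int op)
        if op == Py_EQ:
            return self.value == other.value
        if op == Py_NE:
            return self.value != other.value
        # Unsupported comparisons fall back to the interpreter's default.
        return NotImplemented

a, b = Scalar(1), Scalar(2)
```

In the actual Cython fix the method would be declared on the cdef class as `__richcmp__`, with the interpreter supplying the op code.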



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2305) [Python] Cython 0.25.2 compilation failure

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430748#comment-16430748
 ] 

ASF GitHub Bot commented on ARROW-2305:
---

pitrou commented on issue #1863: ARROW-2305: [Python] Bump Cython requirement 
to 0.27+
URL: https://github.com/apache/arrow/pull/1863#issuecomment-379798200
 
 
   AppVeyor build at https://ci.appveyor.com/project/pitrou/arrow/build/1.0.270
   
   The Travis-CI failure is unrelated.









[jira] [Resolved] (ARROW-2100) [Python] Drop Python 3.4 support

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2100.
---
Resolution: Fixed

Issue resolved by pull request 1862
[https://github.com/apache/arrow/pull/1862]

> [Python] Drop Python 3.4 support
> 
>
> Key: ARROW-2100
> URL: https://issues.apache.org/jira/browse/ARROW-2100
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> conda-forge has already dropped it and Pandas dropped it in 0.21; we should 
> also consider dropping support for it.





[jira] [Commented] (ARROW-2100) [Python] Drop Python 3.4 support

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430746#comment-16430746
 ] 

ASF GitHub Bot commented on ARROW-2100:
---

pitrou closed pull request #1862: ARROW-2100: [Python] Drop Python 3.4 support
URL: https://github.com/apache/arrow/pull/1862
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/python/manylinux1/build_arrow.sh b/python/manylinux1/build_arrow.sh
index 6697733d0..9742da09f 100755
--- a/python/manylinux1/build_arrow.sh
+++ b/python/manylinux1/build_arrow.sh
@@ -26,7 +26,7 @@
 # * Copyright (c) 2013-2016, Matt Terry and Matthew Brett (BSD 2-clause)
 
 # Build different python versions with various unicode widths
-PYTHON_VERSIONS="${PYTHON_VERSIONS:-2.7,16 2.7,32 3.4,16 3.5,16 3.6,16}"
+PYTHON_VERSIONS="${PYTHON_VERSIONS:-2.7,16 2.7,32 3.5,16 3.6,16}"
 
 source /multibuild/manylinux_utils.sh
 
diff --git a/python/setup.py b/python/setup.py
index 7b0f17544..d9a68846b 100644
--- a/python/setup.py
+++ b/python/setup.py
@@ -500,7 +500,6 @@ def parse_version(root):
 classifiers=[
 'License :: OSI Approved :: Apache Software License',
 'Programming Language :: Python :: 2.7',
-'Programming Language :: Python :: 3.4',
 'Programming Language :: Python :: 3.5',
 'Programming Language :: Python :: 3.6'
 ],


 




> [Python] Drop Python 3.4 support
> 
>
> Key: ARROW-2100
> URL: https://issues.apache.org/jira/browse/ARROW-2100
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> conda-forge has already dropped it and Pandas dropped it in 0.21; we should 
> also consider dropping support for it.





[jira] [Updated] (ARROW-2428) [Python] Support ExtensionArrays in to_pandas conversion

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2428:
---
Labels: beginner  (was: )

> [Python] Support ExtensionArrays in to_pandas conversion
> 
>
> Key: ARROW-2428
> URL: https://issues.apache.org/jira/browse/ARROW-2428
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: beginner
> Fix For: 1.0.0
>
>
> With the next release of Pandas, it will be possible to define custom column 
> types that back a {{pandas.Series}}. Thus we will not be able to cover all 
> possible column types in the {{to_pandas}} conversion by default as we won't 
> be aware of all extension arrays.
> To enable users to create {{ExtensionArray}} instances from Arrow columns in 
> the {{to_pandas}} conversion, we should provide a hook in the {{to_pandas}} 
> call where they can overload the default conversion routines with the ones 
> that produce their {{ExtensionArray}} instances.
> This should avoid additional copies in the case where we currently first 
> convert the Arrow column into a default Pandas column (probably of object 
> type) and the user afterwards converts it to a more efficient 
> {{ExtensionArray}}. This hook will be especially useful when you build 
> {{ExtensionArrays}} whose storage is backed by Arrow.
> The meta-issue that tracks the implementation inside of Pandas is: 
> https://github.com/pandas-dev/pandas/issues/19696





[jira] [Created] (ARROW-2428) [Python] Support ExtensionArrays in to_pandas conversion

2018-04-09 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2428:
--

 Summary: [Python] Support ExtensionArrays in to_pandas conversion
 Key: ARROW-2428
 URL: https://issues.apache.org/jira/browse/ARROW-2428
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Uwe L. Korn
 Fix For: 1.0.0


With the next release of Pandas, it will be possible to define custom column 
types that back a {{pandas.Series}}. Thus we will not be able to cover all 
possible column types in the {{to_pandas}} conversion by default as we won't be 
aware of all extension arrays.

To enable users to create {{ExtensionArray}} instances from Arrow columns in 
the {{to_pandas}} conversion, we should provide a hook in the {{to_pandas}} 
call where they can overload the default conversion routines with the ones that 
produce their {{ExtensionArray}} instances.

This should avoid additional copies in the case where we currently first 
convert the Arrow column into a default Pandas column (probably of object type) 
and the user afterwards converts it to a more efficient {{ExtensionArray}}. 
This hook will be especially useful when you build {{ExtensionArrays}} whose 
storage is backed by Arrow.

The meta-issue that tracks the implementation inside of Pandas is: 
https://github.com/pandas-dev/pandas/issues/19696
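One possible shape for such a hook is a registry that maps Arrow types to user-supplied converters and is consulted before the default conversion path. The sketch below is an assumption for illustration only; `register_extension_converter` and `convert_column` are hypothetical names, not an actual pyarrow API:

```python
# Hypothetical converter registry for a to_pandas-style conversion hook.
# Names and structure are illustrative, not pyarrow's real API.
_converters = {}

def register_extension_converter(arrow_type, converter):
    """Let users override the default conversion for a given Arrow type."""
    _converters[arrow_type] = converter

def convert_column(arrow_type, values):
    # Use the user-registered hook when present; otherwise fall back to a
    # plain list, standing in for the default object-dtype conversion.
    converter = _converters.get(arrow_type)
    return converter(values) if converter else list(values)

# A user plugs in their own conversion, avoiding the object-dtype detour.
register_extension_converter("uint8", lambda vals: [v & 0xFF for v in vals])
```

The real hook would hand the registered converter the Arrow column and expect an `ExtensionArray` back, but the dispatch shape would be similar.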





[jira] [Commented] (ARROW-2424) [Rust] Missing import causing broken build

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430737#comment-16430737
 ] 

ASF GitHub Bot commented on ARROW-2424:
---

pitrou closed pull request #1864: ARROW-2424: [Rust] Fix build - add missing 
import
URL: https://github.com/apache/arrow/pull/1864
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/rust/src/builder.rs b/rust/src/builder.rs
index 832b2a4a8..9915a8b52 100644
--- a/rust/src/builder.rs
+++ b/rust/src/builder.rs
@@ -18,6 +18,7 @@
 use libc;
 use std::mem;
 use std::ptr;
+use std::slice;
 
 use super::buffer::*;
 use super::memory::*;


 




> [Rust] Missing import causing broken build
> --
>
> Key: ARROW-2424
> URL: https://issues.apache.org/jira/browse/ARROW-2424
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>
> Recent merges broke the build.





[jira] [Resolved] (ARROW-2424) [Rust] Missing import causing broken build

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2424.
---
   Resolution: Fixed
Fix Version/s: (was: 0.10.0)
   JS-0.4.0

Issue resolved by pull request 1864
[https://github.com/apache/arrow/pull/1864]

> [Rust] Missing import causing broken build
> --
>
> Key: ARROW-2424
> URL: https://issues.apache.org/jira/browse/ARROW-2424
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>
> Recent merges broke the build.





[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430734#comment-16430734
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180136727
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -396,21 +396,34 @@ struct CastFunctor {
 ShiftTime(ctx, options, conversion.first, 
conversion.second, input,
 output);
 
-internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
-  input.length);
+if (input.null_count != 0) {
+  internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
+input.length);
 
-// Ensure that intraday milliseconds have been zeroed out
-auto out_data = GetMutableValues(output, 1);
-for (int64_t i = 0; i < input.length; ++i) {
-  const int64_t remainder = out_data[i] % kMillisecondsInDay;
-  if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && 
bit_reader.IsSet() &&
-  remainder > 0)) {
-ctx->SetStatus(
-Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
-break;
+  // Ensure that intraday milliseconds have been zeroed out
+  auto out_data = GetMutableValues(output, 1);
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && 
bit_reader.IsSet() &&
+remainder > 0)) {
+  ctx->SetStatus(
+  Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
+  break;
+}
+out_data[i] -= remainder;
+bit_reader.Next();
+  }
+} else {
+  auto out_data = GetMutableValues(output, 1);
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && remainder > 
0)) {
 
 Review comment:
   Wow. Sorry, I had completely overlooked the `out_data[i] -= remainder;` line 
:-S
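For context, the per-value truncation under discussion zeroes out the intraday part of a milliseconds-since-epoch timestamp (86,400,000 ms per day, matching `kMillisecondsInDay`), raising an error when truncation would lose data and is not allowed. A Python sketch of the loop, ignoring the null-bitmap branch and assuming non-negative timestamps (C++ and Python `%` differ for negatives):

```python
K_MILLISECONDS_IN_DAY = 86_400_000  # mirrors kMillisecondsInDay in cast.cc

def zero_intraday_ms(values, allow_time_truncate=False):
    """Truncate ms-since-epoch values to day boundaries, like the date64 cast."""
    out = []
    for v in values:
        remainder = v % K_MILLISECONDS_IN_DAY
        if remainder > 0 and not allow_time_truncate:
            raise ValueError(
                "Timestamp value had non-zero intraday milliseconds")
        # The subtraction happens unconditionally, as in the C++ loop.
        out.append(v - remainder)
    return out
```

This mirrors the diff's behavior: when truncation is allowed, values are still rounded down; when it is not, any non-zero remainder is an error.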




> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
>
> When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if a Pandas `datetime64[ns]` column is converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.





[jira] [Updated] (ARROW-1964) [Python] Expose Builder classes

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1964:
---
Description: 
Having the builder classes available from Python would be very helpful. 
Currently, constructing an Arrow array always requires a Python list or numpy 
array as an intermediate. As the builders in combination with jemalloc are very 
efficient at building up non-chunked memory, it would be nice to use them 
directly in certain cases.

The most useful builders are the 
[StringBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L714]
 and 
[DictionaryBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L872]
 as they provide functionality to create columns that are not easily 
constructed using NumPy methods in Python.

The basic approach would be to wrap the C++ classes in 
https://github.com/apache/arrow/blob/master/python/pyarrow/includes/libarrow.pxd
 so that they can be used from Cython. Afterwards, we should start a new file 
{{python/pyarrow/builder.pxi}} where we have classes take typical Python 
objects like {{str}} and pass them on to the C++ classes. At the end, these 
classes should also return (Python accessible) {{pyarrow.Array}} instances.

  was:Having the builder classes available from Python would be very helpful. 
Currently a construction of an Arrow array always need to have a Python list or 
numpy array as intermediate. As  the builder in combination with jemalloc are 
very efficient in building up non-chunked memory, it would be nice to directly 
use them in certain cases.
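As a rough illustration of the Python-facing API this could enable, here is a pure-Python mock of a builder. The method names (`append`, `finish`) are an assumption for illustration; a real wrapper would delegate to the C++ `StringBuilder` and return a `pyarrow.Array`:

```python
class StringBuilder:
    """Mock of a Python-facing builder; a real wrapper would call into C++."""

    def __init__(self):
        self._values = []

    def append(self, value: str):
        self._values.append(value)
        return self  # allow chaining, as builders commonly do

    def finish(self):
        # A real implementation would hand back a pyarrow.Array backed by
        # builder-owned, non-chunked memory; a tuple stands in here.
        return tuple(self._values)

b = StringBuilder()
b.append("a").append("b")
```

The point of the real wrapper is that no Python list or numpy intermediate would exist: values would go straight into builder-managed Arrow memory.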


> [Python] Expose Builder classes
> ---
>
> Key: ARROW-1964
> URL: https://issues.apache.org/jira/browse/ARROW-1964
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: beginner
> Fix For: 1.0.0
>
>
> Having the builder classes available from Python would be very helpful. 
> Currently, constructing an Arrow array always requires a Python list or 
> numpy array as an intermediate. As the builders in combination with jemalloc 
> are very efficient at building up non-chunked memory, it would be nice to 
> use them directly in certain cases.
> The most useful builders are the 
> [StringBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L714]
>  and 
> [DictionaryBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L872]
>  as they provide functionality to create columns that are not easily 
> constructed using NumPy methods in Python.
> The basic approach would be to wrap the C++ classes in 
> https://github.com/apache/arrow/blob/master/python/pyarrow/includes/libarrow.pxd
>  so that they can be used from Cython. Afterwards, we should start a new file 
> {{python/pyarrow/builder.pxi}} where we have classes take typical Python 
> objects like {{str}} and pass them on to the C++ classes. At the end, these 
> classes should also return (Python accessible) {{pyarrow.Array}} instances.





[jira] [Created] (ARROW-2427) [C++] ReadAt implementations suboptimal

2018-04-09 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2427:
-

 Summary: [C++] ReadAt implementations suboptimal
 Key: ARROW-2427
 URL: https://issues.apache.org/jira/browse/ARROW-2427
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


The {{ReadAt}} implementations for at least {{OSFile}} and {{MemoryMappedFile}} 
take the file lock and seek. They could instead read directly from the given 
offset, allowing concurrent I/O from multiple threads.
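One way to read at an explicit offset without touching the shared file position is `pread`, e.g. `os.pread` in Python (POSIX-only). This is a sketch of the technique, not Arrow's implementation:

```python
import os
import tempfile

# Write a small file, then read at an explicit offset without seeking:
# os.pread leaves the file position untouched, so concurrent readers on the
# same descriptor never race on a shared seek pointer.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"0123456789")
    chunk = os.pread(fd, 4, 3)  # read 4 bytes starting at offset 3
finally:
    os.close(fd)
    os.unlink(path)
```

The C++ analogue would be calling `pread(2)` (or `ReadFile` with an `OVERLAPPED` offset on Windows) instead of lock-seek-read.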





[jira] [Commented] (ARROW-2326) cannot import pip installed pyarrow on OS X (10.9)

2018-04-09 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430679#comment-16430679
 ] 

Uwe L. Korn commented on ARROW-2326:


Yes it is.

> cannot import pip installed pyarrow on OS X (10.9)
> --
>
> Key: ARROW-2326
> URL: https://issues.apache.org/jira/browse/ARROW-2326
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
> Environment: OS X (10.9), Python 3.6
>Reporter: Paul Ivanov
>Priority: Major
> Fix For: 0.10.0
>
>
> {code:java}
> $ pip3 install pyarrow --user
> Collecting pyarrow
> Using cached pyarrow-0.8.0-cp36-cp36m-macosx_10_6_intel.whl
> Requirement already satisfied: six>=1.0.0 in 
> ./Library/Python/3.6/lib/python/site-packages (from pyarrow)
> Collecting numpy>=1.10 (from pyarrow)
> Using cached 
> numpy-1.14.2-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
> Installing collected packages: numpy, pyarrow
> Successfully installed numpy-1.14.2 pyarrow-0.8.0
> $ python3
> Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04) 
> [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow
> Traceback (most recent call last):
> File "", line 1, in 
> File 
> "/Users/pi/Library/Python/3.6/lib/python/site-packages/pyarrow/__init__.py", 
> line 32, in 
> from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> dlopen(/Users/pi/Library/Python/3.6/lib/python/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: @rpath/libarrow.0.dylib
> Referenced from: 
> /Users/pi/Library/Python/3.6/lib/python/site-packages/pyarrow/lib.cpython-36m-darwin.so
> Reason: image not found
> {code}
> I dug into it a bit and found that in older versions of install.rst, Wes 
> mentioned that XCode 6 had trouble with rpath, so not sure if that's what's 
> going on here for me. I'm on 10.9, I know it's really old, so if these wheels 
> can't be made to run on my ancient OS, I just wanted to report this so the 
> wheels uploaded to PyPI can reflect this incompatibility, if that is indeed 
> the case. I might also try some otool / install_name_tool tomfoolery to see 
> if I can get a workaround for myself.
> Thank you!





[jira] [Commented] (ARROW-2326) cannot import pip installed pyarrow on OS X (10.9)

2018-04-09 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430674#comment-16430674
 ] 

Phillip Cloud commented on ARROW-2326:
--

[~xhochy] Is this fixed?






[jira] [Updated] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-564:
--
Labels: beginner  (was: )

> [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array 
> if there are nulls)
> 
>
> Key: ARROW-564
> URL: https://issues.apache.org/jira/browse/ARROW-564
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner
> Fix For: 1.0.0
>
>






[jira] [Updated] (ARROW-1964) [Python] Expose Builder classes

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1964:
---
Labels: beginner  (was: )

> [Python] Expose Builder classes
> ---
>
> Key: ARROW-1964
> URL: https://issues.apache.org/jira/browse/ARROW-1964
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: beginner
> Fix For: 1.0.0
>
>
> Having the builder classes available from Python would be very helpful. 
> Currently, constructing an Arrow array always requires a Python list or 
> numpy array as an intermediate. As the builders in combination with jemalloc 
> are very efficient at building up non-chunked memory, it would be nice to 
> use them directly in certain cases.





[jira] [Updated] (ARROW-1964) [Python] Expose Builder classes

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1964:
---
Summary: [Python] Expose Builder classes  (was: Python: Expose Builder 
classes)

> [Python] Expose Builder classes
> ---
>
> Key: ARROW-1964
> URL: https://issues.apache.org/jira/browse/ARROW-1964
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: beginner
> Fix For: 1.0.0
>
>
> Having the builder classes available from Python would be very helpful. 
> Currently, constructing an Arrow array always requires a Python list or 
> numpy array as an intermediate. As the builders in combination with jemalloc 
> are very efficient at building up non-chunked memory, it would be nice to 
> use them directly in certain cases.





[jira] [Commented] (ARROW-2369) Large (>~20 GB) files written to Parquet via PyArrow are corrupted

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430650#comment-16430650
 ] 

ASF GitHub Bot commented on ARROW-2369:
---

pitrou opened a new pull request #1866: ARROW-2369: [Python] Fix reading large 
Parquet files (> 4 GB)
URL: https://github.com/apache/arrow/pull/1866
 
 
   - Fix PythonFile.seek() for offsets > 4 GB
   - Avoid instantiating a PythonFile in ParquetFile, for efficiency
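A plausible failure mode for seek offsets above 4 GB is a 32-bit truncation somewhere in the call chain; the wraparound is easy to demonstrate (this illustrates the class of bug, not necessarily the exact Arrow code path):

```python
offset = 5 * 2**30                # 5 GiB, a perfectly valid 64-bit file offset
truncated = offset & 0xFFFFFFFF   # what a careless cast to a 32-bit int keeps
# 5 GiB is 0x1_4000_0000, so the low 32 bits are 0x4000_0000, i.e. 1 GiB:
# a seek meant for 5 GiB would silently land at 1 GiB, corrupting the read.
```

A file-footer read (as in Parquet) done against the wrong offset then fails with errors like "Corrupt footer".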




> Large (>~20 GB) files written to Parquet via PyArrow are corrupted
> --
>
> Key: ARROW-2369
> URL: https://issues.apache.org/jira/browse/ARROW-2369
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Reproduced on Ubuntu + Mac OSX
>Reporter: Justin Tan
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: Parquet, bug, pandas, parquetWriter, 
> pull-request-available, pyarrow
> Fix For: 0.10.0
>
> Attachments: Screen Shot 2018-03-30 at 11.54.01 pm.png
>
>
> When writing large Parquet files (above 10 GB or so) from Pandas to Parquet 
> via the command
> {{pq.write_table(my_df, 'table.parquet')}}
> The write succeeds, but when the parquet file is loaded, the error message
> {{ArrowIOError: Invalid parquet file. Corrupt footer.}}
> appears. The same error occurs when the parquet file is written chunkwise. 
> When the parquet files are small, say < 5 GB or so (drawn randomly from 
> the same dataset), everything proceeds as normal. I've also tried this with 
> Pandas df.to_parquet(), with the same results.
> Update: Looks like any DataFrame with size above ~5GB (on disk) returns the 
> same error.





[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430645#comment-16430645
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

kszucs commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180122730
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -396,21 +396,34 @@ struct CastFunctor {
 ShiftTime(ctx, options, conversion.first, 
conversion.second, input,
 output);
 
-internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
-  input.length);
+if (input.null_count != 0) {
+  internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
+input.length);
 
-// Ensure that intraday milliseconds have been zeroed out
-auto out_data = GetMutableValues(output, 1);
-for (int64_t i = 0; i < input.length; ++i) {
-  const int64_t remainder = out_data[i] % kMillisecondsInDay;
-  if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && 
bit_reader.IsSet() &&
-  remainder > 0)) {
-ctx->SetStatus(
-Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
-break;
+  // Ensure that intraday milliseconds have been zeroed out
+  auto out_data = GetMutableValues(output, 1);
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && 
bit_reader.IsSet() &&
+remainder > 0)) {
+  ctx->SetStatus(
+  Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
+  break;
+}
+out_data[i] -= remainder;
+bit_reader.Next();
+  }
+} else {
+  auto out_data = GetMutableValues(output, 1);
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && remainder > 
0)) {
 
 Review comment:
   Sure, but don't we need another branch then to handle the case when time truncation is allowed?




> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
>
> When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if Pandas `datetime64[ns]` column tries to be converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.





[jira] [Commented] (ARROW-2424) [Rust] Missing import causing broken build

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430631#comment-16430631
 ] 

ASF GitHub Bot commented on ARROW-2424:
---

andygrove commented on issue #1864: ARROW-2424: [Rust] Fix build - add missing 
import
URL: https://github.com/apache/arrow/pull/1864#issuecomment-379777861
 
 
   @pitrou I updated it as requested




> [Rust] Missing import causing broken build
> --
>
> Key: ARROW-2424
> URL: https://issues.apache.org/jira/browse/ARROW-2424
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Recent merges broke the build.





[jira] [Updated] (ARROW-2424) [Rust] Missing import causing broken build

2018-04-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2424:
--
Labels: pull-request-available  (was: )



[jira] [Commented] (ARROW-2426) [CI] glib build failure

2018-04-09 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430630#comment-16430630
 ] 

Antoine Pitrou commented on ARROW-2426:
---

[~kou]

> [CI] glib build failure
> ---
>
> Key: ARROW-2426
> URL: https://issues.apache.org/jira/browse/ARROW-2426
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Antoine Pitrou
>Priority: Major
>
> The glib build on Travis-CI fails:
> [https://travis-ci.org/apache/arrow/jobs/364123364#L6840]
> {code}
> ==> Installing gobject-introspection
> ==> Downloading 
> https://homebrew.bintray.com/bottles/gobject-introspection-1.56.0_1.sierra.bottle.tar.gz
> ==> Pouring gobject-introspection-1.56.0_1.sierra.bottle.tar.gz
>   /usr/local/Cellar/gobject-introspection/1.56.0_1: 173 files, 9.8MB
> Installing gobject-introspection has failed!
> {code}





[jira] [Created] (ARROW-2426) [CI] glib build failure

2018-04-09 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2426:
-

 Summary: [CI] glib build failure
 Key: ARROW-2426
 URL: https://issues.apache.org/jira/browse/ARROW-2426
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Antoine Pitrou


The glib build on Travis-CI fails:

[https://travis-ci.org/apache/arrow/jobs/364123364#L6840]

{code}
==> Installing gobject-introspection
==> Downloading 
https://homebrew.bintray.com/bottles/gobject-introspection-1.56.0_1.sierra.bottle.tar.gz
==> Pouring gobject-introspection-1.56.0_1.sierra.bottle.tar.gz
  /usr/local/Cellar/gobject-introspection/1.56.0_1: 173 files, 9.8MB
Installing gobject-introspection has failed!
{code}





[jira] [Commented] (ARROW-2422) Support more filter operators on Hive partitioned Parquet files

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430628#comment-16430628
 ] 

ASF GitHub Bot commented on ARROW-2422:
---

xhochy commented on issue #1861: ARROW-2422 Support more operators for 
partition filtering
URL: https://github.com/apache/arrow/pull/1861#issuecomment-379777120
 
 
   Can you add unit tests for more types than just integers?




> Support more filter operators on Hive partitioned Parquet files
> ---
>
> Key: ARROW-2422
> URL: https://issues.apache.org/jira/browse/ARROW-2422
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Julius Neuffer
>Priority: Minor
>  Labels: features, pull-request-available
>
> After implementing basic filters ('=', '!=') on Hive partitioned Parquet 
> files (ARROW-2401), I'll extend them ('>', '<', '<=', '>=') with a new PR on 
> GitHub.
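The filter evaluation described above can be sketched by mapping the operator strings to Python comparison functions. This is an illustrative model, not Arrow's actual implementation; the names below are made up:

```python
import operator

# Hypothetical sketch: map the filter-operator strings from the issue
# description to comparison functions, then evaluate a partition value
# against a filter tuple.
FILTER_OPS = {
    '=': operator.eq,
    '!=': operator.ne,
    '>': operator.gt,
    '<': operator.lt,
    '>=': operator.ge,
    '<=': operator.le,
}

def keep_partition(partition_value, op, operand):
    """Return True if a partition with this key value passes the filter."""
    return FILTER_OPS[op](partition_value, operand)
```

For example, `keep_partition(2018, '>=', 2017)` keeps the partition, while `keep_partition(2016, '>', 2017)` prunes it.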





[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430623#comment-16430623
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180118162
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 
 Review comment:
   What I'm suggesting is:
   ```cpp
   if (!options.allow_time_truncate) {
 // Ensure that intraday milliseconds have been zeroed out
 auto out_data = GetMutableValues(output, 1);
   
 if (input.null_count != 0) {
   internal::BitmapReader bit_reader(input.buffers[0]->data(), 
input.offset,
 input.length);
   
   for (int64_t i = 0; i < input.length; ++i) {
 const int64_t remainder = out_data[i] % kMillisecondsInDay;
 if (ARROW_PREDICT_FALSE(remainder > 0 && bit_reader.IsSet())) {
   ctx->SetStatus(
   Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
   break;
 }
 out_data[i] -= remainder;
 bit_reader.Next();
   }
 } else {
   for (int64_t i = 0; i < input.length; ++i) {
 const int64_t remainder = out_data[i] % kMillisecondsInDay;
 if (ARROW_PREDICT_FALSE(remainder > 0)) {
   ctx->SetStatus(
   Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
   break;
 }
 out_data[i] -= remainder;
   }
 }
   }
   ```
   
   Does it make sense?





[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430574#comment-16430574
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

kszucs commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180106203
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 
 Review comment:
   I might misunderstand, but:
   
   ```python
   # with allow_time_truncate
   [
   '2018-05-10T00:00:00',
   '2018-05-11T00:00:00',
   '2018-05-12T10:24:01',
   ]  # OK
   
   # without allow_time_truncate
   [
   '2018-05-10T00:00:00',
   '2018-05-11T00:00:00',
   '2018-05-12T10:24:01',  # <- fails here
   ]  
   
   # with allow_time_truncate
   [
   '2018-05-10T00:00:00',
   '2018-05-11T00:00:00',
   '2018-05-12T00:00:00',
   ]  # OK
   
   # without allow_time_truncate
   [
   '2018-05-10T00:00:00',
   '2018-05-11T00:00:00',
   '2018-05-12T00:00:00',
   ]  # OK - this would fail if I test outside the loop
   
   
   ```






[jira] [Commented] (ARROW-2369) Large (>~20 GB) files written to Parquet via PyArrow are corrupted

2018-04-09 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430572#comment-16430572
 ] 

Antoine Pitrou commented on ARROW-2369:
---

Ok, there are two things going on:
* when {{write_table()}} is called with a filepath string, it goes through 
{{PythonFile}}, which is probably inefficient
* {{PythonFile.Seek}} doesn't handle seek offsets greater than 2**32 properly:
{code:python}
>>> f = open('/tmp/empty', 'wb')
>>> f.truncate(1<<33 + 10)
8796093022208
>>> f.close()
>>> f = open('/tmp/empty', 'rb')
>>> paf = pa.PythonFile(f, 'rb')
>>> paf.tell()
0
>>> paf.seek(5)
5
>>> paf.tell()
5
>>> paf.seek(1<<33 + 6)
0
>>> paf.tell()
0
>>> f.seek(1<<33 + 6)
549755813888
>>> f.tell()
549755813888
{code}
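Two details of the snippet above are easy to miss. The following standalone sketch (plain Python, no pyarrow; the 32-bit narrowing is my assumption about where the truncation happens, and the helper name is hypothetical) reproduces the arithmetic:

```python
# Python precedence: `+` binds tighter than `<<`, so the offsets in the
# snippet above are far larger than they look at first glance.
assert 1 << 33 + 10 == 1 << 43 == 8796093022208   # the truncate() argument
assert 1 << 33 + 6 == 1 << 39 == 549755813888     # the seek() argument

# If PythonFile narrows the offset to 32 bits somewhere in the binding
# (the suspected bug), 2**39 wraps around to exactly 0, which matches
# the paf.seek()/paf.tell() output shown above.
def truncate_to_uint32(offset):
    return offset & 0xFFFFFFFF  # what a 32-bit offset field would keep

assert truncate_to_uint32(1 << 39) == 0
assert truncate_to_uint32(5) == 5
```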



[jira] [Commented] (ARROW-2369) Large (>~20 GB) files written to Parquet via PyArrow are corrupted

2018-04-09 Thread Justin Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430571#comment-16430571
 ] 

Justin Tan commented on ARROW-2369:
---

Looks like the file is readable by early pyarrow versions (0.5.0, though it was 
also created by v0.5.0), so maybe something went wrong between 0.5.0 and 0.9.0.



[jira] [Updated] (ARROW-2425) [Rust] Array::from missing mapping for u8 type

2018-04-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2425:
--
Labels: pull-request-available  (was: )

> [Rust] Array::from missing mapping for u8 type
> --
>
> Key: ARROW-2425
> URL: https://issues.apache.org/jira/browse/ARROW-2425
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Macros are used to support Array::from for each primitive type but u8 was 
> missing





[jira] [Created] (ARROW-2425) [Rust] Array::from missing mapping for u8 type

2018-04-09 Thread Andy Grove (JIRA)
Andy Grove created ARROW-2425:
-

 Summary: [Rust] Array::from missing mapping for u8 type
 Key: ARROW-2425
 URL: https://issues.apache.org/jira/browse/ARROW-2425
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.10.0


Macros are used to support Array::from for each primitive type but u8 was 
missing





[jira] [Assigned] (ARROW-2369) Large (>~20 GB) files written to Parquet via PyArrow are corrupted

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2369:
-

Assignee: Antoine Pitrou



[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430566#comment-16430566
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180103071
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 
 Review comment:
   What I mean is that you can skip the whole thing if 
`options.allow_time_truncate` is true (the compiler might do the optimization 
for us, but still).






[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430564#comment-16430564
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

kszucs commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180102499
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 
 Review comment:
   Doesn't the first value encountered with a time part trigger the error, which 
has to be checked inside the loop?
   






[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430548#comment-16430548
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180100389
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 
 Review comment:
   `options.allow_time_truncate` is a constant across this whole piece of code, so just add a higher-level `if` statement around all this.
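The suggestion is classic loop-invariant hoisting: test the constant flag once, outside the loop, and specialize the loop bodies. A small Python model of the idea (hypothetical, not Arrow's code):

```python
MS_PER_DAY = 86_400_000  # milliseconds per day

def zero_intraday_ms(values, is_valid, allow_time_truncate):
    """Model of the kernel loop with the constant flag hoisted out."""
    out = list(values)
    if allow_time_truncate:
        # Truncation allowed: no per-element error check needed.
        for i, v in enumerate(out):
            out[i] = v - v % MS_PER_DAY
    else:
        # Truncation forbidden: valid elements must already be midnight-aligned.
        for i, v in enumerate(out):
            r = v % MS_PER_DAY
            if is_valid[i] and r:
                raise ValueError("Timestamp value had non-zero intraday milliseconds")
            out[i] = v - r
    return out
```

The payoff is that the branch on `allow_time_truncate` is evaluated once per array rather than once per element.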


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
>
> When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if a Pandas `datetime64[ns]` column is converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2424) [Rust] Missing import causing broken build

2018-04-09 Thread Andy Grove (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-2424:
--
Component/s: Rust
Summary: [Rust] Missing import causing broken build  (was: Missing 
import causing broken build)

> [Rust] Missing import causing broken build
> --
>
> Key: ARROW-2424
> URL: https://issues.apache.org/jira/browse/ARROW-2424
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.10.0
>
>
> Recent merges broke the build.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2424) Missing import causing broken build

2018-04-09 Thread Andy Grove (JIRA)
Andy Grove created ARROW-2424:
-

 Summary: Missing import causing broken build
 Key: ARROW-2424
 URL: https://issues.apache.org/jira/browse/ARROW-2424
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.10.0


Recent merges broke the build.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2424) Missing import causing broken build

2018-04-09 Thread Andy Grove (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430543#comment-16430543
 ] 

Andy Grove commented on ARROW-2424:
---

PR: https://github.com/apache/arrow/pull/1864

> Missing import causing broken build
> ---
>
> Key: ARROW-2424
> URL: https://issues.apache.org/jira/browse/ARROW-2424
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.10.0
>
>
> Recent merges broke the build.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2423) [Python] PyArrow datatypes raise ValueError on equality checks against non-PyArrow objects

2018-04-09 Thread Dave Challis (JIRA)
Dave Challis created ARROW-2423:
---

 Summary: [Python] PyArrow datatypes raise ValueError on equality 
checks against non-PyArrow objects
 Key: ARROW-2423
 URL: https://issues.apache.org/jira/browse/ARROW-2423
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
 Environment: Mac OS High Sierra
PyArrow 0.9.0 (py36_1)
Python 3.6.3
Reporter: Dave Challis


Checking a PyArrow datatype object for equality with non-PyArrow datatypes 
causes a `ValueError` to be raised, rather than either returning a True/False 
value, or returning 
[NotImplemented|https://docs.python.org/3/library/constants.html#NotImplemented]
 if the comparison isn't implemented.

E.g. attempting to call:
{code:java}
import pyarrow
pyarrow.int32() == 'foo'
{code}
results in:
{code:java}
Traceback (most recent call last):
  File "types.pxi", line 1221, in pyarrow.lib.type_for_alias
KeyError: 'foo'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "t.py", line 2, in <module>
pyarrow.int32() == 'foo'
  File "types.pxi", line 90, in pyarrow.lib.DataType.__richcmp__
  File "types.pxi", line 113, in pyarrow.lib.DataType.equals
  File "types.pxi", line 1223, in pyarrow.lib.type_for_alias
ValueError: No type alias for foo
{code}
The expected outcome for the above would be for the comparison to return 
`False`, as that's the general behaviour for comparisons between objects of 
different types (e.g. `1 == 'foo'` or `object() == 12.4` both return `False`).
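The conventional fix is for the comparison hook to return `NotImplemented` for foreign operands instead of eagerly parsing them as a type alias; Python then falls back to the reflected comparison and finally to identity, yielding `False`. A minimal sketch of the pattern (a stand-in class, not PyArrow's actual `DataType`):

```python
class DataTypeLike:
    """Sketch of equality that tolerates foreign operands."""

    def __init__(self, name: str):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, DataTypeLike):
            # Let Python's fallback machinery produce False
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        return hash(self.name)
```

With this pattern, `DataTypeLike("int32") == "foo"` quietly evaluates to `False` instead of raising.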



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430523#comment-16430523
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

Adriandorr commented on a change in pull request #1784: ARROW-2328: [C++] Fixed 
and unit tested feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#discussion_r180093019
 
 

 ##
 File path: cpp/src/arrow/ipc/test-common.h
 ##
 @@ -223,15 +223,17 @@ Status MakeRandomBinaryArray(int64_t length, bool include_nulls, MemoryPool* pool
     if (include_nulls && values_index == 0) {
       RETURN_NOT_OK(builder.AppendNull());
     } else {
-      const std::string& value = values[values_index];
+      const std::string value =
+          i < int64_t(values.size()) ? values[values_index] : std::to_string(i);
 
 Review comment:
   Not knowing the history of this particular function, I don't know what would 
be "better" here. For my test I pretty much just want the consecutive numbers; 
otherwise it is very difficult to see what has gone wrong (I like the tests to 
give me the answer to that question if possible). I made a change to add a 
second function that implements that, but it turns out this function is only 
used from MakeStringTypesRecordBatch, so we can't really have two 
implementations.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  
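The offset has to be honored twice when writing a slice: once on the value buffer and once, at bit granularity, on the validity bitmap (Arrow bitmaps are LSB-ordered). A hypothetical sketch of the corrected behavior, not the feather writer's real code:

```python
def write_slice(values, validity_bits, offset, length):
    """Extract `length` rows starting at `offset`, honoring the offset
    for both the value buffer and the bit-packed validity bitmap."""
    out_values = values[offset:offset + length]
    # Validity bits are packed LSB-first; index (offset + i) selects
    # bit (offset + i) % 8 of byte (offset + i) // 8.
    out_valid = [
        bool((validity_bits[(offset + i) // 8] >> ((offset + i) % 8)) & 1)
        for i in range(length)
    ]
    return out_values, out_valid
```

The original bug corresponds to dropping `offset` from both index expressions, which writes the first `length` rows and misaligns the bitmap.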



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2422) Support more filter operators on Hive partitioned Parquet files

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430493#comment-16430493
 ] 

ASF GitHub Bot commented on ARROW-2422:
---

jneuff commented on issue #1861: ARROW-2422 Support more operators for 
partition filtering
URL: https://github.com/apache/arrow/pull/1861#issuecomment-379742811
 
 
   @pacman82


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support more filter operators on Hive partitioned Parquet files
> ---
>
> Key: ARROW-2422
> URL: https://issues.apache.org/jira/browse/ARROW-2422
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Julius Neuffer
>Priority: Minor
>  Labels: features, pull-request-available
>
> After implementing basic filters ('=', '!=') on Hive partitioned Parquet 
> files (ARROW-2401), I'll extend them ('>', '<', '<=', '>=') with a new PR on 
> Github.
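Evaluating such filters against a partition key's value is straightforward to sketch with Python's `operator` module (a hypothetical illustration of the extended operator set, not the PR's code):

```python
import operator

# Map filter symbols to comparison functions; ARROW-2401 covered the
# first two, and this issue extends the set with the ordered operators.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}

def keep_partition(part_value, filters):
    """Return True if the partition value satisfies every (op, operand) filter."""
    return all(OPS[op](part_value, operand) for op, operand in filters)
```

For example, a Hive layout `year=2018/` survives the filter list `[('>=', 2017), ('<', 2019)]`.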



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2420) [Rust] Memory is never released

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430490#comment-16430490
 ] 

ASF GitHub Bot commented on ARROW-2420:
---

pitrou closed pull request #1860: ARROW-2420: [Rust] Fix major memory bug and 
add benches
URL: https://github.com/apache/arrow/pull/1860
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/rust/Cargo.toml b/rust/Cargo.toml
index c3120cfdc..4d2476b0c 100644
--- a/rust/Cargo.toml
+++ b/rust/Cargo.toml
@@ -36,4 +36,15 @@ path = "src/lib.rs"
 [dependencies]
 bytes = "0.4"
 libc = "0.2"
-serde_json = "1.0.13"
\ No newline at end of file
+serde_json = "1.0.13"
+
+[dev-dependencies]
+criterion = "0.2"
+
+[[bench]]
+name = "array_from_vec"
+harness = false
+
+[[bench]]
+name = "array_from_builder"
+harness = false
\ No newline at end of file
diff --git a/rust/benches/array_from_builder.rs 
b/rust/benches/array_from_builder.rs
new file mode 100644
index 0..3d020030e
--- /dev/null
+++ b/rust/benches/array_from_builder.rs
@@ -0,0 +1,49 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#[macro_use]
+extern crate criterion;
+
+use criterion::Criterion;
+
+extern crate arrow;
+
+use arrow::array::*;
+use arrow::builder::*;
+
+fn array_from_builder(n: usize) {
+    let mut v: Builder<i32> = Builder::with_capacity(n);
+    for i in 0..n {
+        v.push(i as i32);
+    }
+    Array::from(v.finish());
+}
+
+fn criterion_benchmark(c: &mut Criterion) {
+    c.bench_function("array_from_builder 128", |b| {
+        b.iter(|| array_from_builder(128))
+    });
+    c.bench_function("array_from_builder 256", |b| {
+        b.iter(|| array_from_builder(256))
+    });
+    c.bench_function("array_from_builder 512", |b| {
+        b.iter(|| array_from_builder(512))
+    });
+}
+
+criterion_group!(benches, criterion_benchmark);
+criterion_main!(benches);
diff --git a/rust/benches/array_from_vec.rs b/rust/benches/array_from_vec.rs
new file mode 100644
index 0..0feb0de0b
--- /dev/null
+++ b/rust/benches/array_from_vec.rs
@@ -0,0 +1,42 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#[macro_use]
+extern crate criterion;
+
+use criterion::Criterion;
+
+extern crate arrow;
+
+use arrow::array::*;
+
+fn array_from_vec(n: usize) {
+    let mut v: Vec<i32> = Vec::with_capacity(n);
+    for i in 0..n {
+        v.push(i as i32);
+    }
+    Array::from(v);
+}
+
+fn criterion_benchmark(c: &mut Criterion) {
+    c.bench_function("array_from_vec 128", |b| b.iter(|| array_from_vec(128)));
+    c.bench_function("array_from_vec 256", |b| b.iter(|| array_from_vec(256)));
+    c.bench_function("array_from_vec 512", |b| b.iter(|| array_from_vec(512)));
+}
+
+criterion_group!(benches, criterion_benchmark);
+criterion_main!(benches);
diff --git a/rust/src/buffer.rs b/rust/src/buffer.rs
index 1f2ec6c8d..1cf004fb1 100644
--- a/rust/src/buffer.rs
+++ b/rust/src/buffer.rs
@@ -74,7 +74,10 @@ impl<T> Buffer<T> {
 
 impl<T> Drop for Buffer<T> {
     fn drop(&mut self) {
-        mem::drop(self.data)
+        unsafe {
+            let p = mem::transmute::<*const T, *mut libc::c_void>(self.data);
+            libc::free(p);
+        }
     }
 }
 
diff --git a/rust/src/builder.rs 

[jira] [Commented] (ARROW-2420) [Rust] Memory is never released

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430492#comment-16430492
 ] 

ASF GitHub Bot commented on ARROW-2420:
---

pitrou commented on issue #1860: ARROW-2420: [Rust] Fix major memory bug and 
add benches
URL: https://github.com/apache/arrow/pull/1860#issuecomment-379742452
 
 
   Thanks @andygrove !


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Rust] Memory is never released
> ---
>
> Key: ARROW-2420
> URL: https://issues.apache.org/jira/browse/ARROW-2420
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Another embarrassing bug ... the code was calling the wrong method to release 
> memory and wasn't releasing memory.
> I have added some benchmarks for testing performance of creating arrays (and 
> dropping them) and these are working well now after fixing the memory bug.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2420) [Rust] Memory is never released

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2420.
---
Resolution: Fixed

Issue resolved by pull request 1860
[https://github.com/apache/arrow/pull/1860]

> [Rust] Memory is never released
> ---
>
> Key: ARROW-2420
> URL: https://issues.apache.org/jira/browse/ARROW-2420
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Another embarrassing bug ... the code was calling the wrong method to release 
> memory and wasn't releasing memory.
> I have added some benchmarks for testing performance of creating arrays (and 
> dropping them) and these are working well now after fixing the memory bug.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2408) [Rust] It should be possible to get a &mut [T] from Builder

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430487#comment-16430487
 ] 

ASF GitHub Bot commented on ARROW-2408:
---

pitrou commented on issue #1846: ARROW-2408: [Rust] Ability to get `&mut [T]` 
from `Buffer`
URL: https://github.com/apache/arrow/pull/1846#issuecomment-379741637
 
 
   Great!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Rust] It should be possible to get a &mut [T] from Builder
> -
>
> Key: ARROW-2408
> URL: https://issues.apache.org/jira/browse/ARROW-2408
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I am currently adding Arrow support to the parquet-rs crate and I found a 
> need to get a mutable slice from a Buffer to pass to the parquet column 
> reader methods.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2408) [Rust] It should be possible to get a &mut [T] from Builder

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430486#comment-16430486
 ] 

ASF GitHub Bot commented on ARROW-2408:
---

pitrou closed pull request #1846: ARROW-2408: [Rust] Ability to get `&mut [T]` 
from `Buffer`
URL: https://github.com/apache/arrow/pull/1846
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/rust/examples/array_from_builder.rs 
b/rust/examples/array_from_builder.rs
index 3a273a64d..ea1ecec45 100644
--- a/rust/examples/array_from_builder.rs
+++ b/rust/examples/array_from_builder.rs
@@ -18,7 +18,6 @@
 extern crate arrow;
 
 use arrow::array::*;
-use arrow::buffer::*;
 use arrow::builder::*;
 
 fn main() {
diff --git a/rust/src/buffer.rs b/rust/src/buffer.rs
index ab90a5b08..1f2ec6c8d 100644
--- a/rust/src/buffer.rs
+++ b/rust/src/buffer.rs
@@ -18,7 +18,6 @@
 use bytes::Bytes;
 use libc;
 use std::mem;
-use std::ptr;
 use std::slice;
 
 use super::memory::*;
diff --git a/rust/src/builder.rs b/rust/src/builder.rs
index 1cc024042..ebdf3a942 100644
--- a/rust/src/builder.rs
+++ b/rust/src/builder.rs
@@ -15,7 +15,6 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use bytes::Bytes;
 use libc;
 use std::mem;
 use std::ptr;
@@ -48,6 +47,21 @@ impl<T> Builder<T> {
 }
 }
 
+    /// Get the internal byte-aligned memory buffer as a mutable slice
+    pub fn slice_mut(&mut self, start: usize, end: usize) -> &mut [T] {
+        assert!(start <= end);
+        assert!(start < self.len as usize);
+        assert!(end <= self.len as usize);
+        unsafe {
+            slice::from_raw_parts_mut(self.data.offset(start as isize), (end - start) as usize)
+        }
+    }
+
+    /// Override the length
+    pub fn set_len(&mut self, len: usize) {
+        self.len = len;
+    }
+
     /// Push a value into the builder, growing the internal buffer as needed
     pub fn push(&mut self, v: T) {
         assert!(!self.data.is_null());
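The bounds checks in `slice_mut` above can be mirrored in Python terms with a writable `memoryview` (a hypothetical analogy for illustration, not a translation of the crate):

```python
def slice_view(buf: bytearray, start: int, end: int) -> memoryview:
    """Hand back a mutable, zero-copy view over buf[start:end],
    with the same bounds checks as the Rust asserts."""
    assert start <= end
    assert start < len(buf)
    assert end <= len(buf)
    return memoryview(buf)[start:end]  # writable view, no copy
```

Writes through the returned view mutate the underlying buffer, just as writes through the `&mut [T]` mutate the builder's memory.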


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Rust] It should be possible to get a [T] from Builder
> -
>
> Key: ARROW-2408
> URL: https://issues.apache.org/jira/browse/ARROW-2408
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I am currently adding Arrow support to the parquet-rs crate and I found a 
> need to get a mutable slice from a Buffer to pass to the parquet column 
> reader methods.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2415) [Rust] Fix using references in pattern matching

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430484#comment-16430484
 ] 

ASF GitHub Bot commented on ARROW-2415:
---

pitrou commented on issue #1851: ARROW-2415: [Rust] Fix clippy ref-match-pats 
warnings.
URL: https://github.com/apache/arrow/pull/1851#issuecomment-379741162
 
 
   Ok, thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Rust] Fix using references in pattern matching
> ---
>
> Key: ARROW-2415
> URL: https://issues.apache.org/jira/browse/ARROW-2415
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Bruce Mitchener
>Assignee: Bruce Mitchener
>Priority: Major
>  Labels: pull-request-available
>
> Clippy reports 
> [https://rust-lang-nursery.github.io/rust-clippy/v0.0.191/index.html#match_ref_pats]
>  warnings.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

