[jira] [Created] (ARROW-8386) [Python] pyarrow.jvm raises error for empty Arrays

2020-04-09 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-8386:
---

 Summary: [Python] pyarrow.jvm raises error for empty Arrays
 Key: ARROW-8386
 URL: https://issues.apache.org/jira/browse/ARROW-8386
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.16.0
Reporter: Bryan Cutler
Assignee: Bryan Cutler


In the pyarrow.jvm module, when there is an empty array in Java, trying to 
create it in Python raises a ValueError. This is because, for an empty array, 
Java returns an empty list of buffers, and pyarrow.jvm then attempts to create 
the array with pa.Array.from_buffers using that empty list.
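
A minimal Python sketch of the failure mode follows (this is not the pyarrow.jvm code path itself, and the exact error text is an assumption):

{code:python}
import pyarrow as pa

# Sketch of the failure described above: an int32 array is expected to carry
# a validity buffer and a data buffer, so an empty buffer list is rejected
# with a ValueError.
try:
    pa.Array.from_buffers(pa.int32(), 0, [])
except ValueError as exc:
    print(exc)

# For a length-0 array, None placeholders for the buffers are likely
# sufficient, which is roughly what a fix in pyarrow.jvm could do.
empty = pa.Array.from_buffers(pa.int32(), 0, [None, None])
print(len(empty))  # 0
{code}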



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7966) [Integration][Flight][C++] Client should verify each batch independently

2020-02-28 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7966:
---

 Summary: [Integration][Flight][C++] Client should verify each 
batch independently
 Key: ARROW-7966
 URL: https://issues.apache.org/jira/browse/ARROW-7966
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Bryan Cutler


Currently the C++ Flight test client in {{test_integration_client.cc}} reads 
all batches from JSON into a Table, reads all batches in the flight stream from 
the server into a Table, then compares the Tables for equality.  This is 
potentially a problem because a record batch might have specific information 
that is then lost in the conversion to a Table. For example, if the server 
sends empty batches, the resulting Table would not be different from one with 
no empty batches.

Instead, the client should check each record batch from the JSON file against 
each record batch from the server independently. 
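
The effect is easy to illustrate in Python (used here only for brevity; the integration client itself is C++): an extra empty batch disappears once batches are concatenated into a Table, because Table equality is value-based.

{code:python}
import pyarrow as pa

# Two batch streams that differ only by an empty batch compare equal once
# collected into Tables, so a Table-level check cannot catch the difference.
batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], ['a'])
empty = pa.RecordBatch.from_arrays([pa.array([], type=pa.int64())], ['a'])

t1 = pa.Table.from_batches([batch])
t2 = pa.Table.from_batches([batch, empty])
print(t1.equals(t2))  # True, even though the streams are not identical
{code}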



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7933) [Java][Flight][Tests] Add roundtrip tests for Java Flight Test Client

2020-02-24 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7933:
---

 Summary: [Java][Flight][Tests] Add roundtrip tests for Java Flight 
Test Client
 Key: ARROW-7933
 URL: https://issues.apache.org/jira/browse/ARROW-7933
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Java
Reporter: Bryan Cutler


There should be some built-in roundtrip tests for the Java Flight 
IntegrationTestClient.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7770) [Release] Archery does not use correct integration test args

2020-02-04 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7770:
---

 Summary: [Release] Archery does not use correct integration test 
args
 Key: ARROW-7770
 URL: https://issues.apache.org/jira/browse/ARROW-7770
 Project: Apache Arrow
  Issue Type: Bug
  Components: Archery
Reporter: Bryan Cutler
Assignee: Bryan Cutler


When using the release verification script and selecting specific integration 
tests, Archery ignores the selection and runs all tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7723) [Python] StructArray timestamp type with timezone to_pandas convert error

2020-01-29 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7723:
---

 Summary: [Python] StructArray  timestamp type with timezone 
to_pandas convert error
 Key: ARROW-7723
 URL: https://issues.apache.org/jira/browse/ARROW-7723
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Bryan Cutler


When a {{StructArray}} has a child that is a timestamp with a timezone, the 
{{to_pandas}} conversion outputs an int64 instead of a timestamp:
{code:java}
In [1]: import pyarrow as pa 
   ...: import pandas as pd 
   ...: arr = pa.array([{'start': pd.Timestamp.now(), 'end': 
pd.Timestamp.now()}]) 
   ...: 
 

In [2]: arr.to_pandas() 
  
Out[2]: 
0    {'end': 2020-01-29 11:38:02.792681, 'start': 2...
dtype: object

In [3]: ts = pd.Timestamp.now() 
 

In [4]: arr2 = pa.array([ts], type=pa.timestamp('us', tz='America/New_York'))   
 

In [5]: arr2.to_pandas()
  
Out[5]: 
0   2020-01-29 06:38:47.848944-05:00
dtype: datetime64[ns, America/New_York]

In [6]: arr = pa.StructArray.from_arrays([arr2, arr2], ['start', 'stop'])   
 

In [7]: arr.to_pandas() 
  
Out[7]: 
0    {'start': 1580297927848944000, 'stop': 1580297...
dtype: object

{code}
from https://github.com/apache/arrow/pull/6312



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7709) [Python] Conversion from Table Column to Pandas loses name for Timestamps

2020-01-28 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7709:
---

 Summary: [Python] Conversion from Table Column to Pandas loses 
name for Timestamps
 Key: ARROW-7709
 URL: https://issues.apache.org/jira/browse/ARROW-7709
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Bryan Cutler


When converting a Table timestamp column to Pandas, the name of the column is 
lost in the resulting series.
{code:java}
In [23]: a1 = pa.array([pd.Timestamp.now()])
 

In [24]: a2 = pa.array([1]) 
 

In [25]: t = pa.Table.from_arrays([a1, a2], ['ts', 'a'])
 

In [26]: for c in t: 
...: print(c.to_pandas()) 
...:
 
0   2020-01-28 13:17:26.738708
dtype: datetime64[ns]
0    1
Name: a, dtype: int64 {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7693) [CI] Fix test-conda-python-3.7-spark-master nightly errors

2020-01-27 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7693:
---

 Summary: [CI] Fix test-conda-python-3.7-spark-master nightly errors
 Key: ARROW-7693
 URL: https://issues.apache.org/jira/browse/ARROW-7693
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Spark master renamed some tests; the integration test needs to be updated 
accordingly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7502) [Integration] Remove Spark Integration patch that is no longer needed

2020-01-06 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7502:
---

 Summary: [Integration] Remove Spark Integration patch that is no 
longer needed
 Key: ARROW-7502
 URL: https://issues.apache.org/jira/browse/ARROW-7502
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Apache Spark master has been updated to work with Arrow 0.15.1 after the binary 
protocol change, so patching Spark master is no longer necessary to build with 
current Arrow and the previous patch can be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7223) [Java] Provide default setting of io.netty.tryReflectionSetAccessible=true

2019-11-20 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7223:
---

 Summary: [Java] Provide default setting of 
io.netty.tryReflectionSetAccessible=true
 Key: ARROW-7223
 URL: https://issues.apache.org/jira/browse/ARROW-7223
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler


After ARROW-3191, consumers of Arrow Java on JDK 9 and above are required 
to set the JVM property "io.netty.tryReflectionSetAccessible=true" at startup, 
each time Arrow code is run, as documented at 
https://github.com/apache/arrow/tree/master/java#java-properties. Not doing 
this will result in the error "java.lang.UnsupportedOperationException: 
sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available", 
making Arrow unusable out-of-the-box.

This proposes to automatically set the property, if not already set, with the 
following steps:

1) check whether the property io.netty.tryReflectionSetAccessible has been set
2) if not set, automatically set it to "true"
3) else, if set to "false", catch the Netty error and prepend the error message 
with the suggested setting of "true"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7173) Add test to verify Map field names can be arbitrary

2019-11-14 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7173:
---

 Summary: Add test to verify Map field names can be arbitrary
 Key: ARROW-7173
 URL: https://issues.apache.org/jira/browse/ARROW-7173
 Project: Apache Arrow
  Issue Type: Test
  Components: Integration
Reporter: Bryan Cutler


A Map has child fields, and the format spec only recommends that they be named 
"entries", "key", and "value"; they could be named anything. Currently, 
integration tests for Map arrays verify that the exchanged schema is equal, so 
the child fields are always named the same. There should be tests that use 
different names to verify that implementations can accept this.
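
For illustration, the same physical layout can be spelled out in Python with arbitrary child names (building it via pa.list_/pa.struct is only a sketch of the layout, not the integration-test mechanism itself):

{code:python}
import pyarrow as pa

# A Map is physically List<entries: Struct<key, value>>; only the names
# "entries", "key" and "value" are recommended by the spec. This spells out
# the same layout with arbitrary names to show what a test case could
# exercise.
entries = pa.struct([
    pa.field('the_key', pa.string(), nullable=False),
    pa.field('the_value', pa.int32()),
])
custom_named_layout = pa.list_(pa.field('the_entries', entries, nullable=False))
print(custom_named_layout)
{code}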



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6904) [Python] Implement MapArray and MapType

2019-10-16 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6904:
---

 Summary: [Python] Implement MapArray and MapType
 Key: ARROW-6904
 URL: https://issues.apache.org/jira/browse/ARROW-6904
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Bryan Cutler
Assignee: Bryan Cutler
 Fix For: 1.0.0


Map arrays have already been added to C++; they need to be exposed in the 
Python API as well.
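
A sketch of what the Python API might look like once exposed (names such as pa.map_ are taken from the C++ side and are assumptions here, not the final API):

{code:python}
import pyarrow as pa

# Hypothetical usage once MapType/MapArray are exposed in Python: a map type
# built from key and item types, and arrays created from lists of
# (key, value) tuples.
map_type = pa.map_(pa.string(), pa.int32())
arr = pa.array([[('a', 1), ('b', 2)], None, [('c', 3)]], type=map_type)
print(arr.type)  # map<string, int32>
print(arr.to_pylist())
{code}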



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6790) [Release] Automatically disable integration test cases in release verification

2019-10-03 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6790:
---

 Summary: [Release] Automatically disable integration test cases in 
release verification
 Key: ARROW-6790
 URL: https://issues.apache.org/jira/browse/ARROW-6790
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Bryan Cutler
Assignee: Bryan Cutler


If dev/release/verify-release-candidate.sh is run with selective testing and 
includes integration tests, the selected implementations should be the only 
ones enabled when running the integration test portion. For example:

{noformat}
TEST_DEFAULT=0 \
TEST_CPP=1 \
TEST_JAVA=1 \
TEST_INTEGRATION=1 \
dev/release/verify-release-candidate.sh source 0.15.0 2
{noformat}

should run the integration tests only for C++ and Java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6652) [Python] to_pandas conversion removes timezone from type

2019-09-21 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6652:
---

 Summary: [Python] to_pandas conversion removes timezone from type
 Key: ARROW-6652
 URL: https://issues.apache.org/jira/browse/ARROW-6652
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Bryan Cutler
 Fix For: 0.15.0


Calling {{to_pandas}} on a {{pyarrow.Array}} with a timezone-aware timestamp 
type removes the timezone in the resulting {{pandas.Series}}.

{code}
>>> import pyarrow as pa
>>> a = pa.array([1], type=pa.timestamp('us', tz='America/Los_Angeles'))
>>> a.to_pandas()
0   1970-01-01 00:00:00.000001
dtype: datetime64[ns]
{code}

The previous behavior in 0.14.1 of converting a {{pyarrow.Column}} with 
{{to_pandas}} retained the timezone.
{code}
In [4]: import pyarrow as pa 
   ...: a = pa.array([1], type=pa.timestamp('us', tz='America/Los_Angeles'))  
   ...: c = pa.Column.from_array('ts', a) 

In [5]: c.to_pandas()   
 
Out[5]: 
0   1969-12-31 16:00:00.000001-08:00
Name: ts, dtype: datetime64[ns, America/Los_Angeles]
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6534) [Java] Fix typos and spelling

2019-09-11 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6534:
---

 Summary: [Java] Fix typos and spelling
 Key: ARROW-6534
 URL: https://issues.apache.org/jira/browse/ARROW-6534
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler
 Fix For: 0.15.0


Fix typos and spelling, mostly in docs and tests.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6519) [Java] Use IPC continuation token to mark EOS

2019-09-10 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6519:
---

 Summary: [Java] Use IPC continuation token to mark EOS
 Key: ARROW-6519
 URL: https://issues.apache.org/jira/browse/ARROW-6519
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler
 Fix For: 0.15.0


For an Arrow stream in non-legacy mode, the EOS identifier should be 
\{0xFFFFFFFF, 0x00000000}, i.e. the continuation marker followed by a zero 
metadata length. This way, all bytes sent by the writer can be read.
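
For reference, a small Python sketch of the assumed byte layout of that marker (Python is used only for illustration; the change itself is in the Java writer):

{code:python}
import struct

# Assumed non-legacy end-of-stream marker: a 4-byte continuation indicator of
# 0xFFFFFFFF followed by a 4-byte little-endian metadata length of zero.
eos = struct.pack('<I', 0xFFFFFFFF) + struct.pack('<i', 0)
assert eos == b'\xff\xff\xff\xff\x00\x00\x00\x00'
print(eos.hex())
{code}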



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6461) [Java] EchoServer can close socket before client has finished reading

2019-09-04 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6461:
---

 Summary: [Java] EchoServer can close socket before client has 
finished reading
 Key: ARROW-6461
 URL: https://issues.apache.org/jira/browse/ARROW-6461
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Bryan Cutler
 Fix For: 0.15.0


When the EchoServer finishes running the client connection, the socket is 
closed immediately. This causes a race condition and the client will fail with a
{noformat}
 SocketException: connection reset {noformat}
if it has not read all of the echoed batches.

This was consistently happening with the fix for ARROW-6315



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6215) [Java] RangeEqualVisitor does not properly compare ZeroVector

2019-08-12 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-6215:
---

 Summary: [Java] RangeEqualVisitor does not properly compare 
ZeroVector
 Key: ARROW-6215
 URL: https://issues.apache.org/jira/browse/ARROW-6215
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler


ZeroVector.accept and RangeEqualVisitor always return true, no matter what 
type the other vector being compared is.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (ARROW-5762) [Integration][JS] Integration Tests for MapType

2019-06-27 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-5762:
---

 Summary: [Integration][JS] Integration Tests for MapType
 Key: ARROW-5762
 URL: https://issues.apache.org/jira/browse/ARROW-5762
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Integration, JavaScript
Reporter: Bryan Cutler


ARROW-1279 enabled integration tests for MapType between Java and C++, but 
JavaScript had to be disabled for the map case due to an error.  Once this is 
fixed, {{generate_map_case}} could be moved under {{generate_nested_case}} with 
the other nested types.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5063) [Java] FlightClient should not create a child allocator

2019-03-28 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-5063:
---

 Summary: [Java] FlightClient should not create a child allocator
 Key: ARROW-5063
 URL: https://issues.apache.org/jira/browse/ARROW-5063
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler


I ran into a problem when testing out Flight using the ExampleFlightServer with 
the InMemoryStore producer.

A client will iterate over endpoints and locations to get the streams, and the 
example creates a new client for each location. The only way to close the 
allocator in the FlightClient is to close the FlightClient, which also closes 
the read channel.  If the location is the same for each FlightStream (as is the 
case for the InMemoryStore), then it seems like gRPC will reuse the channel, so 
closing one read client will shut down the channel and the remaining 
FlightStreams cannot be read.

If the allocator were instead created by the owner of the FlightClient, then 
the client would not need to close it and this problem would be avoided. I 
believe other Flight classes do not create child allocators either, so this 
change would be consistent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5062) Shade Java Guava dependency for Flight

2019-03-28 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-5062:
---

 Summary: Shade Java Guava dependency for Flight
 Key: ARROW-5062
 URL: https://issues.apache.org/jira/browse/ARROW-5062
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler


The Guava dependency in the Java Flight module can interfere if using Flight in 
an application that relies on an older version of Guava.  We can shade the 
usage in Flight to prevent this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5014) [Java] Fix typos in Flight module

2019-03-26 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-5014:
---

 Summary: [Java] Fix typos in Flight module
 Key: ARROW-5014
 URL: https://issues.apache.org/jira/browse/ARROW-5014
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4344) [Java] Further cleanup maven output

2019-01-23 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-4344:
---

 Summary: [Java] Further cleanup maven output
 Key: ARROW-4344
 URL: https://issues.apache.org/jira/browse/ARROW-4344
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Followup to ARROW-4180: I noticed the EchoServer logs info output that should 
be changed to debug. Also, upgrading the rat license check plugin will stop it 
from listing every excluded file, which currently produces a large amount of 
output since it is done for every module.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3588) [Java] checkstyle - fix license

2018-10-22 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-3588:
---

 Summary: [Java] checkstyle - fix license
 Key: ARROW-3588
 URL: https://issues.apache.org/jira/browse/ARROW-3588
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Make the header correspond to the Apache license defined in checkstyle.license.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3323) [Java] checkstyle - fix naming

2018-09-24 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-3323:
---

 Summary: [Java] checkstyle - fix naming
 Key: ARROW-3323
 URL: https://issues.apache.org/jira/browse/ARROW-3323
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Enable naming rules.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3273) [Java] checkstyle - fix javadoc style

2018-09-19 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-3273:
---

 Summary: [Java] checkstyle - fix javadoc style
 Key: ARROW-3273
 URL: https://issues.apache.org/jira/browse/ARROW-3273
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Java
Reporter: Bryan Cutler






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3272) [Java] Document deviations from Google Style

2018-09-19 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-3272:
---

 Summary: [Java] Document deviations from Google Style
 Key: ARROW-3272
 URL: https://issues.apache.org/jira/browse/ARROW-3272
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Bryan Cutler






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3264) [Java] checkstyle - fix whitespace

2018-09-18 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-3264:
---

 Summary: [Java] checkstyle - fix whitespace
 Key: ARROW-3264
 URL: https://issues.apache.org/jira/browse/ARROW-3264
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Fix remaining whitespace issues



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3171) [Java] checkstyle - fix line length and whitespace

2018-09-04 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-3171:
---

 Summary: [Java] checkstyle - fix line length and whitespace
 Key: ARROW-3171
 URL: https://issues.apache.org/jira/browse/ARROW-3171
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3115) [Java] Style Checks - Fix import ordering

2018-08-24 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-3115:
---

 Summary: [Java] Style Checks - Fix import ordering
 Key: ARROW-3115
 URL: https://issues.apache.org/jira/browse/ARROW-3115
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Fix import ordering according to checkstyle



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3111) [Java] Enable changing default logging level when running tests

2018-08-22 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-3111:
---

 Summary: [Java] Enable changing default logging level when running 
tests
 Key: ARROW-3111
 URL: https://issues.apache.org/jira/browse/ARROW-3111
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Currently tests use the logback logger which has a default level of DEBUG. We 
should provide a way to change this level so that CI can run a build without 
seeing DEBUG messages if needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2923) [Doc] Add instructions for running Spark integration tests

2018-07-27 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-2923:
---

 Summary: [Doc] Add instructions for running Spark integration tests
 Key: ARROW-2923
 URL: https://issues.apache.org/jira/browse/ARROW-2923
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Add instructions to dev/README for running Spark integration tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2914) [Integration] Add WindowPandasUDFTests to Spark Integration

2018-07-25 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-2914:
---

 Summary: [Integration] Add WindowPandasUDFTests to Spark 
Integration
 Key: ARROW-2914
 URL: https://issues.apache.org/jira/browse/ARROW-2914
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Integration
Reporter: Bryan Cutler
Assignee: Bryan Cutler
 Fix For: 0.10.0


Add PySpark tests for WindowPandasUDFTests to the Spark integration tests.

Also, run the docker image against current Arrow master with a patched version 
of Spark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2742) [Python] Allow Table.from_batches to use Iterator of ArrowRecordBatches

2018-06-25 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-2742:
---

 Summary: [Python] Allow Table.from_batches to use Iterator of 
ArrowRecordBatches
 Key: ARROW-2742
 URL: https://issues.apache.org/jira/browse/ARROW-2742
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Currently, pyarrow.Table.from_batches requires a list of record batches.  A 
simple change would allow it to accept an iterator as well, which could be 
useful.
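
A rough sketch of the proposed usage (passing the schema explicitly because an iterator cannot be inspected twice; this is an assumption about the eventual API, not a description of it):

{code:python}
import pyarrow as pa

def batch_gen():
    # A generator of record batches; from_batches currently requires a list,
    # and the proposal is to accept any iterable like this one.
    for i in range(3):
        yield pa.RecordBatch.from_arrays([pa.array([i, i + 1])], ['x'])

schema = pa.schema([('x', pa.int64())])
table = pa.Table.from_batches(batch_gen(), schema=schema)
print(table.num_rows)  # 6
{code}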



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2704) [Java] IPC stream handling should be more friendly to low level processing

2018-06-12 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-2704:
---

 Summary: [Java] IPC stream handling should be more friendly to low 
level processing
 Key: ARROW-2704
 URL: https://issues.apache.org/jira/browse/ARROW-2704
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Reporter: Bryan Cutler
Assignee: Bryan Cutler


With some minor adjustments, the Java IPC stream reader could be more friendly 
to low level message processing.  By that I mean reading a stream and 
examining messages without necessarily having to load the record batch data.  
These adjustments include:

* Separate MessageChannelReader.readNextMessage to allow access to the buffer 
containing the message.
* The MessageChannelReader input channel should be protected.
* ArrowStreamWriter should make the message that ends a stream static.
* WriteChannel intToBytes could write to an existing byte array or byte buffer 
instead of creating a new array.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2645) [Java] ArrowStreamWriter accumulates DictionaryBatch ArrowBlocks

2018-05-29 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-2645:
---

 Summary: [Java] ArrowStreamWriter accumulates DictionaryBatch 
ArrowBlocks
 Key: ARROW-2645
 URL: https://issues.apache.org/jira/browse/ARROW-2645
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Reporter: Bryan Cutler
Assignee: Bryan Cutler


The base method ensureStarted in ArrowStreamWriter accumulates DictionaryBatch 
ArrowBlocks. This is used by ArrowFileWriter but not by ArrowStreamWriter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2432) [Python] from_pandas fails when converting decimals if they contain None

2018-04-09 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-2432:
---

 Summary: [Python] from_pandas fails when converting decimals if they 
contain None
 Key: ARROW-2432
 URL: https://issues.apache.org/jira/browse/ARROW-2432
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Bryan Cutler


Using from_pandas to convert decimals fails if it encounters a value of 
{{None}}. For example:
{code:java}
In [1]: import pyarrow as pa
...: import pandas as pd
...: from decimal import Decimal
...:

In [2]: s_dec = pd.Series([Decimal('3.14'), None])

In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
---
ArrowInvalid Traceback (most recent call last)
 in ()
> 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))

array.pxi in pyarrow.lib.Array.from_pandas()

array.pxi in pyarrow.lib.array()

error.pxi in pyarrow.lib.check_status()

error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
object of type NoneType but can only handle these types: decimal.Decimal

In [4]: s_dec
Out[4]:
0 3.14
1 None
dtype: object{code}

The above error is raised when specifying the decimal type.  When no type is 
specified, a segfault occurs.

This previously worked in 0.8.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2380) Correct issues in numpy_to_arrow conversion routines

2018-04-02 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-2380:
---

 Summary: Correct issues in numpy_to_arrow conversion routines
 Key: ARROW-2380
 URL: https://issues.apache.org/jira/browse/ARROW-2380
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Bryan Cutler
 Fix For: 0.10.0


Following the discussion at [https://github.com/apache/arrow/pull/1689], there 
are a few issues with conversion of various types to Arrow that are incorrect 
or could be improved:
 * PyBytes_GET_SIZE is being cast to the wrong type, for example 
{{const int32_t length = static_cast<int32_t>(PyBytes_GET_SIZE(obj));}}

 * Handle the possibility, in the check
{{builder->value_data_length() + length > kBinaryMemoryLimit}},
that length itself is larger than kBinaryMemoryLimit

 * Look into using common code for binary object conversion to avoid 
duplication, and allow support for bytes and bytearray objects in places other 
than numpy_to_arrow (possibly put in src/arrow/python/helpers.h)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2101) [Python] from_pandas reads 'str' types as binary Arrow data with Python 2

2018-02-06 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-2101:
---

 Summary: [Python] from_pandas reads 'str' types as binary Arrow 
data with Python 2
 Key: ARROW-2101
 URL: https://issues.apache.org/jira/browse/ARROW-2101
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.8.0
Reporter: Bryan Cutler


Using Python 2, converting Pandas with 'str' data to Arrow results in Arrow 
data of binary type, even if the user supplies type information.  Conversion of 
'unicode' data works and creates Arrow data of string type.  For example:

{code}
In [25]: pa.Array.from_pandas(pd.Series(['a'])).type
Out[25]: DataType(binary)

In [26]: pa.Array.from_pandas(pd.Series(['a']), type=pa.string()).type
Out[26]: DataType(binary)

In [27]: pa.Array.from_pandas(pd.Series([u'a'])).type
Out[27]: DataType(string)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-1962) [Java] Add reset() to ValueVector interface

2018-01-02 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1962:
---

 Summary: [Java] Add reset() to ValueVector interface
 Key: ARROW-1962
 URL: https://issues.apache.org/jira/browse/ARROW-1962
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Reporter: Bryan Cutler
Assignee: Bryan Cutler


The {{reset()}} method exists in some ValueVectors but not all.  Its meaning is 
that it will bring the vector to an empty state, but not release any buffers 
(as opposed to clear() which resets and releases buffers).

It should be added to the {{ValueVector}} interface and implemented in the 
vector hierarchy where it currently is not. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1948) [Java] ListVector does not handle ipc with all non-null values with none set

2017-12-23 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1948:
---

 Summary: [Java] ListVector does not handle ipc with all non-null 
values with none set
 Key: ARROW-1948
 URL: https://issues.apache.org/jira/browse/ARROW-1948
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Affects Versions: 0.8.0
Reporter: Bryan Cutler
Assignee: Bryan Cutler


It is valid for IPC to send a validity buffer with no values set that 
indicates all values are non-null.  This is already handled by all vectors 
except ListVector, which will throw an invalid index exception in this case 
because it does not build the validity buffer with all elements set.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1915) [Python] Parquet tests should be optional

2017-12-12 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1915:
---

 Summary: [Python] Parquet tests should be optional
 Key: ARROW-1915
 URL: https://issues.apache.org/jira/browse/ARROW-1915
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Bryan Cutler
Assignee: Bryan Cutler
Priority: Trivial


Two decimal tests in {{test_parquet.py}} are missing the @parquet decorator to 
allow skipping if parquet is not installed, resulting in failures.
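
An assumed equivalent of the missing guard (the real @parquet decorator in test_parquet.py may be defined differently; this only sketches the skip behavior):

{code:python}
import pytest

try:
    import pyarrow.parquet  # noqa: F401
    HAVE_PARQUET = True
except ImportError:
    HAVE_PARQUET = False

# Hypothetical stand-in for the "@parquet" decorator mentioned above.
parquet = pytest.mark.skipif(not HAVE_PARQUET,
                             reason="Parquet support not available")

@parquet
def test_decimal_example():
    # placeholder body; the point is the skip guard, not the assertion
    assert HAVE_PARQUET
{code}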



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1906) [Python] Creating a pyarrow.Array with timestamp of different unit is not cast

2017-12-08 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1906:
---

 Summary: [Python] Creating a pyarrow.Array with timestamp of 
different unit is not cast
 Key: ARROW-1906
 URL: https://issues.apache.org/jira/browse/ARROW-1906
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Bryan Cutler


This is similar to ARROW-1680 but slightly different in that no error is 
raised; the unit still remains unchanged, though only when using a timezone:

{noformat}
In [47]: us_with_tz = pa.timestamp('us', tz='America/New_York')

In [48]: s = pd.Series([val])

In [49]: s_nyc = s.dt.tz_localize('tzlocal()').dt.tz_convert('America/New_York')

In [50]: arr = pa.Array.from_pandas(s_nyc, type=us_with_tz)

In [51]: arr.type
Out[51]: TimestampType(timestamp[ns, tz=America/New_York])

In [52]: arr2 = pa.Array.from_pandas(s, type=pa.timestamp('us'))

In [53]: arr2.type
Out[53]: TimestampType(timestamp[us])
{noformat}

There is an easy workaround to apply the cast after creating the pyarrow.Array, 
which seems to work fine

{noformat}
In [54]: arr = pa.Array.from_pandas(s_nyc).cast(us_with_tz, safe=False)

In [55]: arr.type
Out[55]: TimestampType(timestamp[us, tz=America/New_York])
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1868) [Java] Change vector getMinorType to use MinorType instead of Types.MinorType

2017-11-28 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1868:
---

 Summary: [Java] Change vector getMinorType to use MinorType 
instead of Types.MinorType
 Key: ARROW-1868
 URL: https://issues.apache.org/jira/browse/ARROW-1868
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Java - Vectors
Reporter: Bryan Cutler


This is just some renaming to clean things up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1867) [Java] Add BitVector APIs from old vector class

2017-11-28 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1867:
---

 Summary: [Java] Add BitVector APIs from old vector class
 Key: ARROW-1867
 URL: https://issues.apache.org/jira/browse/ARROW-1867
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Java - Vectors
Reporter: Bryan Cutler


The new BitVector class after the refactoring does not have some of the APIs 
from the previous class such as {{setRangeToOnes}}, etc.  Also, I believe 
{{getNullCount}} returned the number of zeros in the vector.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1866) [JAVA] Combine MapVector and NonNullableMapVector Classes

2017-11-28 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1866:
---

 Summary: [JAVA] Combine MapVector and NonNullableMapVector Classes
 Key: ARROW-1866
 URL: https://issues.apache.org/jira/browse/ARROW-1866
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Java - Vectors
Reporter: Bryan Cutler
Assignee: Bryan Cutler


{{NonNullableMapVector}} class can be merged into {{MapVector}} and removed as 
part of removing the non nullable vectors.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1818) Examine Java Dependencies

2017-11-15 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1818:
---

 Summary: Examine Java Dependencies
 Key: ARROW-1818
 URL: https://issues.apache.org/jira/browse/ARROW-1818
 Project: Apache Arrow
  Issue Type: Task
  Components: Java - Vectors
Reporter: Bryan Cutler
Assignee: Bryan Cutler
 Fix For: 0.8.0


While integrating the latest Arrow Java with Spark master, I noticed some 
possible binary incompatibilities with dependencies.  I'd like to examine these 
a little closer and make sure there are no problems before 0.8 is cut.

{noformat}
 Found version conflict(s) in library dependencies; some are suspected to be 
binary incompatible:
[warn] 
[warn]  * com.google.code.findbugs:jsr305:3.0.2 is selected over {1.3.9, 3.0.0}
[warn]  +- org.apache.arrow:arrow-vector:0.8.0-SNAPSHOT   (depends on 
3.0.2)
[warn]  +- org.apache.arrow:arrow-memory:0.8.0-SNAPSHOT   (depends on 
3.0.2)
[warn]  +- org.apache.hadoop:hadoop-common:2.7.3  (depends on 
3.0.0)
[warn]  +- org.apache.spark:spark-unsafe_2.11:2.3.0-SNAPSHOT  (depends on 
1.3.9)
[warn]  +- org.apache.spark:spark-core_2.11:2.3.0-SNAPSHOT(depends on 
1.3.9)
[warn]  +- org.apache.spark:spark-network-common_2.11:2.3.0-SNAPSHOT 
(depends on 1.3.9)
[warn] 
[warn]  * io.netty:netty:3.9.9.Final is selected over {3.6.2.Final, 3.7.0.Final}
[warn]  +- org.apache.spark:spark-core_2.11:2.3.0-SNAPSHOT(depends on 
3.9.9.Final)
[warn]  +- org.apache.hadoop:hadoop-hdfs:2.7.3(depends on 
3.6.2.Final)
[warn]  +- org.apache.zookeeper:zookeeper:3.4.6   (depends on 
3.6.2.Final)
[warn] 
[warn]  * io.netty:netty-all:4.0.47.Final is selected over 4.0.23.Final
[warn]  +- org.apache.hadoop:hadoop-hdfs:2.7.3(depends on 
4.0.23.Final)
[warn]  +- org.apache.spark:spark-core_2.11:2.3.0-SNAPSHOT(depends on 
4.0.23.Final)
[warn]  +- org.apache.spark:spark-network-common_2.11:2.3.0-SNAPSHOT 
(depends on 4.0.23.Final)
[warn] 
[warn]  * commons-net:commons-net:3.1 is selected over 2.2
[warn]  +- org.apache.hadoop:hadoop-common:2.7.3  (depends on 
3.1)
[warn]  +- org.apache.spark:spark-core_2.11:2.3.0-SNAPSHOT(depends on 
2.2)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1817) Configure JsonFileReader to read NaN for floats

2017-11-15 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1817:
---

 Summary: Configure JsonFileReader to read NaN for floats
 Key: ARROW-1817
 URL: https://issues.apache.org/jira/browse/ARROW-1817
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Reporter: Bryan Cutler
Assignee: Bryan Cutler
 Fix For: 0.8.0


There is a Spark unit test that includes reading JSON floating point values 
that are NaNs (validity bit is set).  The Jackson parser in Arrow version 0.4 
allowed for these by default, but it looks like the updated version requires 
the {{ALLOW_NON_NUMERIC_NUMBERS}} feature to allow this.

https://fasterxml.github.io/jackson-core/javadoc/2.2.0/com/fasterxml/jackson/core/JsonParser.Feature.html#ALLOW_NON_NUMERIC_NUMBERS



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1718) [Python] Creating a pyarrow.Array of date type from pandas causes error

2017-10-23 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1718:
---

 Summary: [Python] Creating a pyarrow.Array of date type from 
pandas causes error
 Key: ARROW-1718
 URL: https://issues.apache.org/jira/browse/ARROW-1718
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Bryan Cutler


When calling {{Array.from_pandas}} with a pandas.Series of dates and specifying 
the desired pyarrow type, an error occurs.  If the type is not specified then 
{{from_pandas}} will interpret the data as a timestamp type.

{code}
import pandas as pd
import pyarrow as pa
import datetime

arr = pa.array([datetime.date(2017, 10, 23)])
c = pa.Column.from_array("d", arr)

s = c.to_pandas()
print(s)
# 0   2017-10-23
# Name: d, dtype: datetime64[ns]

result = pa.Array.from_pandas(s, type=pa.date32())
print(result)
"""
Traceback (most recent call last):
  File "", line 1, in 
  File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ 
(/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221)
  File 
"/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py",
 line 28, in array_format
values.append(value_format(x, 0))
  File 
"/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py",
 line 49, in value_format
return repr(x)
  File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ 
(/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535)
  File "pyarrow/scalar.pxi", line 137, in pyarrow.lib.Date32Value.as_py 
(/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:20368)
ValueError: year is out of range
"""
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1680) [Python] Timestamp unit change not done in from_pandas() conversion

2017-10-17 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1680:
---

 Summary: [Python] Timestamp unit change not done in from_pandas() 
conversion
 Key: ARROW-1680
 URL: https://issues.apache.org/jira/browse/ARROW-1680
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Bryan Cutler


When calling {{Array.from_pandas}} with a pandas.Series of timestamps that have 
'ns' unit and specifying a 'us' type to coerce to, problems occur.  When the 
series has timestamps with a timezone, the unit is ignored.  When the series 
does not have a timezone, the unit is applied but causes an OverflowError when 
printing.

{noformat}
>>> import pandas as pd
>>> import pyarrow as pa
>>> from datetime import datetime
>>> s = pd.Series([datetime.now()])
>>> s_nyc = s.dt.tz_localize('tzlocal()').dt.tz_convert('America/New_York')
>>> arr = pa.Array.from_pandas(s_nyc, type=pa.timestamp('us', 
>>> tz='America/New_York'))
>>> arr.type
TimestampType(timestamp[ns, tz=America/New_York])
>>> arr = pa.Array.from_pandas(s, type=pa.timestamp('us'))
>>> arr.type
TimestampType(timestamp[us])
>>> print(arr)
Traceback (most recent call last):
  File "", line 1, in 
  File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ 
(/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221)
values = array_format(self, window=10)
  File "pyarrow/formatting.py", line 28, in array_format
values.append(value_format(x, 0))
  File "pyarrow/formatting.py", line 49, in value_format
return repr(x)
  File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ 
(/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535)
return repr(self.as_py())
  File "pyarrow/scalar.pxi", line 240, in pyarrow.lib.TimestampValue.as_py 
(/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:21600)
return converter(value, tzinfo=tzinfo)
  File "pyarrow/scalar.pxi", line 204, in pyarrow.lib.lambda5 
(/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:7295)
TimeUnit_MICRO: lambda x, tzinfo: pd.Timestamp(
  File "pandas/_libs/tslib.pyx", line 402, in 
pandas._libs.tslib.Timestamp.__new__ (pandas/_libs/tslib.c:10051)
  File "pandas/_libs/tslib.pyx", line 1467, in 
pandas._libs.tslib.convert_to_tsobject (pandas/_libs/tslib.c:27665)
OverflowError: Python int too large to convert to C long
{noformat}

A workaround is to manually change values with astype
{noformat}
>>> arr = pa.Array.from_pandas(s.values.astype('datetime64[us]'))
>>> arr.type
TimestampType(timestamp[us])
>>> print(arr)

[
  Timestamp('2017-10-17 11:04:44.308233')
]
>>> 
{noformat}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1619) [Java] Correctly set "lastSet" for variable vectors in JsonReader

2017-09-27 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1619:
---

 Summary: [Java] Correctly set "lastSet" for variable vectors in 
JsonReader
 Key: ARROW-1619
 URL: https://issues.apache.org/jira/browse/ARROW-1619
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Reporter: Bryan Cutler
Assignee: Bryan Cutler


The Arrow Java JsonFileReader does not correctly set "lastSet" in 
VariableWidthVectors which makes reading inner vectors overly complicated.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1613) [Java] ArrowReader should not close the input ReadChannel

2017-09-26 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1613:
---

 Summary: [Java] ArrowReader should not close the input ReadChannel
 Key: ARROW-1613
 URL: https://issues.apache.org/jira/browse/ARROW-1613
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Currently, {{ArrowReader.close()}} will close resources (VectorSchemaRoot and 
dictionary vectors) and also close the input ReadChannel, or InputStream for 
ArrowStreamReader.  Closing of the ReadChannel should be done by whatever 
created it, because it might need to be reused.

If this is not possible, an alternative could be to add a method 
{{ArrowReader.end()}} that will close resources but not the ReadChannel.  Then 
{{end()}} could be called instead of {{close()}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1361) [Java] Add minor type param accessors to NullableValueVectors

2017-08-16 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1361:
---

 Summary: [Java] Add minor type param accessors to 
NullableValueVectors
 Key: ARROW-1361
 URL: https://issues.apache.org/jira/browse/ARROW-1361
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Reporter: Bryan Cutler
Assignee: Bryan Cutler


A {{NullableValueVector}} creates private copies of each param in the minor 
type, but does not have any public API to access them.  So, given a 
{{NullableValueVector}}, you would have to use the {{Field}} and cast to the 
correct type.  For example, with a {{NullableTimeStampMicroTZVector}} and 
trying to get the timezone:

{noformat}
if field.getType.isInstanceOf[ArrowType.Timestamp] &&
  field.getType.asInstanceOf[ArrowType.Timestamp].getTimezone
{noformat}

It would be more convenient to have direct accessors for these type params.  
Also, it is possible to do some minor refactoring because 
{{NullableValueVectors}} does not use these type params, so there is no need to 
store them.  They already exist in the inner vector object and the Field type.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1352) [Integration] Improve print formatting for producer, consumer line

2017-08-14 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1352:
---

 Summary: [Integration] Improve print formatting for producer, 
consumer line
 Key: ARROW-1352
 URL: https://issues.apache.org/jira/browse/ARROW-1352
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Bryan Cutler
Assignee: Bryan Cutler
Priority: Trivial


When running integration tests, the line indicating producer/consumer in the 
output gets jumbled with the rest.  It should be formatted differently to allow 
easier visual inspection of which producers/consumers were run.  Here is some 
of the current output as it changes producer/consumer:

{noformat}
==
Testing file /tmp/tmpso6golfs/generated_dictionary.json
==
-- Creating binary inputs
-- Validating file
-- Validating stream
-- Java producing, Java consuming
==
Testing file /home/bryan/git/arrow/integration/data/struct_example.json
==
-- Creating binary inputs
-- Validating file
-- Validating stream
{noformat}





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1283) [Java] VectorSchemaRoot should be able to be closed() more than once

2017-07-26 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1283:
---

 Summary: [Java] VectorSchemaRoot should be able to be closed() 
more than once
 Key: ARROW-1283
 URL: https://issues.apache.org/jira/browse/ARROW-1283
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Reporter: Bryan Cutler
Assignee: Bryan Cutler


When working with a VectorSchemaRoot, once it is no longer needed its resources 
are freed by calling {{close()}} and then closing the allocator.  Sometimes it 
needs to be closed a second time due to complex operations.  If the 
VectorSchemaRoot is closed again after the allocator, it raises an assertion 
error during {{clear()}} because it tries to allocate an empty buffer.  The 
{{close()}} operation should mean that the object is no longer to be used, so 
this empty buffer is not needed and ends up being destroyed immediately anyway.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1268) [Website] Blog post on Arrow integration with Spark

2017-07-25 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1268:
---

 Summary: [Website] Blog post on Arrow integration with Spark
 Key: ARROW-1268
 URL: https://issues.apache.org/jira/browse/ARROW-1268
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Website
Reporter: Bryan Cutler
Assignee: Bryan Cutler
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1245) [Integration] Java Integration Tests Disabled

2017-07-20 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1245:
---

 Summary: [Integration] Java Integration Tests Disabled
 Key: ARROW-1245
 URL: https://issues.apache.org/jira/browse/ARROW-1245
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Bryan Cutler
Assignee: Bryan Cutler


JavaTester in Integration tests is commented out.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1238) [Java] Add JSON read/write support for decimals for integration tests

2017-07-19 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1238:
---

 Summary: [Java] Add JSON read/write support for decimals for 
integration tests
 Key: ARROW-1238
 URL: https://issues.apache.org/jira/browse/ARROW-1238
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java - Vectors
Reporter: Bryan Cutler
Assignee: Bryan Cutler






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1232) [Java] Decouple ArrowStreamReader from specific data input

2017-07-17 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1232:
---

 Summary: [Java] Decouple ArrowStreamReader from specific data input
 Key: ARROW-1232
 URL: https://issues.apache.org/jira/browse/ARROW-1232
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Reporter: Bryan Cutler


Currently, the ArrowStreamReader must be constructed with a channel/stream to 
read data from.  It would be better to use an abstraction that would decouple a 
specific input stream from the incoming messages.  This is following the 
discussion from https://github.com/apache/arrow/pull/839



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1184) [Java] Dictionary.equals is not working correctly

2017-07-04 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1184:
---

 Summary: [Java] Dictionary.equals is not working correctly
 Key: ARROW-1184
 URL: https://issues.apache.org/jira/browse/ARROW-1184
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Reporter: Bryan Cutler


The {{Dictionary.equals}} method does not return true even when the 
dictionaries are equal.  This is because {{equals}} is not implemented for 
FieldVector, so the comparison defaults to comparing the two objects only and 
not the vector data.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1181) [Python] Parquet test fail if not enabled

2017-07-03 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1181:
---

 Summary: [Python] Parquet test fail if not enabled
 Key: ARROW-1181
 URL: https://issues.apache.org/jira/browse/ARROW-1181
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Bryan Cutler
Assignee: Bryan Cutler
Priority: Trivial


The test test_multiindex_duplicate_values fails if parquet is not installed; I 
believe it just needs the @parquet annotation.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1053) [Python] Memory leak with RecordBatchFileReader

2017-05-18 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1053:
---

 Summary: [Python] Memory leak with RecordBatchFileReader
 Key: ARROW-1053
 URL: https://issues.apache.org/jira/browse/ARROW-1053
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Bryan Cutler


While working on SPARK-13534 and running repeated calls to {{toPandas}}, memory 
usage continued to climb, and I isolated it to the Python side.  The following 
code reproduces the issue, which looks like a memory leak.  With the block 
using the {{RecordBatchFileReader}} commented out while leaving the writer, 
memory usage is stable, so I believe the issue is with the reader.

{noformat}
import pyarrow as pa
import numpy as np
import memory_profiler
import gc
import io


def leak():
data = [pa.array(np.concatenate([np.random.randn(10)] * 10))]
table = pa.Table.from_arrays(data, ['foo'])
while True:
print('calling to_pandas')
print('memory_usage: {0}'.format(memory_profiler.memory_usage()))
df = table.to_pandas()

batch = pa.RecordBatch.from_pandas(df)

sink = io.BytesIO()
writer = pa.RecordBatchFileWriter(sink, batch.schema)
writer.write_batch(batch)
writer.close()

reader = pa.open_file(pa.BufferReader(sink.getvalue()))
reader.read_all()

gc.collect()

leak()
{noformat}

Some of the output from the code above:
{noformat}
calling to_pandas
memory_usage: [67.0546875]
calling to_pandas
memory_usage: [143.95703125]
calling to_pandas
memory_usage: [151.58984375]
calling to_pandas
memory_usage: [174.453125]
calling to_pandas
memory_usage: [189.84765625]
calling to_pandas
memory_usage: [212.7109375]
calling to_pandas
memory_usage: [228.046875]
calling to_pandas
memory_usage: [243.109375]
calling to_pandas
memory_usage: [258.4375]
calling to_pandas
memory_usage: [273.83203125]
calling to_pandas
memory_usage: [288.90234375]
calling to_pandas
memory_usage: [304.23046875]
calling to_pandas
memory_usage: [319.625]
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ARROW-701) [Java] Support additional Date metadata

2017-03-22 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-701:
--

 Summary: [Java] Support additional Date metadata
 Key: ARROW-701
 URL: https://issues.apache.org/jira/browse/ARROW-701
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java - Vectors
Reporter: Bryan Cutler
Assignee: Bryan Cutler


The Date type format from ARROW-316 introduced a DateUnit



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (ARROW-611) [Java] TimeVector TypeLayout is incorrectly specified as 64 bit width

2017-03-21 Thread Bryan Cutler (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-611.

Resolution: Not A Problem

Fixed in ARROW-673

> [Java] TimeVector TypeLayout is incorrectly specified as 64 bit width
> -
>
> Key: ARROW-611
> URL: https://issues.apache.org/jira/browse/ARROW-611
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>Affects Versions: 0.2.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ARROW-615) Move ByteArrayReadableSeekableByteChannel to vector.util package

2017-03-10 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905880#comment-15905880
 ] 

Bryan Cutler commented on ARROW-615:


PR: https://github.com/apache/arrow/pull/370

> Move ByteArrayReadableSeekableByteChannel to vector.util package
> 
>
> Key: ARROW-615
> URL: https://issues.apache.org/jira/browse/ARROW-615
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Minor
>
> The ByteArrayReadableSeekableByteChannel is useful when reading an 
> ArrowRecordBatch from a byte array with ArrowReader.  Currently it is in the 
> vector.file test package; this proposes moving it to 
> src/main/java/o.a.a.vector.util



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (ARROW-615) Move ByteArrayReadableSeekableByteChannel to vector.util package

2017-03-10 Thread Bryan Cutler (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned ARROW-615:
--

Assignee: Bryan Cutler

> Move ByteArrayReadableSeekableByteChannel to vector.util package
> 
>
> Key: ARROW-615
> URL: https://issues.apache.org/jira/browse/ARROW-615
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Minor
>
> The ByteArrayReadableSeekableByteChannel is useful when reading an 
> ArrowRecordBatch from a byte array with ArrowReader.  Currently it is in the 
> vector.file test package; this proposes moving it to 
> src/main/java/o.a.a.vector.util



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ARROW-615) Move ByteArrayReadableSeekableByteChannel to vector.util package

2017-03-10 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-615:
--

 Summary: Move ByteArrayReadableSeekableByteChannel to vector.util 
package
 Key: ARROW-615
 URL: https://issues.apache.org/jira/browse/ARROW-615
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Bryan Cutler
Priority: Minor


The ByteArrayReadableSeekableByteChannel is useful when reading an 
ArrowRecordBatch from a byte array with ArrowReader.  Currently it is in the 
vector.file test package; this proposes moving it to 
src/main/java/o.a.a.vector.util



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ARROW-612) [Java] Field toString should show nullable flag status

2017-03-10 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905845#comment-15905845
 ] 

Bryan Cutler commented on ARROW-612:


PR: https://github.com/apache/arrow/pull/368

> [Java] Field toString should show nullable flag status
> --
>
> Key: ARROW-612
> URL: https://issues.apache.org/jira/browse/ARROW-612
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Vectors
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Trivial
>
> Often when comparing Schemas, I'll see an error message like below because of 
> differing schemas.  The only difference is one is nullable and one is not, 
> but that info is not printed in the {{Field.toString}} method.
> {noformat}
>  - numeric type conversion *** FAILED *** (118 milliseconds)
> [info]   java.lang.IllegalArgumentException: Different schemas:
> [info] Schema
> [info] Schema
> [info]   at 
> org.apache.arrow.vector.util.Validator.compareSchemas(Validator.java:43)
> {noformat}
> It would be nice to match the C++ {{Field.toString}}, which prints " not null " 
> only if the nullable flag is not set. The output would then look like this:
> {noformat}
> [info] Schema
> [info] Schema
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ARROW-612) [Java] Field toString should show nullable flag status

2017-03-10 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905812#comment-15905812
 ] 

Bryan Cutler commented on ARROW-612:


I'll post a patch

> [Java] Field toString should show nullable flag status
> --
>
> Key: ARROW-612
> URL: https://issues.apache.org/jira/browse/ARROW-612
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Vectors
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Trivial
>
> Often when comparing Schemas, I'll see an error message like below because of 
> differing schemas.  The only difference is one is nullable and one is not, 
> but that info is not printed in the {{Field.toString}} method.
> {noformat}
>  - numeric type conversion *** FAILED *** (118 milliseconds)
> [info]   java.lang.IllegalArgumentException: Different schemas:
> [info] Schema
> [info] Schema
> [info]   at 
> org.apache.arrow.vector.util.Validator.compareSchemas(Validator.java:43)
> {noformat}
> It would be nice to match the C++ {{Field.toString}}, which prints " not null " 
> only if the nullable flag is not set. The output would then look like this:
> {noformat}
> [info] Schema
> [info] Schema
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ARROW-612) [Java] Field toString should show nullable flag status

2017-03-10 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-612:
--

 Summary: [Java] Field toString should show nullable flag status
 Key: ARROW-612
 URL: https://issues.apache.org/jira/browse/ARROW-612
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Reporter: Bryan Cutler
Assignee: Bryan Cutler
Priority: Trivial


Often when comparing Schemas, I'll see an error message like below because of 
differing schemas.  The only difference is one is nullable and one is not, but 
that info is not printed in the {{Field.toString}} method.

{noformat}
 - numeric type conversion *** FAILED *** (118 milliseconds)
[info]   java.lang.IllegalArgumentException: Different schemas:
[info] Schema
[info] Schema
[info]   at 
org.apache.arrow.vector.util.Validator.compareSchemas(Validator.java:43)
{noformat}

It would be nice to match the C++ {{Field.toString}}, which prints " not null " 
only if the nullable flag is not set. The output would then look like this:
{noformat}
[info] Schema
[info] Schema
{noformat}
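
For reference, the C++ behaviour being matched is also visible through the Python 
bindings; a minimal sketch (output strings are approximate, shown only to 
illustrate where the " not null" suffix appears):

{noformat}
import pyarrow as pa

nullable_field = pa.field("a", pa.int32())                  # nullable by default
required_field = pa.field("a", pa.int32(), nullable=False)

print(str(nullable_field))   # e.g. pyarrow.Field<a: int32>
print(str(required_field))   # e.g. pyarrow.Field<a: int32 not null>
{noformat}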



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (ARROW-611) [Java] TimeVector TypeLayout is incorrectly specified as 64 bit width

2017-03-10 Thread Bryan Cutler (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-611:
---
Summary: [Java] TimeVector TypeLayout is incorrectly specified as 64 bit 
width  (was: TimeVector TypeLayout is incorrectly specified as 64 bit width)

> [Java] TimeVector TypeLayout is incorrectly specified as 64 bit width
> -
>
> Key: ARROW-611
> URL: https://issues.apache.org/jira/browse/ARROW-611
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>Affects Versions: 0.2.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ARROW-611) TimeVector TypeLayout is incorrectly specified as 64 bit width

2017-03-10 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905771#comment-15905771
 ] 

Bryan Cutler commented on ARROW-611:


I can submit a patch for this

> TimeVector TypeLayout is incorrectly specified as 64 bit width
> --
>
> Key: ARROW-611
> URL: https://issues.apache.org/jira/browse/ARROW-611
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>Affects Versions: 0.2.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ARROW-582) [Java] Add Date/Time Support to JSON File

2017-03-10 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905765#comment-15905765
 ] 

Bryan Cutler commented on ARROW-582:


PR: https://github.com/apache/arrow/pull/366

> [Java] Add Date/Time Support to JSON File
> -
>
> Key: ARROW-582
> URL: https://issues.apache.org/jira/browse/ARROW-582
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Vectors
>Affects Versions: 0.2.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>
> Need to add Date/Time support to JsonFileReader/Writer for the purpose of 
> integration testing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ARROW-413) DATE type is not specified clearly

2017-03-09 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903535#comment-15903535
 ] 

Bryan Cutler commented on ARROW-413:


I started working on ARROW-582 to add Date/Time support to the JSON files, so I 
thought I would bump this to ask whether a conclusion has been reached.

> DATE type is not specified clearly
> --
>
> Key: ARROW-413
> URL: https://issues.apache.org/jira/browse/ARROW-413
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>Affects Versions: 0.1.0
>Reporter: Uwe L. Korn
>
> Currently the DATE type is not specified anywhere and needs to be documented.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ARROW-582) [Java] Add Date/Time Support to JSON File

2017-02-24 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-582:
--

 Summary: [Java] Add Date/Time Support to JSON File
 Key: ARROW-582
 URL: https://issues.apache.org/jira/browse/ARROW-582
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Affects Versions: 0.2.0
Reporter: Bryan Cutler


Need to add Date/Time support to JsonFileReader/Writer for the purpose of 
integration testing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ARROW-556) [Integration] Can not run Integration tests if different cpp build path

2017-02-13 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864455#comment-15864455
 ] 

Bryan Cutler commented on ARROW-556:


Not a blocker; there are still ways to run the integration tests.

> [Integration] Can not run Integration tests if different cpp build path
> ---
>
> Key: ARROW-556
> URL: https://issues.apache.org/jira/browse/ARROW-556
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bryan Cutler
>Assignee: Wes McKinney
>Priority: Minor
>
> The instructions for running the integration tests say to specify the cpp build 
> path and then export the env var ARROW_CPP_TESTER relative to that build path. 
> The problem is that two other vars, STREAM_TO_FILE and FILE_TO_STREAM, also rely 
> on a build path constructed from ARROW_HOME and 'cpp/test-build/debug', and they 
> will fail if that is not the build path actually used.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ARROW-556) [Integration] Can not run Integration tests if different cpp build path

2017-02-13 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-556:
--

 Summary: [Integration] Can not run Integration tests if different 
cpp build path
 Key: ARROW-556
 URL: https://issues.apache.org/jira/browse/ARROW-556
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Bryan Cutler
Priority: Minor


The instructions for running the integration tests say to specify the cpp build 
path and then export the env var ARROW_CPP_TESTER relative to that build path. 
The problem is that two other vars, STREAM_TO_FILE and FILE_TO_STREAM, also rely 
on a build path constructed from ARROW_HOME and 'cpp/test-build/debug', and they 
will fail if that is not the build path actually used.
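
A minimal sketch of one possible fix, assuming the helper executables sit in the 
same directory as the one pointed to by ARROW_CPP_TESTER; the executable file 
names below are hypothetical placeholders, not the actual binary names:

{noformat}
import os

# Derive the build directory from the tester the user already configured,
# instead of hard-coding ARROW_HOME + 'cpp/test-build/debug'.
cpp_build_dir = os.path.dirname(os.environ["ARROW_CPP_TESTER"])

# Hypothetical helper names, for illustration only.
stream_to_file = os.path.join(cpp_build_dir, "stream-to-file")
file_to_stream = os.path.join(cpp_build_dir, "file-to-stream")
{noformat}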



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ARROW-421) [Python] Zero-copy buffers read by pyarrow::PyBytesReader must retain a reference to the parent PyBytes to avoid premature garbage collection issues

2017-01-10 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816632#comment-15816632
 ] 

Bryan Cutler commented on ARROW-421:


I thought I'd give this a shot; here is the 
[PR|https://github.com/apache/arrow/pull/278]

> [Python] Zero-copy buffers read by pyarrow::PyBytesReader must retain a 
> reference to the parent PyBytes to avoid premature garbage collection issues
> 
>
> Key: ARROW-421
> URL: https://issues.apache.org/jira/browse/ARROW-421
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>
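
To illustrate the hazard (using the public pyarrow buffer API as a stand-in for 
the internal PyBytesReader, which is an assumption about equivalent behaviour): a 
zero-copy buffer is only safe to use after the original bytes object goes away if 
it holds a reference to that parent object.

{noformat}
import pyarrow as pa

data = b"zero-copy payload"
buf = pa.py_buffer(data)   # zero-copy view over the bytes object's memory

# Dropping our reference is safe only because the Buffer keeps its parent alive;
# otherwise this could read memory that Python may already have freed.
del data
assert buf.to_pybytes() == b"zero-copy payload"
{noformat}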




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (ARROW-396) Python: Add pyarrow.schema.Schema.equals

2016-12-07 Thread Bryan Cutler (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned ARROW-396:
--

Assignee: Bryan Cutler

> Python: Add pyarrow.schema.Schema.equals
> 
>
> Key: ARROW-396
> URL: https://issues.apache.org/jira/browse/ARROW-396
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Minor
>
> Add a method in pyarrow to check whether two schemas are equal.  This exists in 
> Arrow-cpp; we just need to call it from the Python side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-409) Python: Change pyarrow.Table.dataframe_from_batches API to create Table instead

2016-12-06 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726954#comment-15726954
 ] 

Bryan Cutler commented on ARROW-409:


PR: https://github.com/apache/arrow/pull/229

> Python: Change pyarrow.Table.dataframe_from_batches API to create Table 
> instead
> ---
>
> Key: ARROW-409
> URL: https://issues.apache.org/jira/browse/ARROW-409
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Minor
>
> As discussed in PR https://github.com/apache/arrow/pull/216 the pyarrow.Table 
> API to convert RecordBatches to pandas.DataFrame would be better/more 
> flexible as follows:
> {noformat}
> table = pa.Table.from_batches(batches)
> df = table.to_pandas()
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ARROW-409) Python: Change pyarrow.Table.dataframe_from_batches API to create Table instead

2016-12-06 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-409:
--

 Summary: Python: Change pyarrow.Table.dataframe_from_batches API 
to create Table instead
 Key: ARROW-409
 URL: https://issues.apache.org/jira/browse/ARROW-409
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Bryan Cutler
Priority: Minor


As discussed in PR https://github.com/apache/arrow/pull/216, the pyarrow.Table 
API to convert RecordBatches to a pandas.DataFrame would be better and more 
flexible as follows:

{noformat}
table = pa.Table.from_batches(batches)
df = table.to_pandas()
{noformat}
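
A slightly fuller, runnable version of the proposed flow, with the input batches 
spelled out (the batch construction is illustrative; the point is the two-step 
from_batches/to_pandas API):

{noformat}
import pyarrow as pa

# Two record batches sharing the same schema.
batches = [
    pa.RecordBatch.from_arrays([pa.array([1, 2]), pa.array(["a", "b"])],
                               names=["x", "y"]),
    pa.RecordBatch.from_arrays([pa.array([3, 4]), pa.array(["c", "d"])],
                               names=["x", "y"]),
]

# Proposed API: combine into a Table first, then convert once to pandas.
table = pa.Table.from_batches(batches)
df = table.to_pandas()
print(df)
{noformat}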



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-396) Python: Add pyarrow.schema.Schema.equals

2016-11-30 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710161#comment-15710161
 ] 

Bryan Cutler commented on ARROW-396:


PR: https://github.com/apache/arrow/pull/221

> Python: Add pyarrow.schema.Schema.equals
> 
>
> Key: ARROW-396
> URL: https://issues.apache.org/jira/browse/ARROW-396
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Bryan Cutler
>Priority: Minor
>
> Add a method in pyarrow to check whether two schemas are equal.  This exists in 
> Arrow-cpp; we just need to call it from the Python side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-396) Python: Add pyarrow.schema.Schema.equals

2016-11-30 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709672#comment-15709672
 ] 

Bryan Cutler commented on ARROW-396:


I can add this

> Python: Add pyarrow.schema.Schema.equals
> 
>
> Key: ARROW-396
> URL: https://issues.apache.org/jira/browse/ARROW-396
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Bryan Cutler
>Priority: Minor
>
> Add a method in pyarrow to check whether two schemas are equal.  This exists in 
> Arrow-cpp; we just need to call it from the Python side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ARROW-396) Python: Add pyarrow.schema.Schema.equals

2016-11-30 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-396:
--

 Summary: Python: Add pyarrow.schema.Schema.equals
 Key: ARROW-396
 URL: https://issues.apache.org/jira/browse/ARROW-396
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Bryan Cutler
Priority: Minor


Add a method in pyarrow to check whether two schemas are equal.  This exists in 
Arrow-cpp; we just need to call it from the Python side.
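
A minimal sketch of how the requested binding would be used, assuming it simply 
delegates to the existing C++ schema comparison:

{noformat}
import pyarrow as pa

s1 = pa.schema([pa.field("id", pa.int64()), pa.field("name", pa.string())])
s2 = pa.schema([pa.field("id", pa.int64()), pa.field("name", pa.string())])
s3 = pa.schema([pa.field("id", pa.int32())])

# Delegates to the C++ Schema equality check.
assert s1.equals(s2)
assert not s1.equals(s3)
{noformat}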



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-369) [Python] Add ability to convert multiple record batches at once to pandas

2016-11-27 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701155#comment-15701155
 ] 

Bryan Cutler commented on ARROW-369:


PR: https://github.com/apache/arrow/pull/216

> [Python] Add ability to convert multiple record batches at once to pandas
> -
>
> Key: ARROW-369
> URL: https://issues.apache.org/jira/browse/ARROW-369
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>  Labels: newbie
>
> Instead of only being able to convert single record batches and tables that 
> consist of a single ColumnChunk, we should also support the construction of 
> Pandas DataFrames from multiple RecordBatches. In the simplest form, we would 
> convert each batch to a Pandas DataFrame and then concat them all together. A 
> second (and preferred) implementation would extend the C++ function 
> {{ConvertColumnToPandas}} in 
> {{python/src/pyarrow/adapters/pandas.*}} to work on chunked columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-369) [Python] Add ability to convert multiple record batches at once to pandas

2016-11-17 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674374#comment-15674374
 ] 

Bryan Cutler commented on ARROW-369:


I could work on this if you don't mind.  I was already doing this using concat 
in some of my local testing, so I'll take a crack at the chunked columns 
implementation.

> [Python] Add ability to convert multiple record batches at once to pandas
> -
>
> Key: ARROW-369
> URL: https://issues.apache.org/jira/browse/ARROW-369
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>  Labels: newbie
>
> Instead of only being able to convert single record batches and tables that 
> consist of a single ColumnChunk, we should also support the construction of 
> Pandas DataFrames from multiple RecordBatches. In the simplest form, we would 
> convert each batch to a Pandas DataFrame and then concat them all together. A 
> second (and preferred) implementation would extend the C++ function 
> {{ConvertColumnToPandas}} in 
> {{python/src/pyarrow/adapters/pandas.*}} to work on chunked columns.
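
To make the two strategies concrete, a sketch using current pyarrow and pandas 
APIs (not the code as it stood at the time): the simple path converts per batch 
and concatenates in pandas, while the preferred path builds a Table whose columns 
stay chunked and converts once.

{noformat}
import pandas as pd
import pyarrow as pa

batches = [pa.RecordBatch.from_arrays([pa.array([1, 2])], names=["x"]),
           pa.RecordBatch.from_arrays([pa.array([3, 4])], names=["x"])]

# Simple approach: one DataFrame per batch, stitched together in pandas.
df_naive = pd.concat([b.to_pandas() for b in batches], ignore_index=True)

# Preferred approach: the Table keeps each column as multiple chunks and the
# conversion handles the chunking in a single pass.
df_chunked = pa.Table.from_batches(batches).to_pandas()

assert df_naive.equals(df_chunked)
{noformat}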



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-368) Document use of LD_LIBRARY_PATH when using Python

2016-11-04 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637123#comment-15637123
 ] 

Bryan Cutler commented on ARROW-368:


PR: https://github.com/apache/arrow/pull/199

> Document use of LD_LIBRARY_PATH when using Python
> -
>
> Key: ARROW-368
> URL: https://issues.apache.org/jira/browse/ARROW-368
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Python
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: documentation
>
> Currently, the docs instruct that libarrow.so be placed under $ARROW_HOME/lib, 
> but pyarrow needs this location on its library path or it will raise an import 
> error.  A note should be added to the Python README to put this path in the 
> LD_LIBRARY_PATH env var.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-368) Document use of LD_LIBRARY_PATH when using Python

2016-11-04 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637122#comment-15637122
 ] 

Bryan Cutler commented on ARROW-368:


[~xhochy] Great!  I was about to add it there also, but please do it in your PR 
since you're working on that now.

> Document use of LD_LIBRARY_PATH when using Python
> -
>
> Key: ARROW-368
> URL: https://issues.apache.org/jira/browse/ARROW-368
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Python
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: documentation
>
> Currently, the docs instruct that libarrow.so be placed under $ARROW_HOME/lib, 
> but pyarrow needs this location on its library path or it will raise an import 
> error.  A note should be added to the Python README to put this path in the 
> LD_LIBRARY_PATH env var.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ARROW-368) Document use of LD_LIBRARY_PATH when using Python

2016-11-04 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-368:
--

 Summary: Document use of LD_LIBRARY_PATH when using Python
 Key: ARROW-368
 URL: https://issues.apache.org/jira/browse/ARROW-368
 Project: Apache Arrow
  Issue Type: Task
  Components: Python
Reporter: Bryan Cutler
Priority: Trivial


Currently, the docs instruct that libarrow.so be placed under $ARROW_HOME/lib, 
but pyarrow needs this location on its library path or it will raise an import 
error.  A note should be added to the Python README to put this path in the 
LD_LIBRARY_PATH env var.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)