[jira] [Commented] (ARROW-5658) [JAVA] apache arrow-flight cannot send listvector

2019-06-21 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869421#comment-16869421
 ] 

lidavidm commented on ARROW-5658:
-

[~liaotian1005], generally UNKNOWN means that there was an uncaught server-side 
exception. Do you see any traceback in the server output? If not, then we need 
to log these things inside Flight.

> [JAVA] apache arrow-flight cannot send listvector 
> --
>
> Key: ARROW-5658
> URL: https://issues.apache.org/jira/browse/ARROW-5658
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Affects Versions: 0.13.0
> Environment: java8 arrow-java 0.13.0
>Reporter: luckily
>Priority: Major
> Attachments: ClientStart.java, ServerStart.java, pom.xml
>
>
> I can't transfer data that contains a ListVector using Apache Arrow Flight. 
> The problem is as follows:
> {quote} # I parse an XML file, convert it to the Arrow format, and finally 
> convert it to the Parquet format. The XML file is at 
> [http://www.w3school.com.cn/example/xmle/cd_catalog.xml|http://www.w3school.com.cn/example/xmle/cd_catalog.xml]
>  # I created a schema that uses a ListVector.
> The code is shown below:
> List list = 
> childrenBuilder.add(ListVector.empty(column.getId().toString(),allocator));
> VectorSchemaRoot root = VectorSchemaRoot.of(inVector)
>  # Parse the XML file to get the list data in "cd", using the ListVector API.
> `ListVector listVector = (ListVector) valueVectors;
> List<Column> columns = column.getColumns();
> Column column1 = columns.get(0);
> String name = column1.getId().toString();
> UnionListWriter writer = listVector.getWriter();
> writer.allocate();
> for (int j = 0; j < column1.getColumns().size(); j++) {
>   writer.setPosition(j);
>   writer.startList();
>   writer.list().startList();
>   Column column2 = column1.getColumns().get(j);
>   List<Map<String, String>> lst = (List<Map<String, String>>) ((Map) val).get(name);
>   for (int k = 0; k < lst.size(); k++) {
>     Map<String, String> stringStringMap = lst.get(k);
>     String value = stringStringMap.get(column2.getId().toString());
>     switch (column2.getType()) {
>       case FLOAT:
>         writer.list().float4().writeFloat4(stringConvertFloat(value));
>         break;
>       case BOOLEAN:
>         writer.list().bit().writeBit(stringConvertBoolean(value));
>         break;
>       case DECIMAL:
>         writer.list().decimal().writeDecimal(stringConvertDecimal(value, column2.getScale()));
>         break;
>       case TIMESTAMP:
>         writer.list().dateMilli().writeDateMilli(stringConvertTimestamp(value, column2.format.toString()));
>         break;
>       case INTEGER:
>       case BIGINT:
>         writer.list().bigInt().writeBigInt(stringConvertLong(value));
>         break;
>       case VARCHAR:
>         VarCharHolder varBinaryHolder = new VarCharHolder();
>         varBinaryHolder.start = 0;
>         byte[] bytes = value.getBytes();
>         ArrowBuf buffer = listVector.getAllocator().buffer(bytes.length);
>         varBinaryHolder.buffer = buffer;
>         buffer.writeBytes(bytes);
>         varBinaryHolder.end = bytes.length;
>         writer.list().varChar().write(varBinaryHolder);
>         break;
>       default:
>         throw new IllegalArgumentException(" error no type !!");
>     }
>   }
>   writer.list().endList();
>   writer.endList();
> }`
>  4. After the write is complete, I send the data to the Arrow Flight server. 
> The server code is:
> {quote}
> {quote}@Override
> public Callable acceptPut(FlightStream flightStream) {
>  return () -> {
>  try (VectorSchemaRoot root = flightStream.getRoot()) {
>  while (flightStream.next()) {
>  VectorSchemaRoot other = null;
>  try {
>  logger.info(" Receive message .. size: " + root.getRowCount());
>  other = 

[jira] [Created] (ARROW-5681) [FlightRPC][Java] Wrap gRPC exceptions

2019-06-21 Thread lidavidm (JIRA)
lidavidm created ARROW-5681:
---

 Summary: [FlightRPC][Java] Wrap gRPC exceptions
 Key: ARROW-5681
 URL: https://issues.apache.org/jira/browse/ARROW-5681
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Java
Reporter: lidavidm
Assignee: lidavidm
 Fix For: 1.0.0


Instead of requiring users to catch/throw StatusRuntimeException in Flight 
services/clients, and thereby leaking gRPC details, we should provide our own 
set of exceptions and status codes. This way, services can provide proper error 
messages and error codes to clients, which can catch the exception and respond 
properly.
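
As a rough sketch of what such a wrapper could look like: every name below (`FlightStatusCode`, `FlightRuntimeException`, `FlightErrors`) is hypothetical and not part of Flight's API, and the transport status is modeled as a plain string so the example stays free of gRPC dependencies. Unrecognised transport codes are assumed to collapse to UNKNOWN.

```java
import java.util.Locale;

/** Hypothetical Flight-level status codes; the set chosen here is illustrative only. */
enum FlightStatusCode { UNKNOWN, INVALID_ARGUMENT, NOT_FOUND, UNAUTHENTICATED, UNAVAILABLE, INTERNAL }

/** Hypothetical exception that services would throw and clients would catch. */
final class FlightRuntimeException extends RuntimeException {
  private final FlightStatusCode code;

  FlightRuntimeException(FlightStatusCode code, String message, Throwable cause) {
    super(code + ": " + message, cause);
    this.code = code;
  }

  FlightStatusCode code() {
    return code;
  }
}

final class FlightErrors {
  private FlightErrors() {}

  /** Maps a transport status name to a Flight code; unrecognised names become UNKNOWN. */
  static FlightStatusCode fromTransportCode(String transportCode) {
    if (transportCode == null) {
      return FlightStatusCode.UNKNOWN;
    }
    try {
      return FlightStatusCode.valueOf(transportCode.toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      return FlightStatusCode.UNKNOWN;
    }
  }

  /** Wraps a transport-level failure so callers never handle the transport's exception type. */
  static FlightRuntimeException wrap(String transportCode, String description, Throwable cause) {
    return new FlightRuntimeException(fromTransportCode(transportCode), description, cause);
  }
}
```

A client would then catch `FlightRuntimeException` and branch on `code()`, while the original transport exception stays available through `getCause()` for debugging.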



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5063) [Java] FlightClient should not create a child allocator

2019-06-21 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869736#comment-16869736
 ] 

lidavidm commented on ARROW-5063:
-

Since we couldn't reproduce the gRPC behavior, maybe I can roll back the change 
to the client and just keep the tests?

> [Java] FlightClient should not create a child allocator
> ---
>
> Key: ARROW-5063
> URL: https://issues.apache.org/jira/browse/ARROW-5063
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC, Java
>Reporter: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I ran into a problem when testing out Flight using the ExampleFlightServer 
> with InMemoryStore producer. 
> A client will iterate over endpoints and locations to get the streams, and 
> the example creates a new client for each location. The only way to close the 
> allocator in the FlightClient is to close the FlightClient, which also closes 
> the read channel.  If the location is the same for each FlightStream (as is 
> the case for the InMemoryStore), then it seems like gRPC will reuse the 
> channel, so closing one read client will shut down the channel and the 
> remaining FlightStreams cannot be read.
> If an allocator was created by the owner of the FlightClient, then the client 
> would not need to close it and this problem would be avoided. I believe other 
> Flight classes do not create child allocators either, so this change would be 
> consistent.





[jira] [Updated] (ARROW-5643) [Flight] Add ability to override hostname checking

2019-06-21 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-5643:

Fix Version/s: (was: 0.14.0)
   1.0.0

> [Flight] Add ability to override hostname checking
> --
>
> Key: ARROW-5643
> URL: https://issues.apache.org/jira/browse/ARROW-5643
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC
>Reporter: lidavidm
>Assignee: lidavidm
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> We should add the ability to override hostname checks, so you can connect to 
> localhost over TLS but still verify that the certificate is for some other 
> domain.
> Example: when deploying on Kubernetes with headless services, clients connect 
> directly to backend services and do load balancing themselves. Thus all 
> instances of an application must present a certificate for the same hostname. 
> To do health checks in such an environment, you can't connect to the TLS 
> hostname (which may resolve to a different instance); you need to connect to 
> localhost, and override the hostname check.
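
To illustrate the idea outside of Flight, here is a minimal sketch using only the JDK's JSSE classes. It assumes the transport dials localhost separately; the `HostnameOverride` class and its method are hypothetical names, not Flight or gRPC API. (grpc-java channel builders expose `overrideAuthority` for a similar purpose, which is likely the more natural hook for Flight itself.)

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

final class HostnameOverride {
  private HostnameOverride() {}

  /**
   * Builds a client-mode TLS engine whose certificate check targets expectedHost,
   * regardless of the address the underlying connection actually dials.
   */
  static SSLEngine clientEngine(SSLContext context, String expectedHost, int port) {
    // The peer host is only a hint to JSSE; creating the engine opens no connection.
    SSLEngine engine = context.createSSLEngine(expectedHost, port);
    engine.setUseClientMode(true);
    SSLParameters params = engine.getSSLParameters();
    // "HTTPS" enables endpoint identification against the hinted peer host.
    params.setEndpointIdentificationAlgorithm("HTTPS");
    engine.setSSLParameters(params);
    return engine;
  }
}
```

A health check would feed this engine bytes from a socket connected to localhost, while the handshake still requires the certificate to match the service's shared hostname.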





[jira] [Closed] (ARROW-5829) [Java] failure in TestServerOptions.domainSocket

2019-07-03 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm closed ARROW-5829.
---
Resolution: Duplicate
  Assignee: lidavidm

> [Java] failure in TestServerOptions.domainSocket
> 
>
> Key: ARROW-5829
> URL: https://issues.apache.org/jira/browse/ARROW-5829
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Affects Versions: 0.14.0
>Reporter: Pindikura Ravindra
>Assignee: lidavidm
>Priority: Major
>
> I see this consistently with the 0.14.0 rc0 release candidate on mac mojave.
> java.io.IOException: Failed to bind
>  at 
> org.apache.arrow.flight.TestServerOptions.domainSocket(TestServerOptions.java:46)
> Caused by: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: 
> Address already in use
>  





[jira] [Comment Edited] (ARROW-5829) [Java] failure in TestServerOptions.domainSocket

2019-07-03 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877880#comment-16877880
 ] 

lidavidm edited comment on ARROW-5829 at 7/3/19 2:21 PM:
-

I think this is the same underlying cause - on OSX, the release verification 
script uses a custom temp dir that makes the domain socket path too long. I'm 
going to close this in favor of ARROW-5836 to keep things in one place.


was (Author: lidavidm):
I think this is the same underlying cause - on OSX, the release verification 
script uses a custom temp dir that makes the domain socket path too long.

> [Java] failure in TestServerOptions.domainSocket
> 
>
> Key: ARROW-5829
> URL: https://issues.apache.org/jira/browse/ARROW-5829
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Affects Versions: 0.14.0
>Reporter: Pindikura Ravindra
>Priority: Major
>
> I see this consistently with the 0.14.0 rc0 release candidate on mac mojave.
> java.io.IOException: Failed to bind
>  at 
> org.apache.arrow.flight.TestServerOptions.domainSocket(TestServerOptions.java:46)
> Caused by: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: 
> Address already in use
>  





[jira] [Commented] (ARROW-5836) [Java][OSX] Flight tests are failing: address already in use

2019-07-03 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877875#comment-16877875
 ] 

lidavidm commented on ARROW-5836:
-

This is because the path for the domain socket is too long on OSX, which 
happens because the release verification script generates a custom TMPDIR. For 
instance, this is the path that it tries to use for me, and if I try to listen 
on a domain socket with netcat, I get the following:

{{$ nc -l -U  
/private/var/folders/tm/b4drxjmn7j79gtp0ppbw5qlhgn/T/arrow-0.14.0.X.qLFPG9EZ/apache-arrow-0.14.0/java/flight/target/flight-unit-test-8900940943285883708.sock}}
{{nc: File name too long}}

Not sure what the best way to fix this is - perhaps hardcode /tmp inside the 
test? I quickly scanned the grpc and grpc-java repositories, but they don't 
seem to test domain sockets (beyond maybe parsing the address).
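
One possible shape for such a fix, as a sketch only: check the candidate path against the socket address limit and fall back to a short directory when it does not fit. The `DomainSocketPaths` helper is hypothetical. The 104-byte figure is the size of `sun_path` on macOS (Linux allows 108), and the sketch assumes one byte is reserved for the terminating NUL.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

final class DomainSocketPaths {
  // Assumed limit: sun_path is 104 bytes on macOS (108 on Linux); use the smaller one.
  static final int MAX_SUN_PATH_BYTES = 104;

  private DomainSocketPaths() {}

  /** True if the path plus a terminating NUL fits within MAX_SUN_PATH_BYTES. */
  static boolean fits(Path socketPath) {
    int length = socketPath.toString().getBytes(StandardCharsets.UTF_8).length;
    return length + 1 <= MAX_SUN_PATH_BYTES;
  }

  /** Returns preferredDir/fileName if it fits, otherwise fallbackDir/fileName. */
  static Path choose(Path preferredDir, Path fallbackDir, String fileName) {
    Path preferred = preferredDir.resolve(fileName);
    return fits(preferred) ? preferred : fallbackDir.resolve(fileName);
  }

  public static void main(String[] args) {
    Path tmpDir = Paths.get(System.getProperty("java.io.tmpdir"));
    // Falls back to /tmp only when the configured temp directory is too deep.
    System.out.println(choose(tmpDir, Paths.get("/tmp"), "flight-unit-test.sock"));
  }
}
```

This keeps the normal temp directory where it works and only hardcodes /tmp as a fallback, which may be preferable to always hardcoding it.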

> [Java][OSX] Flight tests are failing: address already in use
> 
>
> Key: ARROW-5836
> URL: https://issues.apache.org/jira/browse/ARROW-5836
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Affects Versions: 0.14.0
>Reporter: Krisztian Szucs
>Priority: Major
>
> {code}
> Jul 03, 2019 3:09:45 PM io.grpc.netty.NettyServerHandler onStreamError
> WARNING: Stream Error
> io.netty.handler.codec.http2.Http2Exception$StreamException: Received DATA 
> frame for an unknown stream 3
> at 
> io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:129)
> at 
> io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.shouldIgnoreHeadersOrDataFrame(DefaultHttp2ConnectionDecoder.java:531)
> at 
> io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onDataRead(DefaultHttp2ConnectionDecoder.java:183)
> at 
> io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onDataRead(Http2InboundFrameLogger.java:48)
> at 
> io.netty.handler.codec.http2.DefaultHttp2FrameReader.readDataFrame(DefaultHttp2FrameReader.java:421)
> at 
> io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:251)
> at 
> io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160)
> at 
> io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
> at 
> io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118)
> at 
> io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:390)
> at 
> io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:450)
> at 
> io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
> at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:646)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:581)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:748)
> Jul 03, 2019 3:09:46 PM io.grpc.netty.NettyServerHandler onStreamError
> WARNING: Stream Error
> io.netty.handler.codec.http2.Http2Exception$StreamException: Received DATA 

[jira] [Assigned] (ARROW-5836) [Java][OSX] Flight tests are failing: address already in use

2019-07-03 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm reassigned ARROW-5836:
---

Assignee: lidavidm

> [Java][OSX] Flight tests are failing: address already in use
> 
>
> Key: ARROW-5836
> URL: https://issues.apache.org/jira/browse/ARROW-5836
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Affects Versions: 0.14.0
>Reporter: Krisztian Szucs
>Assignee: lidavidm
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>

[jira] [Commented] (ARROW-5836) [Java][OSX] Flight tests are failing: address already in use

2019-07-03 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877877#comment-16877877
 ] 

lidavidm commented on ARROW-5836:
-

Ah wait, I did find their test case - they hardcode /tmp: 
https://github.com/grpc/grpc/blob/df998f70239ec80af7d9af7133f9c0757e952f39/test/core/end2end/fixtures/h2_local_uds.cc#L29-L38

> [Java][OSX] Flight tests are failing: address already in use
> 
>
> Key: ARROW-5836
> URL: https://issues.apache.org/jira/browse/ARROW-5836
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Affects Versions: 0.14.0
>Reporter: Krisztian Szucs
>Priority: Major
>

[jira] [Comment Edited] (ARROW-5836) [Java][OSX] Flight tests are failing: address already in use

2019-07-03 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877877#comment-16877877
 ] 

lidavidm edited comment on ARROW-5836 at 7/3/19 2:18 PM:
-

Ah wait, I did find their test case - they hardcode /tmp: 
[https://github.com/grpc/grpc/blob/df998f70239ec80af7d9af7133f9c0757e952f39/test/core/end2end/fixtures/h2_local_uds.cc#L29-L38]

grpc-java doesn't seem to test this.


was (Author: lidavidm):
Ah wait, I did find their test case - they hardcode /tmp: 
https://github.com/grpc/grpc/blob/df998f70239ec80af7d9af7133f9c0757e952f39/test/core/end2end/fixtures/h2_local_uds.cc#L29-L38

> [Java][OSX] Flight tests are failing: address already in use
> 
>
> Key: ARROW-5836
> URL: https://issues.apache.org/jira/browse/ARROW-5836
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Affects Versions: 0.14.0
>Reporter: Krisztian Szucs
>Priority: Major
>

[jira] [Commented] (ARROW-5769) [Java] org.apache.arrow.flight.TestTls is failed via dev/release/00-prepare.sh

2019-06-27 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874585#comment-16874585
 ] 

lidavidm commented on ARROW-5769:
-

Actually, looking at that again, it seems the environment variable is set and 
the path is right, but the file isn't there. Are the submodules updated (git 
submodule update)?

> [Java] org.apache.arrow.flight.TestTls is failed via dev/release/00-prepare.sh
> --
>
> Key: ARROW-5769
> URL: https://issues.apache.org/jira/browse/ARROW-5769
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Java
>Reporter: Sutou Kouhei
>Priority: Blocker
> Fix For: 0.14.0
>
>
> Details:
> {noformat}
> [INFO] [INFO] Running org.apache.arrow.flight.TestTls
> [INFO] [ERROR] Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time 
> elapsed: 0.005 s <<< FAILURE! - in org.apache.arrow.flight.TestTls
> [INFO] [ERROR] connectTls(org.apache.arrow.flight.TestTls)  Time elapsed: 
> 0.004 s  <<< ERROR!
> [INFO] java.lang.RuntimeException: java.io.FileNotFoundException: 
> /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem
>  (No such file or directory)
> [INFO]at 
> org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:105)
> [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98)
> [INFO]at org.apache.arrow.flight.TestTls.connectTls(TestTls.java:44)
> [INFO] Caused by: java.io.FileNotFoundException: 
> /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem
>  (No such file or directory)
> [INFO]at 
> org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:102)
> [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98)
> [INFO]at org.apache.arrow.flight.TestTls.connectTls(TestTls.java:44)
> [INFO] 
> [INFO] [ERROR] rejectInvalidCert(org.apache.arrow.flight.TestTls)  Time 
> elapsed: 0 s  <<< ERROR!
> [INFO] java.lang.Exception: Unexpected exception, 
> expected but was
> [INFO]at 
> org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:105)
> [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98)
> [INFO]at 
> org.apache.arrow.flight.TestTls.rejectInvalidCert(TestTls.java:62)
> [INFO] Caused by: java.io.FileNotFoundException: 
> /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem
>  (No such file or directory)
> [INFO]at 
> org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:102)
> [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98)
> [INFO]at 
> org.apache.arrow.flight.TestTls.rejectInvalidCert(TestTls.java:62)
> [INFO] 
> [INFO] [ERROR] rejectHostname(org.apache.arrow.flight.TestTls)  Time elapsed: 
> 0.001 s  <<< ERROR!
> [INFO] java.lang.Exception: Unexpected exception, 
> expected but was
> [INFO]at 
> org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:105)
> [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98)
> [INFO]at 
> org.apache.arrow.flight.TestTls.rejectHostname(TestTls.java:78)
> [INFO] Caused by: java.io.FileNotFoundException: 
> /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem
>  (No such file or directory)
> [INFO]at 
> org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:102)
> [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98)
> [INFO]at 
> org.apache.arrow.flight.TestTls.rejectHostname(TestTls.java:78)
> [INFO] 
> [INFO] [INFO] Running org.apache.arrow.flight.TestServerOptions
> [INFO] [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 0.114 s - in org.apache.arrow.flight.TestServerOptions
> [INFO] [INFO] 
> [INFO] [INFO] Results:
> [INFO] [INFO] 
> [INFO] [ERROR] Errors: 
> [INFO] [ERROR]   TestTls.connectTls:44->test:98->lambda$test$3:105 Runtime 
> java.io.FileNotFound...
> [INFO] [ERROR]   TestTls.rejectHostname »  Unexpected exception, 
> expected
> [INFO] [ERROR]   TestTls.rejectInvalidCert »  Unexpected exception, 
> expected
> [INFO] [INFO] 
> [INFO] [ERROR] Tests run: 27, Failures: 0, Errors: 3, Skipped: 10
> {noformat}
> I'm not sure whether this is my environment problem or not.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5769) [Java] org.apache.arrow.flight.TestTls is failed via dev/release/00-prepare.sh

2019-06-27 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874581#comment-16874581
 ] 

lidavidm commented on ARROW-5769:
-

I don't have access to my work computer right now, but this fails because the 
Flight tests now need ARROW_TEST_DATA to be set. Adding a line under SOURCE_DIR 
in 00-prepare.sh like this should work, so long as submodules are initialized 
and updated:

{{export ARROW_TEST_DATA="${SOURCE_DIR}/../../testing/data"}}
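
For illustration only, the same precondition can be checked before the tests 
run. This is a minimal Python sketch, not part of 00-prepare.sh: the helper 
name {{missing_flight_test_files}} is hypothetical, and {{flight/cert0.pem}} is 
taken from the stack trace in the issue description.

```python
import os
from pathlib import Path


def missing_flight_test_files(env=None, required=("flight/cert0.pem",)):
    """Return the required test files that are absent under ARROW_TEST_DATA.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real environment variables.
    """
    env = os.environ if env is None else env
    root = env.get("ARROW_TEST_DATA")
    if not root:
        raise RuntimeError("ARROW_TEST_DATA is not set")
    base = Path(root)
    # A file is "missing" if the submodule was never initialized or the
    # variable points at the wrong directory.
    return [name for name in required if not (base / name).is_file()]
```

An empty result means the certificate used by TestTls is where the variable 
says it should be; a non-empty result usually points at an uninitialized 
submodule or a wrong path.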

> [Java] org.apache.arrow.flight.TestTls is failed via dev/release/00-prepare.sh
> --
>
> Key: ARROW-5769
> URL: https://issues.apache.org/jira/browse/ARROW-5769
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Java
> Reporter: Sutou Kouhei
> Priority: Blocker
> Fix For: 0.14.0
>
>
> Details:
> {noformat}
> [INFO] [INFO] Running org.apache.arrow.flight.TestTls
> [INFO] [ERROR] Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time 
> elapsed: 0.005 s <<< FAILURE! - in org.apache.arrow.flight.TestTls
> [INFO] [ERROR] connectTls(org.apache.arrow.flight.TestTls)  Time elapsed: 
> 0.004 s  <<< ERROR!
> [INFO] java.lang.RuntimeException: java.io.FileNotFoundException: 
> /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem
>  (No such file or directory)
> [INFO]at 
> org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:105)
> [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98)
> [INFO]at org.apache.arrow.flight.TestTls.connectTls(TestTls.java:44)
> [INFO] Caused by: java.io.FileNotFoundException: 
> /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem
>  (No such file or directory)
> [INFO]at 
> org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:102)
> [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98)
> [INFO]at org.apache.arrow.flight.TestTls.connectTls(TestTls.java:44)
> [INFO] 
> [INFO] [ERROR] rejectInvalidCert(org.apache.arrow.flight.TestTls)  Time 
> elapsed: 0 s  <<< ERROR!
> [INFO] java.lang.Exception: Unexpected exception, 
> expected but was
> [INFO]at 
> org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:105)
> [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98)
> [INFO]at 
> org.apache.arrow.flight.TestTls.rejectInvalidCert(TestTls.java:62)
> [INFO] Caused by: java.io.FileNotFoundException: 
> /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem
>  (No such file or directory)
> [INFO]at 
> org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:102)
> [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98)
> [INFO]at 
> org.apache.arrow.flight.TestTls.rejectInvalidCert(TestTls.java:62)
> [INFO] 
> [INFO] [ERROR] rejectHostname(org.apache.arrow.flight.TestTls)  Time elapsed: 
> 0.001 s  <<< ERROR!
> [INFO] java.lang.Exception: Unexpected exception, 
> expected but was
> [INFO]at 
> org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:105)
> [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98)
> [INFO]at 
> org.apache.arrow.flight.TestTls.rejectHostname(TestTls.java:78)
> [INFO] Caused by: java.io.FileNotFoundException: 
> /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem
>  (No such file or directory)
> [INFO]at 
> org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:102)
> [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98)
> [INFO]at 
> org.apache.arrow.flight.TestTls.rejectHostname(TestTls.java:78)
> [INFO] 
> [INFO] [INFO] Running org.apache.arrow.flight.TestServerOptions
> [INFO] [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 0.114 s - in org.apache.arrow.flight.TestServerOptions
> [INFO] [INFO] 
> [INFO] [INFO] Results:
> [INFO] [INFO] 
> [INFO] [ERROR] Errors: 
> [INFO] [ERROR]   TestTls.connectTls:44->test:98->lambda$test$3:105 Runtime 
> java.io.FileNotFound...
> [INFO] [ERROR]   TestTls.rejectHostname »  Unexpected exception, 
> expected
> [INFO] [ERROR]   TestTls.rejectInvalidCert »  Unexpected exception, 
> expected
> [INFO] [INFO] 
> [INFO] [ERROR] Tests run: 27, Failures: 0, Errors: 3, Skipped: 10
> {noformat}
> I'm not sure whether this is my environment problem or not.





[jira] [Commented] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type

2019-07-10 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881990#comment-16881990
 ] 

lidavidm commented on ARROW-5610:
-

Right now, if you define an extension type in Java whose type name is not 
"arrow.py_extension_type", you have no way of writing the Python equivalent. I 
think what's needed is a C++ extension type whose implementation dispatches to 
Python callbacks, which can be instantiated and registered with an arbitrary 
name.

> [Python] Define extension type API in Python to "receive" or "send" a foreign 
> extension type
> 
>
> Key: ARROW-5610
> URL: https://issues.apache.org/jira/browse/ARROW-5610
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 1.0.0
>
>
> In work in ARROW-840, a static {{arrow.py_extension_type}} name is used. 
> There will be cases where an extension type is coming from another 
> programming language (e.g. Java), so it would be useful to be able to "plug 
> in" a Python extension type subclass that will be used to deserialize the 
> extension type coming over the wire. This has some different API requirements 
> since the serialized representation of the type will not have knowledge of 
> Python pickling, etc. 





[jira] [Comment Edited] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type

2019-07-10 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881990#comment-16881990
 ] 

lidavidm edited comment on ARROW-5610 at 7/10/19 12:17 PM:
---

Right now, if you define an extension type in Java whose type name is not 
"arrow.py_extension_type", you have no way of writing the Python equivalent. I 
think what's needed is a C++ extension type whose implementation dispatches to 
Python callbacks, which can be instantiated and registered with an arbitrary 
name. Basically, what Joris suggests with the extension type that can be 
parameterized with a name.


was (Author: lidavidm):
Right now, if you define an extension type in Java whose type name is not 
"arrow.py_extension_type", you have no way of writing the Python equivalent. I 
think what's needed is a C++ extension type whose implementation dispatches to 
Python callbacks, which can be instantiated and registered with an arbitrary 
name.

> [Python] Define extension type API in Python to "receive" or "send" a foreign 
> extension type
> 
>
> Key: ARROW-5610
> URL: https://issues.apache.org/jira/browse/ARROW-5610
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 1.0.0
>
>
> In work in ARROW-840, a static {{arrow.py_extension_type}} name is used. 
> There will be cases where an extension type is coming from another 
> programming language (e.g. Java), so it would be useful to be able to "plug 
> in" a Python extension type subclass that will be used to deserialize the 
> extension type coming over the wire. This has some different API requirements 
> since the serialized representation of the type will not have knowledge of 
> Python pickling, etc. 





[jira] [Commented] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type

2019-07-10 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882039#comment-16882039
 ] 

lidavidm commented on ARROW-5610:
-

Hmm, to be frank, I haven't gotten a chance to evaluate the API yet, I'm just 
going off of reading the implementation. I'll follow up once I do get a chance 
to try it out. But I'm still not sure why the language that the type is defined 
in should matter - I thought the idea is there is an abstract type, and you 
implement it for each language, and right now the main limitation is that you 
can't implement a Python type with an arbitrary name. (For example, I want a 
Java UuidType, which uses Java's UUID class, to map seamlessly to a Python 
UuidType using the uuid module.)
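
To make the UuidType example concrete, here is a minimal sketch of the Python 
half of such a mapping. It assumes, hypothetically, that the storage is a 
16-byte fixed-size binary value written most-significant byte first; the 
helper names are invented for illustration and are not pyarrow API.

```python
import uuid


def uuid_to_storage(value: uuid.UUID) -> bytes:
    """Encode a UUID as 16 big-endian bytes (assumed storage layout)."""
    return value.bytes


def uuid_from_storage(raw: bytes) -> uuid.UUID:
    """Decode 16 storage bytes back into a uuid.UUID."""
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    return uuid.UUID(bytes=raw)
```

If the Java side writes the most significant 64 bits first, the same 16 bytes 
would round-trip between the two languages under this assumption.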

But I suppose I should put up some code before I keep talking!

> [Python] Define extension type API in Python to "receive" or "send" a foreign 
> extension type
> 
>
> Key: ARROW-5610
> URL: https://issues.apache.org/jira/browse/ARROW-5610
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 1.0.0
>
>
> In work in ARROW-840, a static {{arrow.py_extension_type}} name is used. 
> There will be cases where an extension type is coming from another 
> programming language (e.g. Java), so it would be useful to be able to "plug 
> in" a Python extension type subclass that will be used to deserialize the 
> extension type coming over the wire. This has some different API requirements 
> since the serialized representation of the type will not have knowledge of 
> Python pickling, etc. 





[jira] [Comment Edited] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type

2019-07-10 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881990#comment-16881990
 ] 

lidavidm edited comment on ARROW-5610 at 7/10/19 12:21 PM:
---

Right now, if you define an extension type in Java whose type name is not 
"arrow.py_extension_type", you have no way of writing the Python equivalent. I 
think what's needed is a C++ extension type whose implementation dispatches to 
Python callbacks, which can be instantiated and registered with an arbitrary 
name. Basically, what Joris suggests with the extension type that can be 
parameterized with a name.

I think the implementation would be similar to Flight, where you have a C++ 
subclass that contains a set of function pointers and a Python object, and 
invokes those functions by passing them the Python object and the C++ 
arguments. The functions would be defined in Cython and take care of bridging 
between the two.

I don't think there needs to be a Python-specific registry, just a way to hook 
arbitrary Python into the extension type metadata (de)serialization. Right now, 
the C++ subclass calls a specific classmethod that tries to unpickle the 
metadata, but there's no reason why it has to be pickle.
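
The dispatch idea can be sketched in pure Python. This is only a stand-in for 
the proposed C++ subclass: the class {{CallbackExtensionType}}, the 
{{TaggedImpl}} example and the two callback functions are all hypothetical, 
and the real bridging would be done in Cython rather than as shown here.

```python
class CallbackExtensionType:
    """Stand-in for a generic type that forwards to registered callbacks.

    It stores an arbitrary name, an opaque Python object, and a set of
    functions; every operation passes the Python object plus the call's
    own arguments to the matching function.
    """

    def __init__(self, name, py_object, serialize_fn, deserialize_fn):
        self.extension_name = name
        self._py_object = py_object
        self._serialize_fn = serialize_fn
        self._deserialize_fn = deserialize_fn

    def serialize(self):
        return self._serialize_fn(self._py_object)

    def deserialize(self, metadata):
        return self._deserialize_fn(self._py_object, metadata)


class TaggedImpl:
    """Example Python-side implementation with one parameter."""

    def __init__(self, tag):
        self.tag = tag


def tagged_serialize(py_object):
    # Plain UTF-8 bytes instead of pickle, so other languages can read it.
    return py_object.tag.encode("utf-8")


def tagged_deserialize(py_object, metadata):
    return type(py_object)(metadata.decode("utf-8"))
```

Because the metadata format lives entirely in the callbacks, nothing in the 
generic type forces it to be pickle.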


was (Author: lidavidm):
Right now, if you define an extension type in Java whose type name is not 
"arrow.py_extension_type", you have no way of writing the Python equivalent. I 
think what's needed is a C++ extension type whose implementation dispatches to 
Python callbacks, which can be instantiated and registered with an arbitrary 
name. Basically, what Joris suggests with the extension type that can be 
parameterized with a name.

> [Python] Define extension type API in Python to "receive" or "send" a foreign 
> extension type
> 
>
> Key: ARROW-5610
> URL: https://issues.apache.org/jira/browse/ARROW-5610
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 1.0.0
>
>
> In work in ARROW-840, a static {{arrow.py_extension_type}} name is used. 
> There will be cases where an extension type is coming from another 
> programming language (e.g. Java), so it would be useful to be able to "plug 
> in" a Python extension type subclass that will be used to deserialize the 
> extension type coming over the wire. This has some different API requirements 
> since the serialized representation of the type will not have knowledge of 
> Python pickling, etc. 





[jira] [Commented] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type

2019-07-10 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882027#comment-16882027
 ] 

lidavidm commented on ARROW-5610:
-

Say in Java, you have an extension type representing an IP address. Its type 
name is "ip" and its metadata indicates whether it's IPv4 or IPv6. You want to 
transfer a table containing a column of that type to and from Python. Right 
now, you can read that data from Python, but you can't create a table with that 
type. You could implement an extension type that behaves the same, but Java 
wouldn't recognize it, because the type name has to be 
"arrow.py_extension_type". You also can't deserialize the metadata written by 
Java or write metadata that Java can read, as it's not in pickle format.
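
A small sketch of the metadata problem, using only the standard library. The 
type name "ip" and the IPv4/IPv6 flag come from the example above; the class 
{{IpType}}, the byte encoding ({{b"ipv4"}} / {{b"ipv6"}}) and the helper 
{{pickled_metadata}} are assumptions made up for illustration.

```python
import pickle


class IpType:
    """Hypothetical Python counterpart of the Java "ip" extension type."""

    extension_name = "ip"
    _VERSIONS = {b"ipv4": 4, b"ipv6": 6}

    def __init__(self, version):
        if version not in (4, 6):
            raise ValueError("version must be 4 or 6")
        self.version = version

    def serialize_metadata(self) -> bytes:
        # Language-neutral bytes that a Java reader could also parse.
        return b"ipv4" if self.version == 4 else b"ipv6"

    @classmethod
    def deserialize_metadata(cls, metadata: bytes) -> "IpType":
        if metadata not in cls._VERSIONS:
            raise ValueError("unrecognized ip metadata: %r" % (metadata,))
        return cls(cls._VERSIONS[metadata])


def pickled_metadata(version) -> bytes:
    """What a pickle-based scheme would write instead (Python-only)."""
    return pickle.dumps({"version": version})
```

The pickled form is an opaque Python-specific byte stream, which is why 
neither side can read the other's metadata when one of them assumes pickle.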

> [Python] Define extension type API in Python to "receive" or "send" a foreign 
> extension type
> 
>
> Key: ARROW-5610
> URL: https://issues.apache.org/jira/browse/ARROW-5610
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 1.0.0
>
>
> In work in ARROW-840, a static {{arrow.py_extension_type}} name is used. 
> There will be cases where an extension type is coming from another 
> programming language (e.g. Java), so it would be useful to be able to "plug 
> in" a Python extension type subclass that will be used to deserialize the 
> extension type coming over the wire. This has some different API requirements 
> since the serialized representation of the type will not have knowledge of 
> Python pickling, etc. 





[jira] [Created] (ARROW-5930) [FlightRPC] [Python] Flight CI tests are failing

2019-07-12 Thread lidavidm (JIRA)
lidavidm created ARROW-5930:
---

 Summary: [FlightRPC] [Python] Flight CI tests are failing
 Key: ARROW-5930
 URL: https://issues.apache.org/jira/browse/ARROW-5930
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Python
Affects Versions: 0.14.0
Reporter: lidavidm


Flight tests segfault on Travis: 
[https://travis-ci.org/apache/arrow/jobs/557690959]

The relevant part is:
{noformat}
Fatal Python error: Aborted
Thread 0x7fcf009fe700 (most recent call first):
  File "/home/travis/build/apache/arrow/python/pyarrow/tests/test_flight.py", 
line 386 in _server_thread
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/threading.py", 
line 864 in run
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/threading.py", 
line 916 in _bootstrap_inner
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/threading.py", 
line 884 in _bootstrap
Current thread 0x7fcf1f9fa700 (most recent call first):
  File "/home/travis/build/apache/arrow/python/pyarrow/tests/test_flight.py", 
line 411 in flight_server
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/contextlib.py", 
line 99 in __exit__
  File "/home/travis/build/apache/arrow/python/pyarrow/tests/test_flight.py", 
line 670 in test_tls_do_get
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/python.py",
 line 165 in pytest_pyfunc_call
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/callers.py",
 line 187 in _multicall
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py",
 line 81 in 
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py",
 line 87 in _hookexec
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/hooks.py",
 line 289 in __call__
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/python.py",
 line 1451 in runtest
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py",
 line 117 in pytest_runtest_call
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/callers.py",
 line 187 in _multicall
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py",
 line 81 in 
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py",
 line 87 in _hookexec
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/hooks.py",
 line 289 in __call__
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py",
 line 192 in 
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py",
 line 220 in from_call
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py",
 line 192 in call_runtest_hook
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py",
 line 167 in call_and_report
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py",
 line 87 in runtestprotocol
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py",
 line 72 in pytest_runtest_protocol
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/callers.py",
 line 187 in _multicall
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py",
 line 81 in 
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py",
 line 87 in _hookexec
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/hooks.py",
 line 289 in __call__
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/main.py",
 line 278 in pytest_runtestloop
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/callers.py",
 line 187 in _multicall
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py",
 line 81 in 
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py",
 line 87 in _hookexec
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/hooks.py",
 line 289 in __call__
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/main.py",
 line 257 in _main
  File 
"/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/main.py",
 

[jira] [Commented] (ARROW-5829) [Java] failure in TestServerOptions.domainSocket

2019-07-03 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877880#comment-16877880
 ] 

lidavidm commented on ARROW-5829:
-

I think this is the same underlying cause - on OSX, the release verification 
script uses a custom temp dir that makes the domain socket path too long.
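
As a rough illustration of the length limit involved: {{sun_path}} in 
{{sockaddr_un}} is 104 bytes on macOS and 108 bytes on Linux, including the 
terminating NUL. The helper below is a hypothetical pre-check in Python, not 
something the Java tests or the verification script contain.

```python
import os

# Size of sockaddr_un.sun_path in bytes, including the trailing NUL byte.
SUN_PATH_SIZE = {"darwin": 104, "linux": 108}


def socket_path_fits(path, platform="darwin"):
    """Return True if `path` fits in sun_path with room for the NUL byte."""
    return len(os.fsencode(path)) < SUN_PATH_SIZE[platform]
```

A socket file created under a deeply nested temporary directory can exceed 
this limit even when the file name itself is short.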

> [Java] failure in TestServerOptions.domainSocket
> 
>
> Key: ARROW-5829
> URL: https://issues.apache.org/jira/browse/ARROW-5829
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
> Reporter: Pindikura Ravindra
> Priority: Major
>
> I see this consistently with the 0.14.0 rc0 release candidate on macOS Mojave.
> java.io.IOException: Failed to bind
>  at 
> org.apache.arrow.flight.TestServerOptions.domainSocket(TestServerOptions.java:46)
> Caused by: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: 
> Address already in use
>  





[jira] [Updated] (ARROW-5829) [Java] failure in TestServerOptions.domainSocket

2019-07-03 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-5829:

Affects Version/s: 0.14.0

> [Java] failure in TestServerOptions.domainSocket
> 
>
> Key: ARROW-5829
> URL: https://issues.apache.org/jira/browse/ARROW-5829
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
> Affects Versions: 0.14.0
> Reporter: Pindikura Ravindra
> Priority: Major
>
> I see this consistently with the 0.14.0 rc0 release candidate on macOS Mojave.
> java.io.IOException: Failed to bind
>  at 
> org.apache.arrow.flight.TestServerOptions.domainSocket(TestServerOptions.java:46)
> Caused by: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: 
> Address already in use
>  





[jira] [Created] (ARROW-5876) [FlightRPC] Implement basic auth across all languages

2019-07-08 Thread lidavidm (JIRA)
lidavidm created ARROW-5876:
---

 Summary: [FlightRPC] Implement basic auth across all languages
 Key: ARROW-5876
 URL: https://issues.apache.org/jira/browse/ARROW-5876
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Affects Versions: 0.14.0
Reporter: lidavidm


We should implement a set of common auth methods in Flight itself to have 
standardized ways to do things like basic auth.





[jira] [Created] (ARROW-5875) [FlightRPC] Test RPC features in integration tests

2019-07-08 Thread lidavidm (JIRA)
lidavidm created ARROW-5875:
---

 Summary: [FlightRPC] Test RPC features in integration tests
 Key: ARROW-5875
 URL: https://issues.apache.org/jira/browse/ARROW-5875
 Project: Apache Arrow
  Issue Type: Test
  Components: FlightRPC, Integration
Affects Versions: 0.14.0
Reporter: lidavidm


We should test not just wire-format compatibility, but feature-compatibility in 
Flight integration tests. This may mean adding a separate suite of tests to the 
integration script.

Features that should be tested include:
 * Authentication
 * Error & error code propagation
 * Cancellation
 * Flow control/backpressure





[jira] [Created] (ARROW-5877) [FlightRPC] Document caveats around usage of auth APIs

2019-07-08 Thread lidavidm (JIRA)
lidavidm created ARROW-5877:
---

 Summary: [FlightRPC] Document caveats around usage of auth APIs
 Key: ARROW-5877
 URL: https://issues.apache.org/jira/browse/ARROW-5877
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: lidavidm


The Flight Handshake method can be insecure, and currently has a surprising 
failure mode. We should document these caveats: it can block forever waiting on 
the client or server, and it is insecure depending on deployment configuration.





[jira] [Commented] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type

2019-07-15 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885281#comment-16885281
 ] 

lidavidm commented on ARROW-5610:
-

[~wesmckinn] I'll try to take a pass this week, if time permits; we would like 
this functionality. (By the way, is there a Jira explicitly for being able to 
hook into to_pandas, or a suggested way to efficiently do a custom Pandas 
conversion?)

> [Python] Define extension type API in Python to "receive" or "send" a foreign 
> extension type
> 
>
> Key: ARROW-5610
> URL: https://issues.apache.org/jira/browse/ARROW-5610
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 1.0.0
>
>
> In work in ARROW-840, a static {{arrow.py_extension_type}} name is used. 
> There will be cases where an extension type is coming from another 
> programming language (e.g. Java), so it would be useful to be able to "plug 
> in" a Python extension type subclass that will be used to deserialize the 
> extension type coming over the wire. This has some different API requirements 
> since the serialized representation of the type will not have knowledge of 
> Python pickling, etc. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-5930) [FlightRPC] [Python] Flight CI tests are failing

2019-07-16 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886300#comment-16886300
 ] 

lidavidm commented on ARROW-5930:
-

[~pitrou] I had just come to the same conclusion. I have a change so that 
Shutdown doesn't use a DCHECK, but instead does an actual check, so at least it 
won't segfault. I can add additional synchronization on the Python side.
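
The two ideas (a real check instead of a debug-only assertion, plus waiting 
for the server before shutting it down) can be sketched with a toy server in 
plain Python. {{ToyServer}} and its methods are hypothetical and are not the 
pyarrow or C++ Flight implementation.

```python
import threading


class ToyServer:
    """Toy stand-in for a server whose shutdown() races with serve()."""

    def __init__(self):
        self._started = threading.Event()
        self._stop = threading.Event()

    def serve(self):
        # Signal that serving has begun, then block until shutdown().
        self._started.set()
        self._stop.wait()

    def wait_until_started(self, timeout=None):
        """Block until serve() is running; returns False on timeout."""
        return self._started.wait(timeout)

    def shutdown(self):
        # An actual runtime check: report an error instead of aborting.
        if not self._started.is_set():
            raise RuntimeError("shutdown() called before serve()")
        self._stop.set()
```

A test fixture would start {{serve}} in a thread, call 
{{wait_until_started}}, run the test body, and only then call {{shutdown}}.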

> [FlightRPC] [Python] Flight CI tests are failing
> 
>
> Key: ARROW-5930
> URL: https://issues.apache.org/jira/browse/ARROW-5930
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Python
> Affects Versions: 0.14.0
> Reporter: lidavidm
> Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Flight tests segfault on Travis: 
> [https://travis-ci.org/apache/arrow/jobs/557690959]
> The relevant part is:
> {noformat}
> Fatal Python error: Aborted
> Thread 0x7fcf009fe700 (most recent call first):
>   File "/home/travis/build/apache/arrow/python/pyarrow/tests/test_flight.py", 
> line 386 in _server_thread
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/threading.py",
>  line 864 in run
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/threading.py",
>  line 916 in _bootstrap_inner
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/threading.py",
>  line 884 in _bootstrap
> Current thread 0x7fcf1f9fa700 (most recent call first):
>   File "/home/travis/build/apache/arrow/python/pyarrow/tests/test_flight.py", 
> line 411 in flight_server
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/contextlib.py",
>  line 99 in __exit__
>   File "/home/travis/build/apache/arrow/python/pyarrow/tests/test_flight.py", 
> line 670 in test_tls_do_get
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/python.py",
>  line 165 in pytest_pyfunc_call
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/callers.py",
>  line 187 in _multicall
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py",
>  line 81 in 
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py",
>  line 87 in _hookexec
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/hooks.py",
>  line 289 in __call__
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/python.py",
>  line 1451 in runtest
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py",
>  line 117 in pytest_runtest_call
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/callers.py",
>  line 187 in _multicall
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py",
>  line 81 in 
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py",
>  line 87 in _hookexec
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/hooks.py",
>  line 289 in __call__
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py",
>  line 192 in 
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py",
>  line 220 in from_call
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py",
>  line 192 in call_runtest_hook
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py",
>  line 167 in call_and_report
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py",
>  line 87 in runtestprotocol
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py",
>  line 72 in pytest_runtest_protocol
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/callers.py",
>  line 187 in _multicall
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py",
>  line 81 in 
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py",
>  line 87 in _hookexec
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/hooks.py",
>  line 289 in __call__
>   File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/main.py",
>  line 278 in 

[jira] [Commented] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type

2019-08-12 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905428#comment-16905428
 ] 

lidavidm commented on ARROW-5610:
-

[~jorisvandenbossche] the approach makes sense to me. I assume the generic 
ExtensionType would have a Python "vtable" for Python subclasses to implement 
the C++ methods, and that each Python subclass would somehow register a new 
instance of the C++ type (with corresponding Python method references) with the 
extension type registry? The registration method would need to support 
parameterized types as well (i.e. registering multiple instances of the same 
type with different parameters).

There's still the reference loop between C++ and Python. In this case, since 
you have no way of re-instantiating the Python instance if the weak reference 
is dropped, you'd need some other way to keep it alive, so you might need a 
Python-side registry to get around the reference loop. (Then, during 
interpreter shutdown, you would drop all the C++ extension type instance 
references, then drop the Python references.)

I think then, on the C++ side, the generic extension type instance would get 
instantiated, but there would be no way to instantiate the corresponding Python 
class without a separate registry, as you mention. So the unknown extension 
type then comes into play. Alternatively, Python subclasses could be required 
to register a factory method that takes the extension type name and metadata.
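
A minimal sketch of the factory-method alternative, in plain Python. All names 
here ({{register_extension_factory}}, {{instantiate}}, 
{{UnknownExtensionType}}, {{FixedWidthType}}) are hypothetical and only model 
the idea; they are not pyarrow API.

```python
class UnknownExtensionType:
    """Fallback used when no factory is registered for a name."""

    def __init__(self, name, metadata):
        self.extension_name = name
        self.metadata = metadata


# Python-side registry: holds strong references to the factories, so they
# stay alive independently of any C++-side references.
_FACTORIES = {}


def register_extension_factory(name, factory):
    if name in _FACTORIES:
        raise KeyError("factory already registered for %r" % (name,))
    _FACTORIES[name] = factory


def unregister_all():
    """Drop every factory, e.g. during interpreter shutdown."""
    _FACTORIES.clear()


def instantiate(name, metadata):
    """Build the Python type for (name, metadata), or an unknown type."""
    factory = _FACTORIES.get(name)
    if factory is None:
        return UnknownExtensionType(name, metadata)
    return factory(name, metadata)


class FixedWidthType:
    """Example parameterized type; the width is carried in the metadata."""

    def __init__(self, width):
        self.width = width

    @classmethod
    def from_metadata(cls, name, metadata):
        return cls(int(metadata.decode("ascii")))
```

Since the factory receives the metadata, one registration covers every 
parameterization of the type, which is one way to handle the parameterized 
case mentioned above.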

> [Python] Define extension type API in Python to "receive" or "send" a foreign 
> extension type
> 
>
> Key: ARROW-5610
> URL: https://issues.apache.org/jira/browse/ARROW-5610
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> In the work on ARROW-840, a static {{arrow.py_extension_type}} name is used. 
> There will be cases where an extension type is coming from another 
> programming language (e.g. Java), so it would be useful to be able to "plug 
> in" a Python extension type subclass that will be used to deserialize the 
> extension type coming over the wire. This has some different API requirements 
> since the serialized representation of the type will not have knowledge of 
> Python pickling, etc. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type

2019-08-02 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898956#comment-16898956
 ] 

lidavidm commented on ARROW-5610:
-

My apologies, I ended up being too busy to look at this.

Thanks for the issue pointers.

> [Python] Define extension type API in Python to "receive" or "send" a foreign 
> extension type
> 
>
> Key: ARROW-5610
> URL: https://issues.apache.org/jira/browse/ARROW-5610
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> In the work on ARROW-840, a static {{arrow.py_extension_type}} name is used. 
> There will be cases where an extension type is coming from another 
> programming language (e.g. Java), so it would be useful to be able to "plug 
> in" a Python extension type subclass that will be used to deserialize the 
> extension type coming over the wire. This has some different API requirements 
> since the serialized representation of the type will not have knowledge of 
> Python pickling, etc. 





[jira] [Commented] (ARROW-6241) [Java] Failures on master

2019-08-14 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907664#comment-16907664
 ] 

lidavidm commented on ARROW-6241:
-

These two commits merged cleanly but break each other:

[https://github.com/apache/arrow/commit/6eae79000336788925fab1f1c011146e24c4838d]
 introduced use of Preconditions

[https://github.com/apache/arrow/commit/c45def63963f5f70903e58492e22718cc9de6ed1]
 removed the import (as the change there made it unused)


 

> [Java] Failures on master
> -
>
> Key: ARROW-6241
> URL: https://issues.apache.org/jira/browse/ARROW-6241
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Wes McKinney
>Priority: Blocker
> Fix For: 0.15.0
>
>
> I'm getting builds failing today with errors like
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.6.2:compile 
> (default-compile) on project arrow-vector: Compilation failure: Compilation 
> failure:
> [ERROR] 
> /home/travis/build/apache/arrow/java/vector/src/main/java/org/apache/arrow/vector/complex/ListVector.java:[356,4]
>  error: cannot find symbol
> [ERROR] symbol:   variable Preconditions
> [ERROR] location: class ListVector
> [ERROR] 
> /home/travis/build/apache/arrow/java/vector/src/main/java/org/apache/arrow/vector/complex/NonNullableStructVector.java:[96,4]
>  error: cannot find symbol
> [ERROR] symbol:   variable Preconditions
> [ERROR] location: class NonNullableStructVector
> [ERROR] -> [Help 1]
> {code}
> see https://travis-ci.org/apache/arrow/jobs/571958044
> Is this introduced by a recent patch?





[jira] [Updated] (ARROW-4914) [Rust] Array slice returns incorrect bitmask

2019-08-14 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-4914:

Labels: beginner  (was: )

> [Rust] Array slice returns incorrect bitmask
> 
>
> Key: ARROW-4914
> URL: https://issues.apache.org/jira/browse/ARROW-4914
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.13.0
>Reporter: Neville Dipale
>Priority: Blocker
>  Labels: beginner
>
> Slicing arrays changes the offset, length and null count of their array data, 
> but the bitmask is not changed.
> This results in the correct null count, but the array values might be marked 
> incorrectly as valid/invalid based on the old bitmask positions before the 
> offset.
> To reproduce, create an array with some null values, slice the array, and 
> then dbg!() it (after downcasting).
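
The effect described above can be sketched in plain Python (not the Rust 
implementation). The helper is_valid is hypothetical; the only assumption 
taken from the Arrow format is that validity bits are numbered from the least 
significant bit of each byte.

```python
def is_valid(bitmap: bytes, offset: int, i: int) -> bool:
    """Return the validity bit for logical index i of a sliced array."""
    bit = offset + i  # the slice offset must be applied to the bitmap too
    return (bitmap[bit // 8] >> (bit % 8)) & 1 == 1


# Parent array [1, None, 3, None, 5]: bits 0, 2 and 4 are set (LSB first).
bitmap = bytes([0b00010101])

# Slice starting at offset 1 with length 3, i.e. [None, 3, None].
offset, length = 1, 3
correct = [is_valid(bitmap, offset, i) for i in range(length)]
buggy = [is_valid(bitmap, 0, i) for i in range(length)]  # ignores the offset
```

Reading the bitmap without the offset reports the validity of the parent's 
first three slots instead of the slice's, which matches the symptom in the 
report.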





[jira] [Updated] (ARROW-2001) [Java] Add getInitReservation() to BufferAllocator interface similar to getLimit(), getHeadRoom() APIs

2019-08-16 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-2001:

Labels: beginner newbie  (was: newbie)

> [Java] Add getInitReservation() to BufferAllocator interface similar to 
> getLimit(), getHeadRoom() APIs
> --
>
> Key: ARROW-2001
> URL: https://issues.apache.org/jira/browse/ARROW-2001
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Siddharth Teotia
>Priority: Minor
>  Labels: beginner, newbie
>
> For capturing additional information for debugging/profiling purposes, it 
> will be useful to expose the init reservation for buffer allocator. 
> I would encourage someone new to the community to do this.





[jira] [Updated] (ARROW-5822) [Java] Provide a sample json file for the flight example

2019-08-16 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-5822:

Labels: beginner  (was: )

> [Java] Provide a sample json file for the flight example
> 
>
> Key: ARROW-5822
> URL: https://issues.apache.org/jira/browse/ARROW-5822
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Liya Fan
>Priority: Minor
>  Labels: beginner
>
> The flight package provides IntegrationTestClient and IntegrationTestServer 
> as sample implementations for client/server side. 
> In these implementations, the client sends the content of some json file to 
> the server. However, it is not clear what the format of the json file should 
> be.
> So it is desirable to also provide a sample json file, which makes it easier 
> to run the flight program. 





[jira] [Updated] (ARROW-5722) [Rust] Implement std::fmt::Debug for ListArray, BinaryArray and StructArray

2019-08-15 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-5722:

Labels: beginner  (was: )

> [Rust] Implement std::fmt::Debug for ListArray, BinaryArray and StructArray
> ---
>
> Key: ARROW-5722
> URL: https://issues.apache.org/jira/browse/ARROW-5722
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Chao Sun
>Priority: Major
>  Labels: beginner
>






[jira] [Updated] (ARROW-5912) [Python] conversion from datetime objects with mixed timezones should normalize to UTC

2019-08-15 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-5912:

Labels: beginner  (was: )

> [Python] conversion from datetime objects with mixed timezones should 
> normalize to UTC
> --
>
> Key: ARROW-5912
> URL: https://issues.apache.org/jira/browse/ARROW-5912
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Major
>  Labels: beginner
> Fix For: 1.0.0
>
>
> Currently, objects with mixed timezones are each interpreted separately as 
> their local time:
> {code:python}
> >>> ts_pd_paris = pd.Timestamp("1970-01-01 01:00", tz="Europe/Paris")
> >>> ts_pd_paris
> Timestamp('1970-01-01 01:00:00+0100', tz='Europe/Paris')
> >>> ts_pd_helsinki = pd.Timestamp("1970-01-01 02:00", tz="Europe/Helsinki")
> >>> ts_pd_helsinki
> Timestamp('1970-01-01 02:00:00+0200', tz='Europe/Helsinki')
> >>> a = pa.array([ts_pd_paris, ts_pd_helsinki])
> >>> a
> 
> [
>   1970-01-01 01:00:00.00,
>   1970-01-01 02:00:00.00
> ]
> >>> a.type
> TimestampType(timestamp[us])
> {code}
> So both times are actually about the same moment in time (the same value in 
> UTC; in pandas their stored {{value}} is also the same), but once converted 
> to pyarrow, they are both tz-naive but no longer the same time. That seems 
> rather unexpected and a source of bugs.
> I think a better option would be to normalize to UTC, and result in a 
> tz-aware TimestampArray with UTC as timezone. 
> That is also the behaviour of pandas if you force the conversion to result in 
> datetimes (by default pandas will keep them as object array preserving the 
> different timezones).
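
The difference between the two behaviours can be sketched with the standard 
library alone (no pandas or pyarrow). The fixed offsets below are an 
assumption standing in for Europe/Paris (+01:00) and Europe/Helsinki (+02:00) 
on that date, as shown in the session above.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for the two zones on 1970-01-01; this avoids
# depending on a timezone database.
paris = timezone(timedelta(hours=1))
helsinki = timezone(timedelta(hours=2))

ts_paris = datetime(1970, 1, 1, 1, 0, tzinfo=paris)
ts_helsinki = datetime(1970, 1, 1, 2, 0, tzinfo=helsinki)

# Dropping tzinfo keeps the local wall-clock time: the values now differ.
naive = [ts.replace(tzinfo=None) for ts in (ts_paris, ts_helsinki)]

# Normalizing to UTC first preserves the instant: both values are equal.
utc = [ts.astimezone(timezone.utc) for ts in (ts_paris, ts_helsinki)]
```

Here naive holds two different wall-clock times, while utc holds the same 
instant twice, which is the result the issue argues for.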





[jira] [Updated] (ARROW-2619) [Rust] Move JSON serde code to separate file/module

2019-08-15 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-2619:

Labels: beginner  (was: )

> [Rust] Move JSON serde code to separate file/module
> ---
>
> Key: ARROW-2619
> URL: https://issues.apache.org/jira/browse/ARROW-2619
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Priority: Minor
>  Labels: beginner
>






[jira] [Updated] (ARROW-1984) [Java] NullableDateMilliVector.getObject() should return a LocalDate, not a LocalDateTime

2019-08-15 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-1984:

Labels: beginner  (was: )

> [Java] NullableDateMilliVector.getObject() should return a LocalDate, not a 
> LocalDateTime
> -
>
> Key: ARROW-1984
> URL: https://issues.apache.org/jira/browse/ARROW-1984
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Vanco Buca
>Priority: Minor
>  Labels: beginner
>
> NullableDateMilliVector.getObject() today returns a LocalDateTime. However, 
> this vector is used to store date information, and thus, getObject() should 
> return a LocalDate. 
> Please note: there already exists a vector that returns LocalDateTime --
>  the NullableTimestampMilliVector.





[jira] [Updated] (ARROW-3552) [Python] Implement pa.RecordBatch.serialize_to to write single message to an OutputStream

2019-08-15 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-3552:

Labels: beginner  (was: )

> [Python] Implement pa.RecordBatch.serialize_to to write single message to an 
> OutputStream
> -
>
> Key: ARROW-3552
> URL: https://issues.apache.org/jira/browse/ARROW-3552
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner
>
> {{RecordBatch.serialize}} writes in memory. This would help with shared 
> memory workflows. See also pyarrow.ipc.write_tensor.





[jira] [Updated] (ARROW-5374) [Python] pa.read_record_batch() doesn't work

2019-08-15 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-5374:

Labels: begin  (was: )

> [Python] pa.read_record_batch() doesn't work
> 
>
> Key: ARROW-5374
> URL: https://issues.apache.org/jira/browse/ARROW-5374
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: begin
>
> {code:python}
> >>> batch = pa.RecordBatch.from_arrays([pa.array([b"foo"], type=pa.utf8())],
> ...                                    names=['strs'])
> >>> stream = pa.BufferOutputStream()
> >>> writer = pa.RecordBatchStreamWriter(stream, batch.schema)
> >>> writer.write_batch(batch)
> >>> writer.close()
> >>> buf = stream.getvalue()
> >>> pa.read_record_batch(buf, batch.schema)
> Traceback (most recent call last):
>   File "", line 1, in 
> pa.read_record_batch(buf, batch.schema)
>   File "pyarrow/ipc.pxi", line 583, in pyarrow.lib.read_record_batch
> check_status(ReadRecordBatch(deref(message.message.get()),
>   File "pyarrow/error.pxi", line 87, in pyarrow.lib.check_status
> raise ArrowIOError(message)
> ArrowIOError: Expected IPC message of type schema got record batch
> {code}





[jira] [Updated] (ARROW-5374) [Python] pa.read_record_batch() doesn't work

2019-08-15 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-5374:

Labels: beginner  (was: begin)

> [Python] pa.read_record_batch() doesn't work
> 
>
> Key: ARROW-5374
> URL: https://issues.apache.org/jira/browse/ARROW-5374
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: beginner
>
> {code:python}
> >>> batch = pa.RecordBatch.from_arrays([pa.array([b"foo"], type=pa.utf8())],
> ...                                    names=['strs'])
> >>> stream = pa.BufferOutputStream()
> >>> writer = pa.RecordBatchStreamWriter(stream, batch.schema)
> >>> writer.write_batch(batch)
> >>> writer.close()
> >>> buf = stream.getvalue()
> >>> pa.read_record_batch(buf, batch.schema)
> Traceback (most recent call last):
>   File "", line 1, in 
> pa.read_record_batch(buf, batch.schema)
>   File "pyarrow/ipc.pxi", line 583, in pyarrow.lib.read_record_batch
> check_status(ReadRecordBatch(deref(message.message.get()),
>   File "pyarrow/error.pxi", line 87, in pyarrow.lib.check_status
> raise ArrowIOError(message)
> ArrowIOError: Expected IPC message of type schema got record batch
> {code}





[jira] [Updated] (ARROW-4176) [C++/Python] Human readable arrow schema comparison

2019-08-15 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-4176:

Labels: beginner  (was: )

> [C++/Python] Human readable arrow schema comparison
> ---
>
> Key: ARROW-4176
> URL: https://issues.apache.org/jira/browse/ARROW-4176
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Florian Jetter
>Priority: Minor
>  Labels: beginner
>
> When working with arrow schemas it would be helpful to have a human readable 
> representation of the diff between two schemas.
> This could be either exposed as a function returning a string/diff object or 
> via a function raising an Exception with this information.
> For instance:
> {code}
> schema_diff = get_schema_diff(schema1, schema2)
> expected_diff = """
> - col_changed: int8
> + col_changed: double
> + col_additional: int8
> """
> assert schema_diff == expected_diff
> {code}
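
One way the proposed function could behave, sketched in plain Python. This is 
an assumption-laden illustration: get_schema_diff is the hypothetical name 
from the issue, and schemas are modelled as ordered lists of (name, type) 
pairs rather than real Arrow schema objects.

```python
def get_schema_diff(schema1, schema2):
    """Diff two schemas given as ordered lists of (name, type) pairs."""
    old, new = dict(schema1), dict(schema2)
    lines = []
    # Fields removed from, or changed relative to, the first schema.
    for name, old_type in schema1:
        if name not in new:
            lines.append(f"- {name}: {old_type}")
        elif new[name] != old_type:
            lines.append(f"- {name}: {old_type}")
            lines.append(f"+ {name}: {new[name]}")
    # Fields that only exist in the second schema.
    for name, new_type in schema2:
        if name not in old:
            lines.append(f"+ {name}: {new_type}")
    return "\n".join(lines)


schema1 = [("col_same", "int8"), ("col_changed", "int8")]
schema2 = [
    ("col_same", "int8"),
    ("col_changed", "double"),
    ("col_additional", "int8"),
]
schema_diff = get_schema_diff(schema1, schema2)
```

Unchanged fields are omitted from the output, so the result contains only the 
three lines shown in the expected diff above (without the surrounding blank 
lines of the triple-quoted string).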
>  





[jira] [Updated] (ARROW-3776) [Rust] Mark methods that do not perform bounds checking as unsafe

2019-08-15 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-3776:

Labels: beginner  (was: )

> [Rust] Mark methods that do not perform bounds checking as unsafe
> -
>
> Key: ARROW-3776
> URL: https://issues.apache.org/jira/browse/ARROW-3776
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Paddy Horan
>Priority: Minor
>  Labels: beginner
>






[jira] [Updated] (ARROW-5248) [Python] support dateutil timezones

2019-08-15 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-5248:

Labels: beginner  (was: )

> [Python] support dateutil timezones
> ---
>
> Key: ARROW-5248
> URL: https://issues.apache.org/jira/browse/ARROW-5248
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Minor
>  Labels: beginner
>
> The {{dateutil}} package also provides a set of timezone objects 
> (https://dateutil.readthedocs.io/en/stable/tz.html) in addition to {{pytz}}. 
> In pyarrow, we only support pytz timezones (and the stdlib datetime.timezone 
> fixed offset):
> {code}
> In [2]: import dateutil.tz
>
> In [3]: import pyarrow as pa
>
> In [5]: pa.timestamp('us', dateutil.tz.gettz('Europe/Brussels'))
> ...
> ~/miniconda3/envs/dev37/lib/python3.7/site-packages/pyarrow/types.pxi in 
> pyarrow.lib.tzinfo_to_string()
> ValueError: Unable to convert timezone 
> `tzfile('/usr/share/zoneinfo/Europe/Brussels')` to string
> {code}
> But pandas also supports dateutil timezones. As a consequence, converting a 
> pandas DataFrame that uses a dateutil timezone to an Arrow table raises an 
> error.





[jira] [Updated] (ARROW-4111) [Python] Create time types from Python sequences of integers

2019-08-15 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-4111:

Labels: beginner  (was: )

> [Python] Create time types from Python sequences of integers
> 
>
> Key: ARROW-4111
> URL: https://issues.apache.org/jira/browse/ARROW-4111
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner
> Fix For: 1.0.0
>
>
> This works for dates, but not times:
> {code}
> > traceback 
> > >
>     def test_to_pandas_deduplicate_date_time():
>         nunique = 100
>         repeats = 10
>
>         unique_values = list(range(nunique))
>
>         cases = [
>             # array type, to_pandas options
>             ('date32', {'date_as_object': True}),
>             ('date64', {'date_as_object': True}),
>             ('time32[ms]', {}),
>             ('time64[us]', {})
>         ]
>
>         for array_type, pandas_options in cases:
> >           arr = pa.array(unique_values * repeats, type=array_type)
> pyarrow/tests/test_convert_pandas.py:2392: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> pyarrow/array.pxi:175: in pyarrow.lib.array
> return _sequence_to_array(obj, mask, size, type, pool, from_pandas)
> pyarrow/array.pxi:36: in pyarrow.lib._sequence_to_array
> check_status(ConvertPySequence(sequence, mask, options, ))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> >   raise ArrowInvalid(message)
> E   pyarrow.lib.ArrowInvalid: ../src/arrow/python/python_to_arrow.cc:1012 : 
> ../src/arrow/python/iterators.h:70 : Could not convert 0 with type int: 
> converting to time32
> {code}





[jira] [Updated] (ARROW-5830) [C++] Stop using memcmp in TensorEquals

2019-08-21 Thread lidavidm (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-5830:

Labels: beginner  (was: )

> [C++] Stop using memcmp in TensorEquals
> ---
>
> Key: ARROW-5830
> URL: https://issues.apache.org/jira/browse/ARROW-5830
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kenta Murata
>Priority: Major
>  Labels: beginner
>
> Because memcmp is problematic for comparing floating-point values, such as NaNs.
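
A short illustration of why byte-wise comparison and floating-point equality 
disagree, using only the Python standard library. The helper bytes_equal is 
hypothetical and stands in for what memcmp would report on the raw buffers.

```python
import struct


def bytes_equal(a, b):
    """Compare two doubles by their raw bytes, as memcmp would."""
    return struct.pack("<d", a) == struct.pack("<d", b)


nan = float("nan")

# Identical bytes, yet IEEE 754 says NaN is not equal to itself.
same_bits_but_unequal = bytes_equal(nan, nan) and nan != nan

# Equal values, yet the sign bit makes the bytes differ.
equal_but_different_bits = (0.0 == -0.0) and not bytes_equal(0.0, -0.0)
```

Both flags evaluate to True, so a memcmp-based comparison can disagree with an 
element-wise floating-point comparison in either direction.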



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (ARROW-2719) [Python/C++] ArrowSchema not hashable

2019-08-21 Thread lidavidm (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-2719:

Labels: beginner  (was: )

> [Python/C++] ArrowSchema not hashable
> -
>
> Key: ARROW-2719
> URL: https://issues.apache.org/jira/browse/ARROW-2719
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Florian Jetter
>Priority: Minor
>  Labels: beginner
>
> The arrow schema is immutable and should provide a way of hashing itself. 
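
As a rough illustration of the request, in plain Python rather than pyarrow: 
an immutable value type can derive its hash from its contents. The Field and 
Schema classes below are hypothetical stand-ins, not the pyarrow classes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Field:
    name: str
    type: str


@dataclass(frozen=True)
class Schema:
    # A tuple keeps the field list immutable and therefore hashable.
    fields: Tuple[Field, ...]


a = Schema((Field("id", "int64"), Field("name", "string")))
b = Schema((Field("id", "int64"), Field("name", "string")))

# Equal schemas hash equally, so either can be used to look up the other.
seen = {a: "first"}
```

With frozen=True the dataclass generates both equality and hashing from the 
fields, so seen[b] finds the entry stored under a.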





[jira] [Updated] (ARROW-2652) [C++/Python] Document how to provide information on segfaults

2019-08-21 Thread lidavidm (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-2652:

Labels: beginner  (was: )

> [C++/Python] Document how to provide information on segfaults
> -
>
> Key: ARROW-2652
> URL: https://issues.apache.org/jira/browse/ARROW-2652
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation, Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: beginner
>
> We often have users that report segmentation faults in {{pyarrow}}. This will 
> sadly keep reappearing as we also don't have the magical ability of writing 
> 100%-bug-free code. Thus we should have a small section in our documentation 
> on how people can give us the relevant information in the case of a 
> segmentation fault. Preferably the documentation covers {{gdb}} and {{lldb}}. 
> They both have similar commands but differ in some minor flags.
> For an example of the comments I have given to users in tickets, see 
> https://github.com/apache/arrow/issues/2089#issuecomment-393477116





[jira] [Updated] (ARROW-2665) [Python/C++] Add index() method to find first occurence of Python scalar

2019-08-21 Thread lidavidm (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-2665:

Labels: Analytics beginner  (was: Analytics)

> [Python/C++] Add index() method to find first occurence of Python scalar
> 
>
> Key: ARROW-2665
> URL: https://issues.apache.org/jira/browse/ARROW-2665
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: Analytics, beginner
>
> Python lists have an {{index(x, start, end)}} method to find the first 
> occurrence of an element. We should add a method with the same interface 
> supporting Python scalars on the typical triplet 
> {{Array/ChunkedArray/Columns}}.
> See also 
> https://docs.python.org/3.6/tutorial/datastructures.html#more-on-lists
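
A minimal sketch of the requested semantics over chunked data, using plain 
Python lists as stand-ins for array chunks. The function chunked_index is 
hypothetical, and unlike list.index it does not handle negative start or end 
values.

```python
def chunked_index(chunks, value, start=0, end=None):
    """Return the first global position of value in [start, end)."""
    total = sum(len(chunk) for chunk in chunks)
    end = total if end is None else min(end, total)
    base = 0  # global position of the first element of the current chunk
    for chunk in chunks:
        for i, item in enumerate(chunk):
            pos = base + i
            if start <= pos < end and item == value:
                return pos
        base += len(chunk)
    raise ValueError(f"{value!r} is not in array")


chunks = [[1, 2], [3, 2, 5]]
first = chunked_index(chunks, 2)
after_two = chunked_index(chunks, 2, 2)
```

The returned positions are global across chunks, so they agree with what 
list.index would return on the concatenated list [1, 2, 3, 2, 5].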





[jira] [Updated] (ARROW-2857) [Python] Expose integration test JSON read/write in Python API

2019-08-21 Thread lidavidm (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-2857:

Labels: beginner  (was: )

> [Python] Expose integration test JSON read/write in Python API
> --
>
> Key: ARROW-2857
> URL: https://issues.apache.org/jira/browse/ARROW-2857
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner
> Fix For: 1.0.0
>
>
> This should be clearly marked to not be used for persistence





[jira] [Created] (ARROW-6426) [FlightRPC] Expose gRPC configuration knobs in Flight

2019-09-03 Thread lidavidm (Jira)
lidavidm created ARROW-6426:
---

 Summary: [FlightRPC] Expose gRPC configuration knobs in Flight
 Key: ARROW-6426
 URL: https://issues.apache.org/jira/browse/ARROW-6426
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Affects Versions: 0.14.1
Reporter: lidavidm
Assignee: lidavidm


We should not expose gRPC symbols/APIs publicly, but should still provide a way 
to configure gRPC options as they may be needed in deployments (for instance, 
we ran into an issue with gRPC keepalives). In Java, this is fortunately 
solvable with reflection, but this is impossible in C++/Python.





[jira] [Commented] (ARROW-6412) [C++] arrow-flight-test can crash because of port allocation

2019-09-03 Thread lidavidm (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921408#comment-16921408
 ] 

lidavidm commented on ARROW-6412:
-

Another way to solve this is to allow binding to port 0, then add a method 
to get the actual port. gRPC supports this, and we've already separated Init 
and Serve in FlightServerBase.
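
The port-0 idea can be sketched with plain sockets in Python; this is not the 
Flight or gRPC API, and bind_ephemeral is a hypothetical helper. It only shows 
the underlying mechanism: the OS picks a free port, and the server then 
reports which one it got.

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind to port 0 and return the socket with the port the OS chose."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS for any free port
    sock.listen(1)
    return sock, sock.getsockname()[1]


# Two servers started at the same time cannot collide on a port.
first, first_port = bind_ephemeral()
second, second_port = bind_ephemeral()
ports = (first_port, second_port)
first.close()
second.close()
```

Tests would then connect to the reported port instead of guessing one in 
advance, which removes the race between picking a port and binding to it.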

> [C++] arrow-flight-test can crash because of port allocation
> 
>
> Key: ARROW-6412
> URL: https://issues.apache.org/jira/browse/ARROW-6412
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I get this error sometimes locally when running the tests in parallel:
> {code}
> [--] 11 tests from TestFlightClient
> [ RUN  ] TestFlightClient.ListFlights
> E0902 15:13:55.996271678   17281 socket_utils_common_posix.cc:201] check for 
> SO_REUSEPORT: {"created":"@1567430035.996256600","description":"SO_REUSEPORT 
> unavailable on compiling 
> system","file":"../src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":169}
> [   OK ] TestFlightClient.ListFlights (17 ms)
> [ RUN  ] TestFlightClient.GetFlightInfo
> E0902 15:13:56.013065793   17281 server_chttp2.cc:40]
> {"created":"@1567430036.013032600","description":"No address added out of 
> total 1 
> resolved","file":"../src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":394,"referenced_errors":[{"created":"@1567430036.013029044","description":"Unable
>  to configure 
> socket","fd":6,"file":"../src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":217,"referenced_errors":[{"created":"@1567430036.013021880","description":"Address
>  already in 
> use","errno":98,"file":"../src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":190,"os_error":"Address
>  already in use","syscall":"bind"}]}]}
> ../src/arrow/flight/flight_test.cc:271: Failure
> Failed
> 'server->Init(options)' failed with Unknown error: Server did not start 
> properly
> {code}





[jira] [Commented] (ARROW-2428) [Python] Add API to map Arrow types (including extension types) to pandas ExtensionArray instances for to_pandas conversions

2019-09-05 Thread lidavidm (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923418#comment-16923418
 ] 

lidavidm commented on ARROW-2428:
-

Hi Joris, overall I agree with the approach here. It's a little unfortunate 
that Pandas doesn't have a general column/table metadata mechanism...

I agree that we want both a default hook for ExtensionType->Pandas conversions, 
and a way to override conversions on an individual basis. I think adding a new 
argument to {{to_pandas}} is easier than maintaining yet another function 
registry. Similarly, adding a conversion method on {{ExtensionType}} (or maybe 
that should be a future {{ExtensionArray}} class?) would be preferable to 
maintaining a registry. 

If we have something like {{pa.ExtensionType.\_\_pandas_array\_\_}}, should we 
also have {{pa.ExtensionType.\_\_pandas_dtype\_\_}}?

> [Python] Add API to map Arrow types (including extension types) to pandas 
> ExtensionArray instances for to_pandas conversions
> 
>
> Key: ARROW-2428
> URL: https://issues.apache.org/jira/browse/ARROW-2428
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 1.0.0
>
>
> With the next release of Pandas, it will be possible to define custom column 
> types that back a {{pandas.Series}}. Thus we will not be able to cover all 
> possible column types in the {{to_pandas}} conversion by default as we won't 
> be aware of all extension arrays.
> To enable users to create {{ExtensionArray}} instances from Arrow columns in 
> the {{to_pandas}} conversion, we should provide a hook in the {{to_pandas}} 
> call where they can overload the default conversion routines with the ones 
> that produce their {{ExtensionArray}} instances.
> This should avoid additional copies in the case where we would nowadays first 
> convert the Arrow column into a default Pandas column (probably of object 
> type) and the user would afterwards convert it to a more efficient 
> {{ExtensionArray}}. This hook here will be especially useful when you build 
> {{ExtensionArrays}} where the storage is backed by Arrow.
> The meta-issue that tracks the implementation inside of Pandas is: 
> https://github.com/pandas-dev/pandas/issues/19696





[jira] [Commented] (ARROW-2428) [Python] Add API to map Arrow types (including extension types) to pandas ExtensionArray instances for to_pandas conversions

2019-09-06 Thread lidavidm (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924221#comment-16924221
 ] 

lidavidm commented on ARROW-2428:
-

It sounds like a new registry isn't needed, but adding parameters to to_pandas 
would be useful for customizing conversions of built-in types; Joris notes 
Fletcher would want to use that.

> [Python] Add API to map Arrow types (including extension types) to pandas 
> ExtensionArray instances for to_pandas conversions
> 
>
> Key: ARROW-2428
> URL: https://issues.apache.org/jira/browse/ARROW-2428
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 1.0.0
>
>
> With the next release of Pandas, it will be possible to define custom column 
> types that back a {{pandas.Series}}. Thus we will not be able to cover all 
> possible column types in the {{to_pandas}} conversion by default as we won't 
> be aware of all extension arrays.
> To enable users to create {{ExtensionArray}} instances from Arrow columns in 
> the {{to_pandas}} conversion, we should provide a hook in the {{to_pandas}} 
> call where they can overload the default conversion routines with the ones 
> that produce their {{ExtensionArray}} instances.
> This should avoid additional copies in the case where we would nowadays first 
> convert the Arrow column into a default Pandas column (probably of object 
> type) and the user would afterwards convert it to a more efficient 
> {{ExtensionArray}}. This hook here will be especially useful when you build 
> {{ExtensionArrays}} where the storage is backed by Arrow.
> The meta-issue that tracks the implementation inside of Pandas is: 
> https://github.com/pandas-dev/pandas/issues/19696





[jira] [Commented] (ARROW-6528) [C++] Spurious Flight test failures (port allocation failure)

2019-09-11 Thread lidavidm (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927569#comment-16927569
 ] 

lidavidm commented on ARROW-6528:
-

Oh, I see you just merged it.

> [C++] Spurious Flight test failures (port allocation failure)
> -
>
> Key: ARROW-6528
> URL: https://issues.apache.org/jira/browse/ARROW-6528
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> Seems like our port allocation scheme inside unit tests is still not very 
> reliable :-/
> https://ci.ursalabs.org/#/builders/71/builds/4147/steps/8/logs/stdio
> {code}
> [--] 3 tests from TestMetadata
> [ RUN  ] TestMetadata.DoGet
> E0905 12:45:40.322644527   10203 server_chttp2.cc:40] {"created":"@1567687540.322612245","description":"No address added out of total 1 resolved","file":"../src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":394,"referenced_errors":[{"created":"@1567687540.322609844","description":"Unable to configure socket","fd":7,"file":"../src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":217,"referenced_errors":[{"created":"@1567687540.322602634","description":"Address already in use","errno":98,"file":"../src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":190,"os_error":"Address already in use","syscall":"bind"}]}]}
> ../src/arrow/flight/flight_test.cc:429: Failure
> Failed
> 'server->Init(options)' failed with Unknown error: Server did not start properly
> /buildbot/AMD64_Conda_Python_3_7/cpp/build-support/run-test.sh: line 97: 10203 Segmentation fault  (core dumped) $TEST_EXECUTABLE "$@" 2>&1
>  10204 Done| $ROOT/build-support/asan_symbolize.py
>  10205 Done| ${CXXFILT:-c++filt}
>  10206 Done| $ROOT/build-support/stacktrace_addr2line.pl $TEST_EXECUTABLE
>  10207 Done| $pipe_cmd 2>&1
>  10208 Done| tee $LOGFILE
> /buildbot/AMD64_Conda_Python_3_7/cpp/build/src/arrow/flight
> {code}





[jira] [Commented] (ARROW-6528) [C++] Spurious Flight test failures (port allocation failure)

2019-09-11 Thread lidavidm (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927567#comment-16927567
 ] 

lidavidm commented on ARROW-6528:
-

[~pitrou] as part of ARROW-6426 
([PR|https://github.com/apache/arrow/pull/5292]) I added a 
FlightServerBase#port() getter in C++, so we could bind to port 0 instead of 
racing to find a free port. Want me to pull that out separately?
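
To illustrate the port 0 idea outside of Flight, here is a minimal sketch using plain Python sockets. It is not the Flight or gRPC API; bind_ephemeral is a hypothetical helper name. Binding to port 0 lets the OS pick a free port at bind time, and the chosen port is read back from the socket afterwards, so there is no window in which another process can grab it.

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind a listening socket to port 0 and return (socket, assigned_port).

    Hypothetical helper for illustration; not part of Flight.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0: the OS assigns a currently free port
    sock.listen()
    # Read the assigned port back from the bound socket.
    return sock, sock.getsockname()[1]


if __name__ == "__main__":
    server, port = bind_ephemeral()
    print("listening on port", port)
    server.close()
```

A test would then pass the returned port to its client, which is roughly the role a port() getter plays on the server object.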






[jira] [Commented] (ARROW-6062) [FlightRPC] Allow timeouts on all stream reads

2019-09-18 Thread lidavidm (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932370#comment-16932370
 ] 

lidavidm commented on ARROW-6062:
-

I would actually like this to be a timeout per read operation, but this isn't 
possible unless we implement async APIs (gRPC generally only offers per-call 
timeouts, which we already have). In a long streaming operation, you may not 
have a bound on how long the entire read will take, but you do have a bound on 
how long an individual operation will take.
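
The difference between a per-call and a per-read timeout can be sketched in plain Python. This is not Flight or gRPC code: a background thread stands in for the async machinery, and read_with_timeout, ReadTimeout and slow_stream are hypothetical names. Each read gets its own deadline, so the stream as a whole may run for an unbounded time as long as every individual item arrives promptly.

```python
import queue
import threading
import time


class ReadTimeout(Exception):
    """Raised when a single read exceeds its deadline (hypothetical name)."""


_DONE = object()  # sentinel marking the end of the stream


def read_with_timeout(produce, per_read_timeout):
    """Yield items from produce(), bounding the wait for each item separately."""
    items = queue.Queue()

    def pump():
        # Runs in the background so the reader can wait with a timeout.
        for item in produce():
            items.put(item)
        items.put(_DONE)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            item = items.get(timeout=per_read_timeout)
        except queue.Empty:
            raise ReadTimeout(f"no data within {per_read_timeout}s") from None
        if item is _DONE:
            return
        yield item


def slow_stream(n, delay):
    """Stand-in for a server stream that emits n items, one every delay seconds."""
    for i in range(n):
        time.sleep(delay)
        yield i
```

With a per-read timeout of one second, a stream of a thousand items spaced 0.1 seconds apart would complete even though it takes far longer than one second in total; a single stalled read would fail fast.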

> [FlightRPC] Allow timeouts on all stream reads
> --
>
> Key: ARROW-6062
> URL: https://issues.apache.org/jira/browse/ARROW-6062
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC
>Reporter: lidavidm
>Priority: Major
> Fix For: 1.0.0
>
>
> Anywhere we offer reading from a stream in Flight, we need to offer a 
> timeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-5722) [Rust] Implement std::fmt::Debug for ListArray, BinaryArray and StructArray

2019-09-06 Thread lidavidm (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924290#comment-16924290
 ] 

lidavidm commented on ARROW-5722:
-

[~csun], I have some basic implementations. Printing nested arrays is 
difficult; I've punted on that for StructArray/ListArray. Really, we need Array 
to have a Debug trait bound as well - is that acceptable?

In the future, we may also want a pretty-printer API to make nested arrays look 
better (with indentation, etc).

> [Rust] Implement std::fmt::Debug for ListArray, BinaryArray and StructArray
> ---
>
> Key: ARROW-5722
> URL: https://issues.apache.org/jira/browse/ARROW-5722
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Chao Sun
>Priority: Major
>  Labels: beginner
>






[jira] [Commented] (ARROW-4914) [Rust] Array slice returns incorrect bitmask

2019-09-06 Thread lidavidm (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924348#comment-16924348
 ] 

lidavidm commented on ARROW-4914:
-

It looks like this was fixed as part of ARROW-4853; can it be closed?

> [Rust] Array slice returns incorrect bitmask
> 
>
> Key: ARROW-4914
> URL: https://issues.apache.org/jira/browse/ARROW-4914
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.13.0
>Reporter: Neville Dipale
>Priority: Blocker
>  Labels: beginner
>
> Slicing arrays changes the offset, length and null count of their array data, 
> but the bitmask is not changed.
> This results in the correct null count, but the array values might be marked 
> incorrectly as valid/invalid based on the old bitmask positions before the 
> offset.
> To reproduce, create an array with some null values, slice the array, and 
> then dbg!() it (after downcasting).
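
One way to picture the failure described above is a small Python model of a sliced array. It is not the Rust implementation: SlicedArray, get_buggy and get are hypothetical names, and the model assumes a zero-copy slice that shares the validity bitmap and only adjusts offset and length. Reading the validity bit at the logical index instead of at offset plus index marks the wrong slots as null.

```python
def is_valid(bitmap, index):
    """Return True if bit `index` is set, using least-significant-bit ordering."""
    return bool((bitmap[index // 8] >> (index % 8)) & 1)


class SlicedArray:
    """Hypothetical stand-in for array data: values, validity bitmap, offset, length."""

    def __init__(self, values, bitmap, offset=0, length=None):
        self.values = values
        self.bitmap = bitmap
        self.offset = offset
        self.length = len(values) - offset if length is None else length

    def slice(self, offset, length):
        # Zero-copy slice: buffers are shared, only offset and length change.
        return SlicedArray(self.values, self.bitmap, self.offset + offset, length)

    def get_buggy(self, i):
        # Bug: the bitmap is consulted at i, ignoring the slice offset.
        return self.values[self.offset + i] if is_valid(self.bitmap, i) else None

    def get(self, i):
        # Correct: values and bitmap are both indexed at offset + i.
        j = self.offset + i
        return self.values[j] if is_valid(self.bitmap, j) else None


if __name__ == "__main__":
    # Slots 0, 2 and 3 are valid; slot 1 is null (bits 0b1101).
    arr = SlicedArray([10, 0, 30, 40], bytes([0b00001101]))
    sliced = arr.slice(1, 3)
    print([sliced.get_buggy(i) for i in range(sliced.length)])
    print([sliced.get(i) for i in range(sliced.length)])
```

In this model the null count of the slice is still one either way, which matches the report that the null count looks right while individual slots are mislabelled.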





[jira] [Created] (ARROW-6074) [FlightRPC] Implement middleware

2019-07-30 Thread lidavidm (JIRA)
lidavidm created ARROW-6074:
---

 Summary: [FlightRPC] Implement middleware
 Key: ARROW-6074
 URL: https://issues.apache.org/jira/browse/ARROW-6074
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: lidavidm






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (ARROW-6075) [FlightRPC] Handle uncaught exceptions in middleware

2019-07-30 Thread lidavidm (JIRA)
lidavidm created ARROW-6075:
---

 Summary: [FlightRPC] Handle uncaught exceptions in middleware
 Key: ARROW-6075
 URL: https://issues.apache.org/jira/browse/ARROW-6075
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: lidavidm








[jira] [Commented] (ARROW-6165) [Integration] Use multiprocessing to run integration tests on multiple CPU cores

2019-08-07 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902399#comment-16902399
 ] 

lidavidm commented on ARROW-6165:
-

We'll also have to find free ports for the Flight tests, as right now they 
assume a hardcoded port. (Not hard to do, fortunately.)

> [Integration] Use multiprocessing to run integration tests on multiple CPU 
> cores
> 
>
> Key: ARROW-6165
> URL: https://issues.apache.org/jira/browse/ARROW-6165
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Integration
>Reporter: Wes McKinney
>Priority: Major
>
> The stdout/stderr will have to be captured appropriately so that the console 
> output when run in parallel is still readable
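
A rough sketch of the capture idea, assuming each integration case is an external command. It is not the Arrow integration runner; run_captured and run_all are hypothetical names, and a thread pool driving subprocesses is used here in place of the multiprocessing module. Each command's stdout and stderr are collected into one string and printed only after the command finishes, so output from parallel runs never interleaves.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_captured(name, argv):
    """Run one command, capturing stdout and stderr together instead of
    letting it write to the shared console."""
    proc = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return name, proc.returncode, proc.stdout


def run_all(jobs, workers=4):
    """Run jobs (a dict of name -> argv) in parallel; results keep submission order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_captured, name, argv) for name, argv in jobs.items()]
        return [f.result() for f in futures]


if __name__ == "__main__":
    jobs = {
        f"case-{i}": [sys.executable, "-c", f"print('output of case {i}')"]
        for i in range(3)
    }
    for name, code, output in run_all(jobs):
        print(f"=== {name} (exit {code}) ===")
        print(output, end="")
```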





[jira] [Created] (ARROW-6062) [FlightRPC] Allow timeouts on all stream reads

2019-07-29 Thread lidavidm (JIRA)
lidavidm created ARROW-6062:
---

 Summary: [FlightRPC] Allow timeouts on all stream reads
 Key: ARROW-6062
 URL: https://issues.apache.org/jira/browse/ARROW-6062
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: lidavidm


Anywhere we offer reading from a stream in Flight, we need to offer a 
timeout.





[jira] [Created] (ARROW-6064) [FlightRPC] [C++] Clean up IWYU

2019-07-29 Thread lidavidm (JIRA)
lidavidm created ARROW-6064:
---

 Summary: [FlightRPC] [C++] Clean up IWYU
 Key: ARROW-6064
 URL: https://issues.apache.org/jira/browse/ARROW-6064
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: lidavidm


As reported by Wes https://gist.github.com/wesm/af59c7cc8f35c6fd806b0d041b816da8





[jira] [Created] (ARROW-6063) [FlightRPC] Implement "half-closed" semantics for DoPut

2019-07-29 Thread lidavidm (JIRA)
lidavidm created ARROW-6063:
---

 Summary: [FlightRPC] Implement "half-closed" semantics for DoPut
 Key: ARROW-6063
 URL: https://issues.apache.org/jira/browse/ARROW-6063
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: lidavidm


Both sides on a DoPut should be able to half-close the stream, indicating they 
will no longer write. This allows a client to indicate that it is done writing 
data to the server, while still leaving the stream open so it can read metadata 
responses until the server finishes. Meanwhile, the server would see that the 
client has finished and be able to stop blocking on reading client messages.
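
Half-close semantics can be demonstrated with ordinary sockets, as an analogy only; this is not how DoPut is implemented, and half_close_demo is a hypothetical name. The client shuts down its write side after sending, the server reads until end-of-stream and then replies, and the client can still read that reply because only its write direction was closed.

```python
import socket


def read_until_eof(sock):
    """Read from sock until the peer half-closes (recv returns empty bytes)."""
    data = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return data
        data += chunk


def half_close_demo():
    client, server = socket.socketpair()
    with client, server:
        client.sendall(b"batch-1;batch-2")
        client.shutdown(socket.SHUT_WR)  # half-close: client is done writing

        # The server sees end-of-stream and stops blocking on client data.
        received = read_until_eof(server)

        # The stream is still open in the other direction for responses.
        server.sendall(b"ack:" + str(len(received)).encode())
        server.shutdown(socket.SHUT_WR)

        reply = read_until_eof(client)
    return received, reply


if __name__ == "__main__":
    print(half_close_demo())
```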





[jira] [Created] (ARROW-6136) [FlightRPC][Java] Don't double-close response stream

2019-08-05 Thread lidavidm (JIRA)
lidavidm created ARROW-6136:
---

 Summary: [FlightRPC][Java] Don't double-close response stream
 Key: ARROW-6136
 URL: https://issues.apache.org/jira/browse/ARROW-6136
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java
Affects Versions: 0.14.1
Reporter: lidavidm
Assignee: lidavidm
 Fix For: 0.15.0


DoPut in Java double-closes the metadata response stream: if the service 
implementation sends an error down that channel, the Flight implementation will 
unconditionally try to complete the stream, violating the gRPC semantics 
(either an error or a completion may be sent, never both).
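
The "error or completion, never both" rule can be sketched as a small guard around an observer. This is not the actual fix and not the gRPC Java API; TerminatedOnceObserver and RecordingObserver are hypothetical names written in Python for illustration, and the guard is not thread-safe. Once one terminal event has been forwarded, any later terminal event is dropped.

```python
class TerminatedOnceObserver:
    """Hypothetical wrapper that forwards at most one terminal event."""

    def __init__(self, delegate):
        self._delegate = delegate
        self._terminated = False

    def on_next(self, value):
        if self._terminated:
            raise RuntimeError("stream already terminated")
        self._delegate.on_next(value)

    def on_error(self, error):
        if not self._terminated:
            self._terminated = True
            self._delegate.on_error(error)

    def on_completed(self):
        # An unconditional completion after an error is silently ignored.
        if not self._terminated:
            self._terminated = True
            self._delegate.on_completed()


class RecordingObserver:
    """Test double that records every event it receives."""

    def __init__(self):
        self.events = []

    def on_next(self, value):
        self.events.append(("next", value))

    def on_error(self, error):
        self.events.append(("error", str(error)))

    def on_completed(self):
        self.events.append(("completed",))
```

If the service sends an error and the framework then tries to complete the stream, the delegate only ever sees the error.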





[jira] [Commented] (ARROW-5971) [Website] Blog post introducing Arrow Flight

2019-07-17 Thread lidavidm (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887246#comment-16887246
 ] 

lidavidm commented on ARROW-5971:
-

I'd be happy to look over anything. We're also working on a post of our own, 
though that probably won't come in the near future.

It might be interesting to show Python numbers as well - it actually performs 
better than Java in our tests (don't think I can share actual data though).

> [Website] Blog post introducing Arrow Flight
> 
>
> Key: ARROW-5971
> URL: https://issues.apache.org/jira/browse/ARROW-5971
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Website
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> I think it's a good time to be bringing more attention to our work over the 
> last 12-14 months on Arrow Flight. 
> I would be OK to draft an initial version of the blog post, and I can 
> circulate to others for review / edit / comment. If there are particular 
> benchmarks you would like to see included, contributing code for that would 
> also be helpful. My plan would be to show TCP throughput on localhost, and 
> node-to-node throughput on a local gigabit Ethernet network. I think the 
> localhost throughput is important to show that Flight is a tool that you 
> would want to reach for to get faster throughput in high-performance 
> networking (e.g. 10/40 gigabit).





[jira] [Created] (ARROW-5978) [FlightRPC] [Java] Integration test client doesn't close buffers

2019-07-18 Thread lidavidm (JIRA)
lidavidm created ARROW-5978:
---

 Summary: [FlightRPC] [Java] Integration test client doesn't close 
buffers
 Key: ARROW-5978
 URL: https://issues.apache.org/jira/browse/ARROW-5978
 Project: Apache Arrow
  Issue Type: Test
  Components: FlightRPC, Integration, Java
Affects Versions: 0.14.0
Reporter: lidavidm
Assignee: lidavidm
 Fix For: 1.0.0


The integration test client doesn't close any of the clients or free any of the 
buffers it creates.

Trying to do so leads to a leak problem on the dictionary vector case.





[jira] [Created] (ARROW-5979) [FlightRPC] Expose (de)serialization of protocol types

2019-07-18 Thread lidavidm (JIRA)
lidavidm created ARROW-5979:
---

 Summary: [FlightRPC] Expose (de)serialization of protocol types
 Key: ARROW-5979
 URL: https://issues.apache.org/jira/browse/ARROW-5979
 Project: Apache Arrow
  Issue Type: New Feature
  Components: FlightRPC
Reporter: lidavidm


It would be nice to be able to serialize/deserialize Flight types (e.g. 
FlightInfo) to/from the binary representations, in order to interoperate with 
systems that might want to provide (say) Flight tickets or FlightInfo without 
using the Flight protocol. For instance, you might have a search server that 
exposes a REST interface and wants to provide FlightInfo objects for Flight 
clients, without having to listen on a separate port.





[jira] [Assigned] (ARROW-5979) [FlightRPC] Expose (de)serialization of protocol types

2019-07-19 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm reassigned ARROW-5979:
---

Assignee: lidavidm

> [FlightRPC] Expose (de)serialization of protocol types
> --
>
> Key: ARROW-5979
> URL: https://issues.apache.org/jira/browse/ARROW-5979
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: FlightRPC
>Reporter: lidavidm
>Assignee: lidavidm
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It would be nice to be able to serialize/deserialize Flight types (e.g. 
> FlightInfo) to/from the binary representations, in order to interoperate with 
> systems that might want to provide (say) Flight tickets or FlightInfo without 
> using the Flight protocol. For instance, you might have a search server that 
> exposes a REST interface and wants to provide FlightInfo objects for Flight 
> clients, without having to listen on a separate port.





[jira] [Created] (ARROW-6017) [FlightRPC] Allow creating Locations with unknown schemes

2019-07-23 Thread lidavidm (JIRA)
lidavidm created ARROW-6017:
---

 Summary: [FlightRPC] Allow creating Locations with unknown schemes
 Key: ARROW-6017
 URL: https://issues.apache.org/jira/browse/ARROW-6017
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Affects Versions: 0.14.0
Reporter: lidavidm
Assignee: lidavidm


Right now Flight clients error if the server hands them a Location with an 
unknown scheme. Also, you can't construct locations with non-gRPC schemes. 
Since Flight will want to support other transports, we should allow unknown 
schemes up until a client is constructed for them. This would also make it 
possible for a Flight service to reference non-Flight services in FlightInfo.
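
The proposal amounts to deferring scheme validation from Location construction to client construction. A minimal Python sketch of that ordering follows; it is not the Flight API. Location and connect here are hypothetical stand-ins, and the set of known scheme names is an assumption made for the example.

```python
from urllib.parse import urlsplit

# Assumed set of schemes the client knows how to connect to.
KNOWN_TRANSPORTS = {"grpc", "grpc+tcp", "grpc+tls", "grpc+unix"}


class Location:
    """Hypothetical Location: accepts any well-formed URI, whatever its scheme."""

    def __init__(self, uri):
        parts = urlsplit(uri)
        if not parts.scheme:
            raise ValueError(f"not a URI: {uri!r}")
        self.uri = uri
        self.scheme = parts.scheme


def connect(location):
    """Stand-in for constructing a client; the scheme is only checked here."""
    if location.scheme not in KNOWN_TRANSPORTS:
        raise ValueError(f"no transport for scheme {location.scheme!r}")
    return f"client for {location.uri}"


if __name__ == "__main__":
    other = Location("s3://bucket/key")  # constructing this is fine
    print(other.scheme)
    print(connect(Location("grpc+tcp://localhost:5005")))
```

A service could then return a location with a scheme the client does not understand, and the client would only fail if it actually tried to connect to it.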





[jira] [Updated] (ARROW-5996) [Java] Avoid resource leak in flight service

2019-07-22 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-5996:

Component/s: FlightRPC

> [Java] Avoid resource leak in flight service
> 
>
> Key: ARROW-5996
> URL: https://issues.apache.org/jira/browse/ARROW-5996
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> # In FlightService#doPutCustom, the flight stream must be closed, even if an 
> exception is thrown during the call of responseObserver.onError
>  # An exception that occurs during the call to acceptPut should not be 
> swallowed.
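
The two points above can be sketched with a try/finally in Python. This is one interpretation under stated assumptions, not the Java patch: Stream, do_put, accept_put and on_error are hypothetical stand-ins, and re-raising is just one way of not swallowing the failure. The stream is closed in the finally clause, so it is released even when the error callback itself throws.

```python
class Stream:
    """Hypothetical stand-in for the flight stream resource."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def do_put(stream, accept_put, on_error):
    """Run accept_put, report failures through on_error, always close the stream."""
    try:
        try:
            accept_put(stream)
        except Exception as exc:
            on_error(exc)  # may itself raise; the finally below still runs
            raise  # do not swallow the original failure
    finally:
        stream.close()
```

If on_error raises, Python chains the original exception as the context of the new one, so neither failure is lost.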





[jira] [Assigned] (ARROW-5877) [FlightRPC] Document caveats around usage of auth APIs

2019-07-09 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm reassigned ARROW-5877:
---

Assignee: lidavidm

> [FlightRPC] Document caveats around usage of auth APIs
> --
>
> Key: ARROW-5877
> URL: https://issues.apache.org/jira/browse/ARROW-5877
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC
>Reporter: lidavidm
>Assignee: lidavidm
>Priority: Major
>
> The Flight Handshake method can be insecure, and currently has a surprising 
> failure mode; we should document these caveats (blocks forever waiting on 
> client/server; insecure depending on deployment configuration)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5877) [FlightRPC] Fix auth incompatibilities between Python/Java

2019-07-09 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-5877:

Description: 
It turns out the blocking-forever issue was a combination of problems in Python 
and Java. We should simply fix the issues.

---

The Flight Handshake method can be insecure, and currently has a surprising 
failure mode; we should document these caveats (blocks forever waiting on 
client/server; insecure depending on deployment configuration)

  was:The Flight Handshake method can be insecure, and currently has a 
surprising failure mode; we should document these caveats (blocks forever 
waiting on client/server; insecure depending on deployment configuration)

Summary: [FlightRPC] Fix auth incompatibilities between Python/Java  
(was: [FlightRPC] Document caveats around usage of auth APIs)

> [FlightRPC] Fix auth incompatibilities between Python/Java
> --
>
> Key: ARROW-5877
> URL: https://issues.apache.org/jira/browse/ARROW-5877
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC
>Reporter: lidavidm
>Assignee: lidavidm
>Priority: Major
>
> It turns out the blocking-forever issue was a combination of problems in 
> Python and Java. We should simply fix the issues.
> ---
> The Flight Handshake method can be insecure, and currently has a surprising 
> failure mode; we should document these caveats (blocks forever waiting on 
> client/server; insecure depending on deployment configuration)





[jira] [Updated] (ARROW-5681) [FlightRPC] Wrap gRPC exceptions/statuses

2019-07-09 Thread lidavidm (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lidavidm updated ARROW-5681:

Component/s: C++
Summary: [FlightRPC] Wrap gRPC exceptions/statuses  (was: 
[FlightRPC][Java] Wrap gRPC exceptions)

> [FlightRPC] Wrap gRPC exceptions/statuses
> -
>
> Key: ARROW-5681
> URL: https://issues.apache.org/jira/browse/ARROW-5681
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC, Java
>Reporter: lidavidm
>Assignee: lidavidm
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Instead of requiring users to catch/throw StatusRuntimeException in Flight 
> services/clients, and thereby leaking gRPC details, we should provide our own 
> set of exceptions and status codes. This way, services can provide proper 
> error messages and error codes to clients, which can catch the exception and 
> respond properly.
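
The wrapping idea can be sketched in Python as follows. None of these names come from Flight or gRPC: TransportError stands in for a transport-level exception such as StatusRuntimeException, and StatusCode, ServiceError and call_wrapped are hypothetical. The transport exception is caught at the boundary and re-raised as the library's own exception type with its own status code, keeping the original as the cause.

```python
import enum


class TransportError(Exception):
    """Stand-in for a transport-level exception (hypothetical)."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


class StatusCode(enum.Enum):
    """Hypothetical library-owned status codes."""

    NOT_FOUND = "not_found"
    UNAUTHENTICATED = "unauthenticated"
    INTERNAL = "internal"


class ServiceError(Exception):
    """Hypothetical library-owned exception exposed to users."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


# Assumed mapping from transport codes to library codes.
_CODE_MAP = {
    "NOT_FOUND": StatusCode.NOT_FOUND,
    "UNAUTHENTICATED": StatusCode.UNAUTHENTICATED,
}


def call_wrapped(fn, *args):
    """Call fn and translate transport errors into ServiceError."""
    try:
        return fn(*args)
    except TransportError as exc:
        code = _CODE_MAP.get(exc.code, StatusCode.INTERNAL)
        raise ServiceError(code, str(exc)) from exc
```

Callers then catch ServiceError and branch on its code without importing anything from the transport layer.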





[jira] [Created] (ARROW-6677) [FlightRPC][C++] Document using Flight in C++

2019-09-24 Thread lidavidm (Jira)
lidavidm created ARROW-6677:
---

 Summary: [FlightRPC][C++] Document using Flight in C++
 Key: ARROW-6677
 URL: https://issues.apache.org/jira/browse/ARROW-6677
 Project: Apache Arrow
  Issue Type: Bug
  Components: Documentation, FlightRPC
Reporter: lidavidm
Assignee: lidavidm
 Fix For: 1.0.0


Similarly to ARROW-6390 for Python, we should have C++ documentation for Flight.


