[jira] [Commented] (ARROW-5658) [JAVA] apache arrow-flight cannot send listvector
    [ https://issues.apache.org/jira/browse/ARROW-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869421#comment-16869421 ]

lidavidm commented on ARROW-5658:
---------------------------------

[~liaotian1005], generally UNKNOWN means that there was an uncaught server-side exception. Do you see any traceback in the server output? If not, then we need to log these things inside Flight.

> [JAVA] apache arrow-flight cannot send listvector
> -------------------------------------------------
>
>                 Key: ARROW-5658
>                 URL: https://issues.apache.org/jira/browse/ARROW-5658
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: FlightRPC, Java
>    Affects Versions: 0.13.0
>         Environment: java8 arrow-java 0.13.0
>            Reporter: luckily
>            Priority: Major
>         Attachments: ClientStart.java, ServerStart.java, pom.xml
>
>
> I can't transfer data that contains a ListVector using Apache Arrow Flight. The problem is as follows:
> {quote}
> 1. I parse an XML file, convert it to the Arrow format, and finally convert it to the Parquet format. The XML file is at http://www.w3school.com.cn/example/xmle/cd_catalog.xml
> 2. I created a schema that uses a ListVector. The code is as follows:
> List list = childrenBuilder.add(ListVector.empty(column.getId().toString(),allocator));
> VectorSchemaRoot root = VectorSchemaRoot.of(inVector)
> 3. Parse the XML file to get the list data in "cd", and write it through the ListVector API:
> `ListVector listVector = (ListVector) valueVectors;
> List<Column> columns = column.getColumns();
> Column column1 = columns.get(0);
> String name = column1.getId().toString();
> UnionListWriter writer = listVector.getWriter();
> writer.allocate();
> for (int j = 0; j < column1.getColumns().size(); j++) {
>   writer.setPosition(j);
>   writer.startList();
>   writer.list().startList();
>   Column column2 = column1.getColumns().get(j);
>   List<Map<String, String>> lst = (List<Map<String, String>>) ((Map) val).get(name);
>   for (int k = 0; k < lst.size(); k++) {
>     Map<String, String> stringStringMap = lst.get(k);
>     String value = stringStringMap.get(column2.getId().toString());
>     switch (column2.getType()) {
>       case FLOAT:
>         writer.list().float4().writeFloat4(stringConvertFloat(value));
>         break;
>       case BOOLEAN:
>         writer.list().bit().writeBit(stringConvertBoolean(value));
>         break;
>       case DECIMAL:
>         writer.list().decimal().writeDecimal(stringConvertDecimal(value,column2.getScale()));
>         break;
>       case TIMESTAMP:
>         writer.list().dateMilli().writeDateMilli(stringConvertTimestamp(value,column2.format.toString()));
>         break;
>       case INTEGER:
>       case BIGINT:
>         writer.list().bigInt().writeBigInt(stringConvertLong(value));
>         break;
>       case VARCHAR:
>         VarCharHolder varBinaryHolder = new VarCharHolder();
>         varBinaryHolder.start = 0;
>         byte[] bytes = value.getBytes();
>         ArrowBuf buffer = listVector.getAllocator().buffer(bytes.length);
>         varBinaryHolder.buffer = buffer;
>         buffer.writeBytes(bytes);
>         varBinaryHolder.end = bytes.length;
>         writer.list().varChar().write(varBinaryHolder);
>         break;
>       default:
>         throw new IllegalArgumentException(" error no type !!");
>     }
>   }
>   writer.list().endList();
>   writer.endList();
> }`
> 4. After the write is complete, I send the data to the arrow-flight server. Server code:
> {quote}
> {quote}@Override
> public Callable acceptPut(FlightStream flightStream) {
>   return () -> {
>     try (VectorSchemaRoot root = flightStream.getRoot()) {
>       while (flightStream.next()) {
>         VectorSchemaRoot other = null;
>         try {
>           logger.info(" Receive message .. size: " + root.getRowCount());
>           other =
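The comment above points out that an UNKNOWN status usually hides an uncaught server-side exception, and that nothing may be logged for it. As a rough sketch of a workaround on the service side, assuming only the JDK: the Callable returned by a handler such as acceptPut could be wrapped so that anything it throws is logged before being rethrown. `LoggingCallable` and its `wrap` method are hypothetical names for illustration and are not part of Arrow Flight.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper (not part of Arrow Flight): logs anything a handler throws. */
final class LoggingCallable {
  private static final Logger LOGGER = Logger.getLogger(LoggingCallable.class.getName());

  private LoggingCallable() {}

  /** Returns a Callable that logs and then rethrows whatever the delegate throws. */
  static <T> Callable<T> wrap(String name, Callable<T> delegate) {
    return () -> {
      try {
        return delegate.call();
      } catch (Exception | Error e) {
        // Without this, the client may only ever see a bare UNKNOWN status.
        LOGGER.log(Level.SEVERE, "Uncaught exception in " + name, e);
        throw e;
      }
    };
  }
}
```

A handler would then return something like `LoggingCallable.wrap("acceptPut", () -> { ... })`, so the stack trace shows up in the server output even though the client still receives UNKNOWN.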
[jira] [Created] (ARROW-5681) [FlightRPC][Java] Wrap gRPC exceptions
lidavidm created ARROW-5681:
-------------------------------

             Summary: [FlightRPC][Java] Wrap gRPC exceptions
                 Key: ARROW-5681
                 URL: https://issues.apache.org/jira/browse/ARROW-5681
             Project: Apache Arrow
          Issue Type: Improvement
          Components: FlightRPC, Java
            Reporter: lidavidm
            Assignee: lidavidm
             Fix For: 1.0.0

Instead of requiring users to catch/throw StatusRuntimeException in Flight services/clients, and thereby leaking gRPC details, we should provide our own set of exceptions and status codes. This way, services can provide proper error messages and error codes to clients, which can catch the exception and respond properly.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
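To make the proposal concrete, here is a minimal sketch of what a transport-independent exception and status-code pair might look like. Everything in it is an assumption for illustration: `ServiceStatus`, `ServiceException` and `StatusTranslator` are hypothetical names, and the transport code is passed in as a plain string so that the example does not depend on gRPC. It is not the API that Flight ended up with.

```java
/** Hypothetical user-facing status codes, independent of the transport. */
enum ServiceStatus { NOT_FOUND, UNAUTHENTICATED, INVALID_ARGUMENT, INTERNAL }

/** Hypothetical exception that services throw and clients catch. */
final class ServiceException extends RuntimeException {
  private final ServiceStatus status;

  ServiceException(ServiceStatus status, String message, Throwable cause) {
    super(message, cause);
    this.status = status;
  }

  ServiceStatus status() {
    return status;
  }
}

/** Hypothetical translation layer between transport codes and ServiceStatus. */
final class StatusTranslator {
  private StatusTranslator() {}

  /** Maps a transport-level code name to a user-facing status; unknown names become INTERNAL. */
  static ServiceStatus fromTransportCode(String codeName) {
    switch (codeName) {
      case "NOT_FOUND":
        return ServiceStatus.NOT_FOUND;
      case "UNAUTHENTICATED":
        return ServiceStatus.UNAUTHENTICATED;
      case "INVALID_ARGUMENT":
        return ServiceStatus.INVALID_ARGUMENT;
      default:
        return ServiceStatus.INTERNAL;
    }
  }

  /** Wraps a transport failure so that callers never see the transport's own exception type. */
  static ServiceException wrap(String codeName, String description, Throwable cause) {
    return new ServiceException(fromTransportCode(codeName), description, cause);
  }
}
```

With a layer like this, a client can switch on `status()` and never needs to import the transport's exception class.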
[jira] [Commented] (ARROW-5063) [Java] FlightClient should not create a child allocator
    [ https://issues.apache.org/jira/browse/ARROW-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869736#comment-16869736 ]

lidavidm commented on ARROW-5063:
---------------------------------

Since we couldn't reproduce the gRPC behavior, maybe I can roll back the change to the client and just keep the tests?

> [Java] FlightClient should not create a child allocator
> -------------------------------------------------------
>
>                 Key: ARROW-5063
>                 URL: https://issues.apache.org/jira/browse/ARROW-5063
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: FlightRPC, Java
>            Reporter: Bryan Cutler
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I ran into a problem when testing out Flight using the ExampleFlightServer with the InMemoryStore producer.
> A client will iterate over endpoints and locations to get the streams, and the example creates a new client for each location. The only way to close the allocator in the FlightClient is to close the FlightClient, which also closes the read channel. If the location is the same for each FlightStream (as is the case for the InMemoryStore), then it seems like gRPC will reuse the channel, so closing one read client will shut down the channel and the remaining FlightStreams cannot be read.
> If an allocator was created by the owner of the FlightClient, then the client would not need to close it and this problem would be avoided. I believe other Flight classes do not create child allocators either, so this change would be consistent.
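The ownership rule proposed in the issue can be sketched without any Arrow classes. In the example below, `DemoAllocator` and `DemoClient` are hypothetical stand-ins (not the Arrow or Flight types): the client borrows an allocator created by its owner and never closes it, so closing one client leaves the allocator usable by the others.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a buffer allocator; not the Arrow class. */
final class DemoAllocator implements AutoCloseable {
  private final AtomicBoolean closed = new AtomicBoolean();

  boolean isClosed() {
    return closed.get();
  }

  @Override
  public void close() {
    closed.set(true);
  }
}

/** Hypothetical client that borrows its allocator instead of creating a child allocator. */
final class DemoClient implements AutoCloseable {
  private final DemoAllocator allocator; // owned by the caller, never closed here
  private boolean closed;

  DemoClient(DemoAllocator allocator) {
    this.allocator = allocator;
  }

  DemoAllocator allocator() {
    return allocator;
  }

  boolean isClosed() {
    return closed;
  }

  @Override
  public void close() {
    // Release only what the client itself created; the allocator stays open.
    closed = true;
  }
}
```

The owner closes the allocator once, after every client that borrowed it is done.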
[jira] [Updated] (ARROW-5643) [Flight] Add ability to override hostname checking
    [ https://issues.apache.org/jira/browse/ARROW-5643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lidavidm updated ARROW-5643:
----------------------------
    Fix Version/s:     (was: 0.14.0)
                   1.0.0

> [Flight] Add ability to override hostname checking
> --------------------------------------------------
>
>                 Key: ARROW-5643
>                 URL: https://issues.apache.org/jira/browse/ARROW-5643
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: FlightRPC
>            Reporter: lidavidm
>            Assignee: lidavidm
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> We should add the ability to override hostname checks, so you can connect to localhost over TLS but still verify that the certificate is for some other domain.
> Example: when deploying on Kubernetes with headless services, clients connect directly to backend services and do load balancing themselves. Thus all instances of an application must present a certificate for the same hostname.
> To do health checks in such an environment, you can't connect to the TLS hostname (which may resolve to a different instance); you need to connect to localhost, and override the hostname check.
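For readers unfamiliar with the idea, the following sketch shows the same technique with plain JSSE rather than Flight or gRPC: open the TCP connection to one host (for example localhost) but ask the TLS layer to verify the certificate against a different name. `HostnameOverride` and its methods are hypothetical names; the sketch assumes the default trust store and is not how Flight itself implements the feature.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Collections;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

/** Hypothetical helper: connect to one address, verify the certificate for another name. */
final class HostnameOverride {
  private HostnameOverride() {}

  /** TLS parameters that request HTTPS-style hostname verification against verifyHost. */
  static SSLParameters verificationParameters(String verifyHost) {
    SSLParameters params = new SSLParameters();
    params.setEndpointIdentificationAlgorithm("HTTPS");
    params.setServerNames(
        Collections.<SNIServerName>singletonList(new SNIHostName(verifyHost)));
    return params;
  }

  /** Opens TCP to connectHost:port, then runs a handshake that checks the cert for verifyHost. */
  static SSLSocket connect(String connectHost, int port, String verifyHost) throws IOException {
    Socket raw = new Socket();
    try {
      raw.connect(new InetSocketAddress(connectHost, port), 5_000);
      SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
      // The host passed here is the name the TLS layer associates with the peer.
      SSLSocket tls = (SSLSocket) factory.createSocket(raw, verifyHost, port, true);
      tls.setSSLParameters(verificationParameters(verifyHost));
      tls.startHandshake();
      return tls;
    } catch (IOException | RuntimeException e) {
      raw.close();
      throw e;
    }
  }
}
```

A health check in the Kubernetes scenario above might call `HostnameOverride.connect("localhost", port, "service.example.com")`, where the port and the service name are placeholders.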
[jira] [Closed] (ARROW-5829) [Java] failure in TestServerOptions.domainSocket
    [ https://issues.apache.org/jira/browse/ARROW-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lidavidm closed ARROW-5829.
---------------------------
    Resolution: Duplicate
      Assignee: lidavidm

> [Java] failure in TestServerOptions.domainSocket
> ------------------------------------------------
>
>                 Key: ARROW-5829
>                 URL: https://issues.apache.org/jira/browse/ARROW-5829
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: FlightRPC, Java
>    Affects Versions: 0.14.0
>            Reporter: Pindikura Ravindra
>            Assignee: lidavidm
>            Priority: Major
>
> I see this consistently with the 0.14.0 rc0 release candidate on mac mojave.
> java.io.IOException: Failed to bind
>     at org.apache.arrow.flight.TestServerOptions.domainSocket(TestServerOptions.java:46)
> Caused by: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use
[jira] [Comment Edited] (ARROW-5829) [Java] failure in TestServerOptions.domainSocket
    [ https://issues.apache.org/jira/browse/ARROW-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877880#comment-16877880 ]

lidavidm edited comment on ARROW-5829 at 7/3/19 2:21 PM:
---------------------------------------------------------

I think this is the same underlying cause - on OSX, the release verification script uses a custom temp dir that makes the domain socket path too long. I'm going to close this in favor of ARROW-5836 to keep things in one place.

was (Author: lidavidm):
I think this is the same underlying cause - on OSX, the release verification script uses a custom temp dir that makes the domain socket path too long.

> [Java] failure in TestServerOptions.domainSocket
> ------------------------------------------------
>
>                 Key: ARROW-5829
>                 URL: https://issues.apache.org/jira/browse/ARROW-5829
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: FlightRPC, Java
>    Affects Versions: 0.14.0
>            Reporter: Pindikura Ravindra
>            Priority: Major
>
> I see this consistently with the 0.14.0 rc0 release candidate on mac mojave.
> java.io.IOException: Failed to bind
>     at org.apache.arrow.flight.TestServerOptions.domainSocket(TestServerOptions.java:46)
> Caused by: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use
[jira] [Commented] (ARROW-5836) [Java][OSX] Flight tests are failing: address already in use
[ https://issues.apache.org/jira/browse/ARROW-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877875#comment-16877875 ] lidavidm commented on ARROW-5836: - This is because the path for the domain socket is too long on OSX, which happens because the release verification script generates a custom TMPDIR. For instance, this is the path that it tries to use for me, and if I try to listen on a domain socket with netcat, I get the following: {{$ nc -l -U /private/var/folders/tm/b4drxjmn7j79gtp0ppbw5qlhgn/T/arrow-0.14.0.X.qLFPG9EZ/apache-arrow-0.14.0/java/flight/target/flight-unit-test-8900940943285883708.sock}} {{nc: File name too long}} Not sure what the best way to fix this is - perhaps hardcode /tmp inside the test? I quickly scanned the grpc and grpc-java repositories, but they don't seem to test domain sockets (beyond maybe parsing the address). > [Java][OSX] Flight tests are failing: address already in use > > > Key: ARROW-5836 > URL: https://issues.apache.org/jira/browse/ARROW-5836 > Project: Apache Arrow > Issue Type: Bug > Components: FlightRPC, Java >Affects Versions: 0.14.0 >Reporter: Krisztian Szucs >Priority: Major > > {code} > Jul 03, 2019 3:09:45 PM io.grpc.netty.NettyServerHandler onStreamError > WARNING: Stream Error > io.netty.handler.codec.http2.Http2Exception$StreamException: Received DATA > frame for an unknown stream 3 > at > io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:129) > at > io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.shouldIgnoreHeadersOrDataFrame(DefaultHttp2ConnectionDecoder.java:531) > at > io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onDataRead(DefaultHttp2ConnectionDecoder.java:183) > at > io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onDataRead(Http2InboundFrameLogger.java:48) > at > io.netty.handler.codec.http2.DefaultHttp2FrameReader.readDataFrame(DefaultHttp2FrameReader.java:421) > at > 
io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:251) > at > io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160) > at > io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) > at > io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118) > at > io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:390) > at > io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:450) > at > io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) > at > io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:646) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:581) > at > 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460) > at > io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Jul 03, 2019 3:09:46 PM io.grpc.netty.NettyServerHandler onStreamError > WARNING: Stream Error > io.netty.handler.codec.http2.Http2Exception$StreamException: Received DATA
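The diagnosis in the comment on this issue (the socket path under the custom TMPDIR is too long) can be checked ahead of time. Below is a minimal sketch, assuming the usual sockaddr_un limits (sun_path holds 104 bytes on macOS and 108 on Linux, including the terminating NUL): it tests whether a candidate path fits and falls back to /tmp, as the comment suggests, when it does not. `DomainSocketPaths` is a hypothetical helper, not code from the Flight tests.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for picking a Unix domain socket path that is short enough to bind. */
final class DomainSocketPaths {
  /** Conservative limit: macOS allows 104 bytes in sun_path, one of which is the NUL. */
  static final int MAX_PATH_BYTES = 103;

  private DomainSocketPaths() {}

  /** Returns true if the absolute form of the path fits into sun_path. */
  static boolean fits(Path path) {
    byte[] encoded = path.toAbsolutePath().toString().getBytes(StandardCharsets.UTF_8);
    return encoded.length <= MAX_PATH_BYTES;
  }

  /** Prefers preferredDir, but falls back to /tmp when the resulting path would be too long. */
  static Path chooseSocketPath(Path preferredDir, String fileName) {
    Path preferred = preferredDir.resolve(fileName);
    if (fits(preferred)) {
      return preferred;
    }
    Path fallback = Paths.get("/tmp").resolve(fileName);
    if (!fits(fallback)) {
      throw new IllegalArgumentException("Socket file name is too long: " + fileName);
    }
    return fallback;
  }
}
```

With the directory quoted in the comment, the preferred path is well over the limit, so this sketch would return the /tmp variant instead.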
[jira] [Assigned] (ARROW-5836) [Java][OSX] Flight tests are failing: address already in use
[ https://issues.apache.org/jira/browse/ARROW-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm reassigned ARROW-5836: --- Assignee: lidavidm > [Java][OSX] Flight tests are failing: address already in use > > > Key: ARROW-5836 > URL: https://issues.apache.org/jira/browse/ARROW-5836 > Project: Apache Arrow > Issue Type: Bug > Components: FlightRPC, Java >Affects Versions: 0.14.0 >Reporter: Krisztian Szucs >Assignee: lidavidm >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {code} > Jul 03, 2019 3:09:45 PM io.grpc.netty.NettyServerHandler onStreamError > WARNING: Stream Error > io.netty.handler.codec.http2.Http2Exception$StreamException: Received DATA > frame for an unknown stream 3 > at > io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:129) > at > io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.shouldIgnoreHeadersOrDataFrame(DefaultHttp2ConnectionDecoder.java:531) > at > io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onDataRead(DefaultHttp2ConnectionDecoder.java:183) > at > io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onDataRead(Http2InboundFrameLogger.java:48) > at > io.netty.handler.codec.http2.DefaultHttp2FrameReader.readDataFrame(DefaultHttp2FrameReader.java:421) > at > io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:251) > at > io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160) > at > io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) > at > io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118) > at > io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:390) > at > 
[jira] [Commented] (ARROW-5836) [Java][OSX] Flight tests are failing: address already in use
[ https://issues.apache.org/jira/browse/ARROW-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877877#comment-16877877 ] lidavidm commented on ARROW-5836: - Ah wait, I did find their test case - they hardcode /tmp: https://github.com/grpc/grpc/blob/df998f70239ec80af7d9af7133f9c0757e952f39/test/core/end2end/fixtures/h2_local_uds.cc#L29-L38 > [Java][OSX] Flight tests are failing: address already in use > > > Key: ARROW-5836 > URL: https://issues.apache.org/jira/browse/ARROW-5836 > Project: Apache Arrow > Issue Type: Bug > Components: FlightRPC, Java >Affects Versions: 0.14.0 >Reporter: Krisztian Szucs >Priority: Major > > {code} > Jul 03, 2019 3:09:45 PM io.grpc.netty.NettyServerHandler onStreamError > WARNING: Stream Error > io.netty.handler.codec.http2.Http2Exception$StreamException: Received DATA > frame for an unknown stream 3 > at > io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:129) > at > io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.shouldIgnoreHeadersOrDataFrame(DefaultHttp2ConnectionDecoder.java:531) > at > io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onDataRead(DefaultHttp2ConnectionDecoder.java:183) > at > io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onDataRead(Http2InboundFrameLogger.java:48) > at > io.netty.handler.codec.http2.DefaultHttp2FrameReader.readDataFrame(DefaultHttp2FrameReader.java:421) > at > io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:251) > at > io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160) > at > io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) > at > io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118) > at > 
[jira] [Comment Edited] (ARROW-5836) [Java][OSX] Flight tests are failing: address already in use
[ https://issues.apache.org/jira/browse/ARROW-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877877#comment-16877877 ] lidavidm edited comment on ARROW-5836 at 7/3/19 2:18 PM: - Ah wait, I did find their test case - they hardcode /tmp: [https://github.com/grpc/grpc/blob/df998f70239ec80af7d9af7133f9c0757e952f39/test/core/end2end/fixtures/h2_local_uds.cc#L29-L38] grpc-java doesn't seem to test this. was (Author: lidavidm): Ah wait, I did find their test case - they hardcode /tmp: https://github.com/grpc/grpc/blob/df998f70239ec80af7d9af7133f9c0757e952f39/test/core/end2end/fixtures/h2_local_uds.cc#L29-L38 > [Java][OSX] Flight tests are failing: address already in use > > > Key: ARROW-5836 > URL: https://issues.apache.org/jira/browse/ARROW-5836 > Project: Apache Arrow > Issue Type: Bug > Components: FlightRPC, Java >Affects Versions: 0.14.0 >Reporter: Krisztian Szucs >Priority: Major > > {code} > Jul 03, 2019 3:09:45 PM io.grpc.netty.NettyServerHandler onStreamError > WARNING: Stream Error > io.netty.handler.codec.http2.Http2Exception$StreamException: Received DATA > frame for an unknown stream 3 > at > io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:129) > at > io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.shouldIgnoreHeadersOrDataFrame(DefaultHttp2ConnectionDecoder.java:531) > at > io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onDataRead(DefaultHttp2ConnectionDecoder.java:183) > at > io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onDataRead(Http2InboundFrameLogger.java:48) > at > io.netty.handler.codec.http2.DefaultHttp2FrameReader.readDataFrame(DefaultHttp2FrameReader.java:421) > at > io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:251) > at > io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160) > at > 
[jira] [Commented] (ARROW-5769) [Java] org.apache.arrow.flight.TestTls is failed via dev/release/00-prepare.sh
[ https://issues.apache.org/jira/browse/ARROW-5769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874585#comment-16874585 ] lidavidm commented on ARROW-5769: - Actually looking at that again, it seems the environment variable is set and the path is right, but the file isn't there...are the submodules updated? (git submodule update) > [Java] org.apache.arrow.flight.TestTls is failed via dev/release/00-prepare.sh > -- > > Key: ARROW-5769 > URL: https://issues.apache.org/jira/browse/ARROW-5769 > Project: Apache Arrow > Issue Type: Test > Components: Java >Reporter: Sutou Kouhei >Priority: Blocker > Fix For: 0.14.0 > > > Details: > {noformat} > [INFO] [INFO] Running org.apache.arrow.flight.TestTls > [INFO] [ERROR] Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time > elapsed: 0.005 s <<< FAILURE! - in org.apache.arrow.flight.TestTls > [INFO] [ERROR] connectTls(org.apache.arrow.flight.TestTls) Time elapsed: > 0.004 s <<< ERROR! > [INFO] java.lang.RuntimeException: java.io.FileNotFoundException: > /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem > (No such file or directory) > [INFO]at > org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:105) > [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98) > [INFO]at org.apache.arrow.flight.TestTls.connectTls(TestTls.java:44) > [INFO] Caused by: java.io.FileNotFoundException: > /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem > (No such file or directory) > [INFO]at > org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:102) > [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98) > [INFO]at org.apache.arrow.flight.TestTls.connectTls(TestTls.java:44) > [INFO] > [INFO] [ERROR] rejectInvalidCert(org.apache.arrow.flight.TestTls) Time > elapsed: 0 s <<< ERROR! 
> [INFO] java.lang.Exception: Unexpected exception, > expected but was > [INFO]at > org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:105) > [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98) > [INFO]at > org.apache.arrow.flight.TestTls.rejectInvalidCert(TestTls.java:62) > [INFO] Caused by: java.io.FileNotFoundException: > /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem > (No such file or directory) > [INFO]at > org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:102) > [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98) > [INFO]at > org.apache.arrow.flight.TestTls.rejectInvalidCert(TestTls.java:62) > [INFO] > [INFO] [ERROR] rejectHostname(org.apache.arrow.flight.TestTls) Time elapsed: > 0.001 s <<< ERROR! > [INFO] java.lang.Exception: Unexpected exception, > expected but was > [INFO]at > org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:105) > [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98) > [INFO]at > org.apache.arrow.flight.TestTls.rejectHostname(TestTls.java:78) > [INFO] Caused by: java.io.FileNotFoundException: > /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem > (No such file or directory) > [INFO]at > org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:102) > [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98) > [INFO]at > org.apache.arrow.flight.TestTls.rejectHostname(TestTls.java:78) > [INFO] > [INFO] [INFO] Running org.apache.arrow.flight.TestServerOptions > [INFO] [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: > 0.114 s - in org.apache.arrow.flight.TestServerOptions > [INFO] [INFO] > [INFO] [INFO] Results: > [INFO] [INFO] > [INFO] [ERROR] Errors: > [INFO] [ERROR] TestTls.connectTls:44->test:98->lambda$test$3:105 Runtime > java.io.FileNotFound... 
> [INFO] [ERROR] TestTls.rejectHostname » Unexpected exception, > expected [INFO] [ERROR] TestTls.rejectInvalidCert » Unexpected exception, > expected [INFO] [INFO] > [INFO] [ERROR] Tests run: 27, Failures: 0, Errors: 3, Skipped: 10 > {noformat} > I'm not sure whether this is my environment problem or not. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5769) [Java] org.apache.arrow.flight.TestTls is failed via dev/release/00-prepare.sh
[ https://issues.apache.org/jira/browse/ARROW-5769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874581#comment-16874581 ] lidavidm commented on ARROW-5769: - Don't have access to my work computer, but this is because the Flight tests now need ARROW_TEST_DATA. Adding a line under SOURCE_DIR in 00-prepare.sh like this should work, so long as submodules are initialized and updated: {{export ARROW_TEST_DATA="${SOURCE_DIR}/../../testing/data"}} > [Java] org.apache.arrow.flight.TestTls is failed via dev/release/00-prepare.sh > -- > > Key: ARROW-5769 > URL: https://issues.apache.org/jira/browse/ARROW-5769 > Project: Apache Arrow > Issue Type: Test > Components: Java >Reporter: Sutou Kouhei >Priority: Blocker > Fix For: 0.14.0 > > > Details: > {noformat} > [INFO] [INFO] Running org.apache.arrow.flight.TestTls > [INFO] [ERROR] Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time > elapsed: 0.005 s <<< FAILURE! - in org.apache.arrow.flight.TestTls > [INFO] [ERROR] connectTls(org.apache.arrow.flight.TestTls) Time elapsed: > 0.004 s <<< ERROR! > [INFO] java.lang.RuntimeException: java.io.FileNotFoundException: > /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem > (No such file or directory) > [INFO]at > org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:105) > [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98) > [INFO]at org.apache.arrow.flight.TestTls.connectTls(TestTls.java:44) > [INFO] Caused by: java.io.FileNotFoundException: > /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem > (No such file or directory) > [INFO]at > org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:102) > [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98) > [INFO]at org.apache.arrow.flight.TestTls.connectTls(TestTls.java:44) > [INFO] > [INFO] [ERROR] rejectInvalidCert(org.apache.arrow.flight.TestTls) Time > elapsed: 0 s <<< ERROR! 
> [INFO] java.lang.Exception: Unexpected exception, > expected but was > [INFO]at > org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:105) > [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98) > [INFO]at > org.apache.arrow.flight.TestTls.rejectInvalidCert(TestTls.java:62) > [INFO] Caused by: java.io.FileNotFoundException: > /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem > (No such file or directory) > [INFO]at > org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:102) > [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98) > [INFO]at > org.apache.arrow.flight.TestTls.rejectInvalidCert(TestTls.java:62) > [INFO] > [INFO] [ERROR] rejectHostname(org.apache.arrow.flight.TestTls) Time elapsed: > 0.001 s <<< ERROR! > [INFO] java.lang.Exception: Unexpected exception, > expected but was > [INFO]at > org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:105) > [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98) > [INFO]at > org.apache.arrow.flight.TestTls.rejectHostname(TestTls.java:78) > [INFO] Caused by: java.io.FileNotFoundException: > /home/kou/work/cpp/arrow.pravindra/java/flight/../../testing/data/flight/cert0.pem > (No such file or directory) > [INFO]at > org.apache.arrow.flight.TestTls.lambda$test$3(TestTls.java:102) > [INFO]at org.apache.arrow.flight.TestTls.test(TestTls.java:98) > [INFO]at > org.apache.arrow.flight.TestTls.rejectHostname(TestTls.java:78) > [INFO] > [INFO] [INFO] Running org.apache.arrow.flight.TestServerOptions > [INFO] [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: > 0.114 s - in org.apache.arrow.flight.TestServerOptions > [INFO] [INFO] > [INFO] [INFO] Results: > [INFO] [INFO] > [INFO] [ERROR] Errors: > [INFO] [ERROR] TestTls.connectTls:44->test:98->lambda$test$3:105 Runtime > java.io.FileNotFound... 
> [INFO] [ERROR] TestTls.rejectHostname » Unexpected exception, > expected [INFO] [ERROR] TestTls.rejectInvalidCert » Unexpected exception, > expected [INFO] [INFO] > [INFO] [ERROR] Tests run: 27, Failures: 0, Errors: 3, Skipped: 10 > {noformat} > I'm not sure whether this is my environment problem or not. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
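The failure above comes down to the tests looking for `cert0.pem` under a test-data root that is not there. Below is a minimal Python sketch of that lookup, not the Java test code. It assumes the root comes from `ARROW_TEST_DATA` (as in the comment) and that the certificate sits at `flight/cert0.pem` beneath it (as in the path shown in the trace). The function name `flight_cert_path` is hypothetical.

```python
import os
from pathlib import Path


def flight_cert_path(env=None):
    """Return the path to flight/cert0.pem under ARROW_TEST_DATA.

    Raises a descriptive error instead of a bare FileNotFoundError when
    the variable is unset or the test-data directory is not populated.
    """
    env = os.environ if env is None else env
    root = env.get("ARROW_TEST_DATA")
    if not root:
        raise RuntimeError("ARROW_TEST_DATA is not set")
    cert = Path(root) / "flight" / "cert0.pem"
    if not cert.is_file():
        raise RuntimeError(f"{cert} not found; are submodules initialized?")
    return cert
```

Failing early with a message that names the variable would make this kind of environment problem easier to tell apart from a real TLS regression.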
[jira] [Commented] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type
[ https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881990#comment-16881990 ] lidavidm commented on ARROW-5610: - Right now, if you define an extension type in Java whose type name is not "arrow.py_extension_type", you have no way of writing the Python equivalent. I think what's needed is a C++ extension type whose implementation dispatches to Python callbacks, which can be instantiated and registered with an arbitrary name. > [Python] Define extension type API in Python to "receive" or "send" a foreign > extension type > > > Key: ARROW-5610 > URL: https://issues.apache.org/jira/browse/ARROW-5610 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 1.0.0 > > > In work in ARROW-840, a static {{arrow.py_extension_type}} name is used. > There will be cases where an extension type is coming from another > programming language (e.g. Java), so it would be useful to be able to "plug > in" a Python extension type subclass that will be used to deserialize the > extension type coming over the wire. This has some different API requirements > since the serialized representation of the type will not have knowledge of > Python pickling, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type
[ https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881990#comment-16881990 ] lidavidm edited comment on ARROW-5610 at 7/10/19 12:17 PM: --- Right now, if you define an extension type in Java whose type name is not "arrow.py_extension_type", you have no way of writing the Python equivalent. I think what's needed is a C++ extension type whose implementation dispatches to Python callbacks, which can be instantiated and registered with an arbitrary name. Basically, what Joris suggests with the extension type that can be parameterized with a name.
[jira] [Commented] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type
[ https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882039#comment-16882039 ] lidavidm commented on ARROW-5610: - Hmm, to be frank, I haven't gotten a chance to evaluate the API yet; I'm just going off of reading the implementation. I'll follow up once I do get a chance to try it out. But I'm still not sure why the language that the type is defined in should matter - I thought the idea is that there is an abstract type, and you implement it for each language, and right now the main limitation is that you can't implement a Python type with an arbitrary name. (For example, I want a Java UuidType, which uses Java's UUID class, to map seamlessly to a Python UuidType using the uuid module.) But I suppose I should put up some code before I keep talking!
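To make the UuidType example concrete, here is a rough Python sketch of the value-level mapping only. It assumes the shared storage is 16 raw bytes per value in big-endian order, which is an assumption and not something the thread specifies. The helper names are hypothetical and are not part of pyarrow.

```python
import uuid


def uuid_to_storage(value: uuid.UUID) -> bytes:
    """Encode a UUID as the 16 raw bytes a fixed-size binary column would hold."""
    return value.bytes


def uuid_from_storage(raw: bytes) -> uuid.UUID:
    """Decode 16 raw bytes back into a uuid.UUID."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return uuid.UUID(bytes=raw)
```

The missing piece discussed in the thread is not this conversion but the ability to attach it to an extension type registered under an arbitrary name.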
[jira] [Comment Edited] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type
[ https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881990#comment-16881990 ] lidavidm edited comment on ARROW-5610 at 7/10/19 12:21 PM: --- Right now, if you define an extension type in Java whose type name is not "arrow.py_extension_type", you have no way of writing the Python equivalent. I think what's needed is a C++ extension type whose implementation dispatches to Python callbacks, which can be instantiated and registered with an arbitrary name. Basically, what Joris suggests with the extension type that can be parameterized with a name. I think the implementation would be similar to Flight, where you have a C++ subclass that contains a set of function pointers and a Python object, and invokes those functions by passing them the Python object and the C++ arguments. The functions would be defined in Cython and take care of bridging between the two. I don't think there needs to be a Python-specific registry, just a way to hook arbitrary Python into the extension type metadata (de)serialization. Right now, the C++ subclass calls a specific classmethod that tries to unpickle the metadata, but there's no reason why it has to be pickle.
[jira] [Commented] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type
[ https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882027#comment-16882027 ] lidavidm commented on ARROW-5610: - Say in Java, you have an extension type representing an IP address. Its type name is "ip" and its metadata indicates whether it's IPv4 or IPv6. You want to transfer a table containing a column of that type to and from Python. Right now, you can read that data from Python, but you can't create a table with that type. You could implement an extension type that behaves the same, but Java wouldn't recognize it, because the type name has to be "arrow.py_extension_type". You also can't deserialize the metadata written by Java or write metadata that Java can read, as it's not in pickle format.
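The IP address scenario can be sketched in plain Python to show what language-neutral metadata might look like. Everything here is hypothetical: the class `IpType`, the `b"v4"`/`b"v6"` metadata encoding, and the packed-bytes storage are assumptions chosen for illustration, not the Java type's actual format or a pyarrow API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IpType:
    """Hypothetical Python counterpart of a Java extension type named "ip"."""

    version: int  # 4 or 6

    EXTENSION_NAME = "ip"  # the name both languages would have to agree on

    def serialize(self) -> bytes:
        # Plain bytes that any language can parse; no pickle involved.
        return b"v4" if self.version == 4 else b"v6"

    @classmethod
    def deserialize(cls, metadata: bytes) -> "IpType":
        if metadata == b"v4":
            return cls(4)
        if metadata == b"v6":
            return cls(6)
        raise ValueError(f"unrecognized ip metadata: {metadata!r}")

    def storage_width(self) -> int:
        # Bytes per value in the assumed fixed-size binary storage.
        return 4 if self.version == 4 else 16

    def to_python(self, raw: bytes):
        # ipaddress.ip_address accepts packed bytes of length 4 or 16.
        if len(raw) != self.storage_width():
            raise ValueError(f"expected {self.storage_width()} bytes")
        return ipaddress.ip_address(raw)
```

The point of the sketch is that both the type name and the metadata bytes are chosen by the type's author, which is what a fixed "arrow.py_extension_type" name with pickled metadata does not allow.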
[jira] [Created] (ARROW-5930) [FlightRPC] [Python] Flight CI tests are failing
lidavidm created ARROW-5930: --- Summary: [FlightRPC] [Python] Flight CI tests are failing Key: ARROW-5930 URL: https://issues.apache.org/jira/browse/ARROW-5930 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Python Affects Versions: 0.14.0 Reporter: lidavidm Flight tests segfault on Travis: [https://travis-ci.org/apache/arrow/jobs/557690959] The relevant part is: {noformat} Fatal Python error: Aborted Thread 0x7fcf009fe700 (most recent call first): File "/home/travis/build/apache/arrow/python/pyarrow/tests/test_flight.py", line 386 in _server_thread File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/threading.py", line 864 in run File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/threading.py", line 916 in _bootstrap_inner File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/threading.py", line 884 in _bootstrap Current thread 0x7fcf1f9fa700 (most recent call first): File "/home/travis/build/apache/arrow/python/pyarrow/tests/test_flight.py", line 411 in flight_server File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/contextlib.py", line 99 in __exit__ File "/home/travis/build/apache/arrow/python/pyarrow/tests/test_flight.py", line 670 in test_tls_do_get File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/python.py", line 165 in pytest_pyfunc_call File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py", line 81 in File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py", line 87 in _hookexec File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/hooks.py", line 289 in __call__ File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/python.py", line 1451 in 
runtest File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py", line 117 in pytest_runtest_call File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py", line 81 in File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py", line 87 in _hookexec File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/hooks.py", line 289 in __call__ File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py", line 192 in File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py", line 220 in from_call File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py", line 192 in call_runtest_hook File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py", line 167 in call_and_report File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py", line 87 in runtestprotocol File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/runner.py", line 72 in pytest_runtest_protocol File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py", line 81 in File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py", line 87 in _hookexec File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/hooks.py", line 289 in __call__ File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/main.py", line 278 in 
pytest_runtestloop File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py", line 81 in File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/manager.py", line 87 in _hookexec File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/pluggy/hooks.py", line 289 in __call__ File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/main.py", line 257 in _main File "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/_pytest/main.py",
[jira] [Commented] (ARROW-5829) [Java] failure in TestServerOptions.domainSocket
[ https://issues.apache.org/jira/browse/ARROW-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877880#comment-16877880 ] lidavidm commented on ARROW-5829: - I think this is the same underlying cause - on OSX, the release verification script uses a custom temp dir that makes the domain socket path too long. > [Java] failure in TestServerOptions.domainSocket > > > Key: ARROW-5829 > URL: https://issues.apache.org/jira/browse/ARROW-5829 > Project: Apache Arrow > Issue Type: Bug > Components: FlightRPC, Java >Reporter: Pindikura Ravindra >Priority: Major > > I see this consistently with the 0.14.0 rc0 release candidate on mac mojave. > java.io.IOException: Failed to bind > at > org.apache.arrow.flight.TestServerOptions.domainSocket(TestServerOptions.java:46) > Caused by: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: > Address already in use > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
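A quick way to check the "path too long" theory is to compare the socket path length against the size of `sun_path` in `sockaddr_un`. The sketch below uses the commonly documented capacities (104 bytes on macOS, 108 on Linux). The function names are hypothetical, and placing the socket under `/tmp` is an assumption about what a fix could look like, not what the release script does.

```python
import os
import tempfile

# sun_path capacity in bytes, including the trailing NUL: 104 on macOS,
# 108 on Linux. Using the smaller value keeps paths portable.
SUN_PATH_MAX = 104


def domain_socket_path_fits(path: str, limit: int = SUN_PATH_MAX) -> bool:
    """True if `path` fits in sockaddr_un.sun_path with its NUL terminator."""
    return len(os.fsencode(path)) + 1 <= limit


def short_socket_path(name: str = "flight.sock") -> str:
    """Place the socket in a fresh directory under /tmp to keep the path short."""
    directory = tempfile.mkdtemp(prefix="fl", dir="/tmp")
    return os.path.join(directory, name)
```

A custom temp directory nested several levels deep can exceed the limit even when the socket file name itself is short.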
[jira] [Updated] (ARROW-5829) [Java] failure in TestServerOptions.domainSocket
[ https://issues.apache.org/jira/browse/ARROW-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-5829: Affects Version/s: 0.14.0
[jira] [Created] (ARROW-5876) [FlightRPC] Implement basic auth across all languages
lidavidm created ARROW-5876:
---
Summary: [FlightRPC] Implement basic auth across all languages
Key: ARROW-5876
URL: https://issues.apache.org/jira/browse/ARROW-5876
Project: Apache Arrow
Issue Type: Improvement
Components: FlightRPC
Affects Versions: 0.14.0
Reporter: lidavidm

We should implement a set of common auth methods in Flight itself to have standardized ways to do things like basic auth.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5875) [FlightRPC] Test RPC features in integration tests
lidavidm created ARROW-5875:
---
Summary: [FlightRPC] Test RPC features in integration tests
Key: ARROW-5875
URL: https://issues.apache.org/jira/browse/ARROW-5875
Project: Apache Arrow
Issue Type: Test
Components: FlightRPC, Integration
Affects Versions: 0.14.0
Reporter: lidavidm

We should test not just wire-format compatibility, but feature-compatibility in Flight integration tests. This may mean adding a separate suite of tests to the integration script. Features that should be tested include:
* Authentication
* Error & error code propagation
* Cancellation
* Flow control/backpressure

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5877) [FlightRPC] Document caveats around usage of auth APIs
lidavidm created ARROW-5877:
---
Summary: [FlightRPC] Document caveats around usage of auth APIs
Key: ARROW-5877
URL: https://issues.apache.org/jira/browse/ARROW-5877
Project: Apache Arrow
Issue Type: Improvement
Components: FlightRPC
Reporter: lidavidm

The Flight Handshake method can be insecure, and currently has a surprising failure mode; we should document these caveats (blocks forever waiting on client/server; insecure depending on deployment configuration).

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type
[ https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885281#comment-16885281 ] lidavidm commented on ARROW-5610: - [~wesmckinn] I'll try to take a pass this week, if time permits; we would like this functionality. (By the way, is there a Jira explicitly for being able to hook into to_pandas, or a suggested way to efficiently do a custom Pandas conversion?) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (ARROW-5930) [FlightRPC] [Python] Flight CI tests are failing
[ https://issues.apache.org/jira/browse/ARROW-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886300#comment-16886300 ] lidavidm commented on ARROW-5930: - [~pitrou] I had just come to the same conclusion. I have a change so that Shutdown doesn't use a DCHECK, but instead does an actual check, so at least it won't segfault. I can add additional synchronization on the Python side.
> [FlightRPC] [Python] Flight CI tests are failing
>
> Key: ARROW-5930
> URL: https://issues.apache.org/jira/browse/ARROW-5930
> Project: Apache Arrow
> Issue Type: Bug
> Components: FlightRPC, Python
> Affects Versions: 0.14.0
> Reporter: lidavidm
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
[jira] [Commented] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type
[ https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905428#comment-16905428 ] lidavidm commented on ARROW-5610: - [~jorisvandenbossche] the approach makes sense to me. I assume the generic ExtensionType would have a Python "vtable" for Python subclasses to implement the C++ methods, and that each Python subclass would somehow register a new instance of the C++ type (with corresponding Python method references) with the extension type registry? The registration method would need to support parameterized types as well (i.e. registering multiple instances of the same type with different parameters). There's still the reference loop between C++ and Python. In this case, since you have no way of re-instantiating the Python instance if the weak reference is dropped, you'd need some other way - so you might have to make the Python-side registry, as a way to get around the reference loop. (Then, during interpreter shutdown, you would drop all the C++ extension type instance references, then drop the Python references.) I think then, on the C++ side, the generic extension type instance would get instantiated, but there would be no way to instantiate the corresponding Python class without a separate registry, as you mention. So the unknown extension type then comes into play. Alternatively, Python subclasses could be required to register a factory method that takes the extension type name and metadata.
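The factory-method alternative from the comment above can be sketched in pure Python. This is only an illustration of the idea under stated assumptions: the registry, the function names, and `UnknownExtensionType` are all hypothetical and do not correspond to the C++ registry or to any existing pyarrow API.

```python
from typing import Callable, Dict


class UnknownExtensionType:
    """Fallback that preserves name and metadata when no factory is registered."""

    def __init__(self, name: str, metadata: bytes):
        self.name = name
        self.metadata = metadata


# Maps extension type name -> factory(name, metadata) -> type instance.
_FACTORIES: Dict[str, Callable[[str, bytes], object]] = {}


def register_extension_factory(name: str, factory) -> None:
    """Register a factory under an arbitrary extension type name."""
    if name in _FACTORIES:
        raise KeyError(f"extension type {name!r} is already registered")
    _FACTORIES[name] = factory


def unregister_extension_factory(name: str) -> None:
    """Drop a registration, e.g. during interpreter shutdown."""
    _FACTORIES.pop(name, None)


def resolve_extension_type(name: str, metadata: bytes):
    """Build the Python-side type for a (name, metadata) pair read off the wire."""
    factory = _FACTORIES.get(name)
    if factory is None:
        return UnknownExtensionType(name, metadata)
    return factory(name, metadata)
```

Because the factory receives the metadata, a single registration per name can cover every parameterization of a type, which avoids registering one instance per parameter set.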
[jira] [Commented] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type
[ https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898956#comment-16898956 ] lidavidm commented on ARROW-5610: - My apologies, I ended up being too busy to look at this. Thanks for the issue pointers.
[jira] [Commented] (ARROW-6241) [Java] Failures on master
[ https://issues.apache.org/jira/browse/ARROW-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907664#comment-16907664 ] lidavidm commented on ARROW-6241: - These two commits merged cleanly but break each other: [https://github.com/apache/arrow/commit/6eae79000336788925fab1f1c011146e24c4838d] introduced use of Preconditions [https://github.com/apache/arrow/commit/c45def63963f5f70903e58492e22718cc9de6ed1] removed the import (as the change there made it unused) > [Java] Failures on master > - > > Key: ARROW-6241 > URL: https://issues.apache.org/jira/browse/ARROW-6241 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Wes McKinney >Priority: Blocker > Fix For: 0.15.0 > > > I'm getting builds failing today with errors like > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.6.2:compile > (default-compile) on project arrow-vector: Compilation failure: Compilation > failure: > [ERROR] > /home/travis/build/apache/arrow/java/vector/src/main/java/org/apache/arrow/vector/complex/ListVector.java:[356,4] > error: cannot find symbol > [ERROR] symbol: variable Preconditions > [ERROR] location: class ListVector > [ERROR] > /home/travis/build/apache/arrow/java/vector/src/main/java/org/apache/arrow/vector/complex/NonNullableStructVector.java:[96,4] > error: cannot find symbol > [ERROR] symbol: variable Preconditions > [ERROR] location: class NonNullableStructVector > [ERROR] -> [Help 1] > {code} > see https://travis-ci.org/apache/arrow/jobs/571958044 > Is this introduced by a recent patch? -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-4914) [Rust] Array slice returns incorrect bitmask
[ https://issues.apache.org/jira/browse/ARROW-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-4914: Labels: beginner (was: ) > [Rust] Array slice returns incorrect bitmask > > > Key: ARROW-4914 > URL: https://issues.apache.org/jira/browse/ARROW-4914 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.13.0 >Reporter: Neville Dipale >Priority: Blocker > Labels: beginner > > Slicing arrays changes the offset, length and null count of their array data, > but the bitmask is not changed. > This results in the correct null count, but the array values might be marked > incorrectly as valid/invalid based on the old bitmask positions before the > offset. > To reproduce, create an array with some null values, slice the array, and > then dbg!() it (after downcasting). -- This message was sent by Atlassian JIRA (v7.6.14#76016)
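To make the reported behaviour concrete, here is a small pure-Python model of a validity bitmap (it is not the Rust implementation; `bit_is_set`, `is_valid` and `is_valid_buggy` are illustrative names). It assumes the least-significant-bit numbering that Arrow uses for validity bitmaps and shows why a lookup that ignores the slice offset reports the wrong slots as valid.

```python
def bit_is_set(bitmap: bytes, i: int) -> bool:
    # Arrow validity bitmaps use least-significant-bit numbering.
    return (bitmap[i // 8] >> (i % 8)) & 1 == 1


def is_valid(bitmap: bytes, offset: int, i: int) -> bool:
    # Correct: element i of a slice lives at bit (offset + i).
    return bit_is_set(bitmap, offset + i)


def is_valid_buggy(bitmap: bytes, offset: int, i: int) -> bool:
    # Models the reported bug: the offset is ignored.
    return bit_is_set(bitmap, i)


# Values [1, None, 3, None, 5] give validity bits 1,0,1,0,1 (0b00010101).
bitmap = bytes([0b00010101])
offset, length = 1, 3  # the slice covers [None, 3, None]

correct = [is_valid(bitmap, offset, i) for i in range(length)]
buggy = [is_valid_buggy(bitmap, offset, i) for i in range(length)]
```

In this model the null count of the slice (2) can still be right while `buggy` marks the wrong positions as valid, which matches the symptom in the description.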
[jira] [Updated] (ARROW-2001) [Java] Add getInitReservation() to BufferAllocator interface similar to getLimit(), getHeadRoom() APIs
[ https://issues.apache.org/jira/browse/ARROW-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-2001: Labels: beginner newbie (was: newbie) > [Java] Add getInitReservation() to BufferAllocator interface similar to > getLimit(), getHeadRoom() APIs > -- > > Key: ARROW-2001 > URL: https://issues.apache.org/jira/browse/ARROW-2001 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Siddharth Teotia >Priority: Minor > Labels: beginner, newbie > > For capturing additional information for debugging/profiling purposes, it > will be useful to expose the init reservation for buffer allocator. > I would encourage someone new to the community to do this. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-5822) [Java] Provide a sample json file for the flight example
[ https://issues.apache.org/jira/browse/ARROW-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-5822: Labels: beginner (was: ) > [Java] Provide a sample json file for the flight example > > > Key: ARROW-5822 > URL: https://issues.apache.org/jira/browse/ARROW-5822 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Liya Fan >Priority: Minor > Labels: beginner > > The flight package provides IntegrationTestClient and IntegrationTestServer > as sample implementations for client/server side. > In these implementations, the client sends the content of some json file to > the server. However, it is not clear what the format of the json file should > be like. > So it is desirable to also provide a sample json file, which makes it easier > to run the flight program. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-5722) [Rust] Implement std::fmt::Debug for ListArray, BinaryArray and StructArray
[ https://issues.apache.org/jira/browse/ARROW-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-5722: Labels: beginner (was: ) > [Rust] Implement std::fmt::Debug for ListArray, BinaryArray and StructArray > --- > > Key: ARROW-5722 > URL: https://issues.apache.org/jira/browse/ARROW-5722 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Chao Sun >Priority: Major > Labels: beginner > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-5912) [Python] conversion from datetime objects with mixed timezones should normalize to UTC
[ https://issues.apache.org/jira/browse/ARROW-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-5912: Labels: beginner (was: ) > [Python] conversion from datetime objects with mixed timezones should > normalize to UTC > -- > > Key: ARROW-5912 > URL: https://issues.apache.org/jira/browse/ARROW-5912 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Joris Van den Bossche >Priority: Major > Labels: beginner > Fix For: 1.0.0 > > > Currently, when having objects with mixed timezones, they are each separately > interpreted as their local time: > {code:python} > >>> ts_pd_paris = pd.Timestamp("1970-01-01 01:00", tz="Europe/Paris") > >>> ts_pd_paris > Timestamp('1970-01-01 01:00:00+0100', tz='Europe/Paris') > >>> ts_pd_helsinki = pd.Timestamp("1970-01-01 02:00", tz="Europe/Helsinki") > >>> ts_pd_helsinki > Timestamp('1970-01-01 02:00:00+0200', tz='Europe/Helsinki') > >>> a = pa.array([ts_pd_paris, ts_pd_helsinki]) > >>> > >>> > >>> a > > [ > 1970-01-01 01:00:00.00, > 1970-01-01 02:00:00.00 > ] > >>> a.type > TimestampType(timestamp[us]) > {code} > So both times are actually about the same moment in time (the same value in > UTC; in pandas their stored {{value}} is also the same), but once converted > to pyarrow, they are both tz-naive but no longer the same time. That seems > rather unexpected and a source for bugs. > I think a better option would be to normalize to UTC, and result in a > tz-aware TimestampArray with UTC as timezone. > That is also the behaviour of pandas if you force the conversion to result in > datetimes (by default pandas will keep them as object array preserving the > different timezones). -- This message was sent by Atlassian JIRA (v7.6.14#76016)
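The difference between the current behaviour and the proposed one can be sketched with the standard library alone. The fixed offsets below are stand-ins for Europe/Paris and Europe/Helsinki on that date (an assumption made to avoid a timezone database), and `drop_tz` / `normalize_to_utc` are illustrative helpers, not pyarrow functions.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for Europe/Paris (+01:00) and
# Europe/Helsinki (+02:00) in January 1970.
paris = timezone(timedelta(hours=1))
helsinki = timezone(timedelta(hours=2))

values = [
    datetime(1970, 1, 1, 1, 0, tzinfo=paris),
    datetime(1970, 1, 1, 2, 0, tzinfo=helsinki),
]


def drop_tz(items):
    # Models the behaviour described above: keep each local wall time.
    return [v.replace(tzinfo=None) for v in items]


def normalize_to_utc(items):
    # Models the proposal: convert every value to the same UTC instant.
    return [v.astimezone(timezone.utc) for v in items]


naive = drop_tz(values)
utc = normalize_to_utc(values)
```

Here `naive` holds two different wall-clock times, while both entries of `utc` are the same instant, 1970-01-01 00:00 UTC.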
[jira] [Updated] (ARROW-2619) [Rust] Move JSON serde code to separate file/module
[ https://issues.apache.org/jira/browse/ARROW-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-2619: Labels: beginner (was: ) > [Rust] Move JSON serde code to separate file/module > --- > > Key: ARROW-2619 > URL: https://issues.apache.org/jira/browse/ARROW-2619 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Priority: Minor > Labels: beginner > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-1984) [Java] NullableDateMilliVector.getObject() should return a LocalDate, not a LocalDateTime
[ https://issues.apache.org/jira/browse/ARROW-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-1984: Labels: beginner (was: ) > [Java] NullableDateMilliVector.getObject() should return a LocalDate, not a > LocalDateTime > - > > Key: ARROW-1984 > URL: https://issues.apache.org/jira/browse/ARROW-1984 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Vanco Buca >Priority: Minor > Labels: beginner > > NullableDateMilliVector.getObject() today returns a LocalDateTime. However, > this vector is used to store date information, and thus, getObject() should > return a LocalDate. > Please note: there already exists a vector that returns LocalDateTime -- > the NullableTimestampMilliVector. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-3552) [Python] Implement pa.RecordBatch.serialize_to to write single message to an OutputStream
[ https://issues.apache.org/jira/browse/ARROW-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-3552: Labels: beginner (was: ) > [Python] Implement pa.RecordBatch.serialize_to to write single message to an > OutputStream > - > > Key: ARROW-3552 > URL: https://issues.apache.org/jira/browse/ARROW-3552 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: beginner > > {{RecordBatch.serialize}} writes in memory. This would help with shared > memory workflows. See also pyarrow.ipc.write_tensor. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-5374) [Python] pa.read_record_batch() doesn't work
[ https://issues.apache.org/jira/browse/ARROW-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lidavidm updated ARROW-5374:
    Labels: begin  (was: )

> [Python] pa.read_record_batch() doesn't work
>
> Key: ARROW-5374
> URL: https://issues.apache.org/jira/browse/ARROW-5374
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Antoine Pitrou
> Priority: Major
> Labels: begin
>
> {code:python}
> >>> batch = pa.RecordBatch.from_arrays([pa.array([b"foo"], type=pa.utf8())], names=['strs'])
> >>> stream = pa.BufferOutputStream()
> >>> writer = pa.RecordBatchStreamWriter(stream, batch.schema)
> >>> writer.write_batch(batch)
> >>> writer.close()
> >>> buf = stream.getvalue()
> >>> pa.read_record_batch(buf, batch.schema)
> Traceback (most recent call last):
>   File "", line 1, in
>     pa.read_record_batch(buf, batch.schema)
>   File "pyarrow/ipc.pxi", line 583, in pyarrow.lib.read_record_batch
>     check_status(ReadRecordBatch(deref(message.message.get()),
>   File "pyarrow/error.pxi", line 87, in pyarrow.lib.check_status
>     raise ArrowIOError(message)
> ArrowIOError: Expected IPC message of type schema got record batch
> {code}

-- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-5374) [Python] pa.read_record_batch() doesn't work
[ https://issues.apache.org/jira/browse/ARROW-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lidavidm updated ARROW-5374:
    Labels: beginner  (was: begin)

> [Python] pa.read_record_batch() doesn't work
>
> Key: ARROW-5374
> URL: https://issues.apache.org/jira/browse/ARROW-5374
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Antoine Pitrou
> Priority: Major
> Labels: beginner
>
> {code:python}
> >>> batch = pa.RecordBatch.from_arrays([pa.array([b"foo"], type=pa.utf8())], names=['strs'])
> >>> stream = pa.BufferOutputStream()
> >>> writer = pa.RecordBatchStreamWriter(stream, batch.schema)
> >>> writer.write_batch(batch)
> >>> writer.close()
> >>> buf = stream.getvalue()
> >>> pa.read_record_batch(buf, batch.schema)
> Traceback (most recent call last):
>   File "", line 1, in
>     pa.read_record_batch(buf, batch.schema)
>   File "pyarrow/ipc.pxi", line 583, in pyarrow.lib.read_record_batch
>     check_status(ReadRecordBatch(deref(message.message.get()),
>   File "pyarrow/error.pxi", line 87, in pyarrow.lib.check_status
>     raise ArrowIOError(message)
> ArrowIOError: Expected IPC message of type schema got record batch
> {code}

-- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-4176) [C++/Python] Human readable arrow schema comparison
[ https://issues.apache.org/jira/browse/ARROW-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-4176: Labels: beginner (was: ) > [C++/Python] Human readable arrow schema comparison > --- > > Key: ARROW-4176 > URL: https://issues.apache.org/jira/browse/ARROW-4176 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Florian Jetter >Priority: Minor > Labels: beginner > > When working with arrow schemas it would be helpful to have a human readable > representation of the diff between two schemas. > This could be either exposed as a function returning a string/diff object or > via a function raising an Exception with this information. > For instance: > {code} > schema_diff = get_schema_diff(schema1, schema2) > expected_diff = """ > - col_changed: int8 > + col_changed: double > + col_additional: int8 > """ > assert schema_diff == expected_diff > {code} > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
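One possible shape for the requested helper, sketched over plain dictionaries rather than real Arrow schemas. `get_schema_diff` is the name used in the issue's example, but this implementation and the choice of a `name -> type string` mapping as input are assumptions for illustration only.

```python
def get_schema_diff(left, right):
    """Return a unified-diff-like string for two {name: type} mappings."""
    lines = []
    for name, left_type in left.items():
        if name not in right:
            # Column removed.
            lines.append(f"- {name}: {left_type}")
        elif right[name] != left_type:
            # Column kept but its type changed.
            lines.append(f"- {name}: {left_type}")
            lines.append(f"+ {name}: {right[name]}")
    for name, right_type in right.items():
        if name not in left:
            # Column added.
            lines.append(f"+ {name}: {right_type}")
    return "\n".join(lines)


schema1 = {"col_same": "string", "col_changed": "int8"}
schema2 = {
    "col_same": "string",
    "col_changed": "double",
    "col_additional": "int8",
}
schema_diff = get_schema_diff(schema1, schema2)
```

Unchanged columns are omitted, so identical schemas produce an empty string, which also makes the result easy to use in an assertion or an exception message.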
[jira] [Updated] (ARROW-3776) [Rust] Mark methods that do not perform bounds checking as unsafe
[ https://issues.apache.org/jira/browse/ARROW-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-3776: Labels: beginner (was: ) > [Rust] Mark methods that do not perform bounds checking as unsafe > - > > Key: ARROW-3776 > URL: https://issues.apache.org/jira/browse/ARROW-3776 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Paddy Horan >Priority: Minor > Labels: beginner > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-5248) [Python] support dateutil timezones
[ https://issues.apache.org/jira/browse/ARROW-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-5248: Labels: beginner (was: ) > [Python] support dateutil timezones > --- > > Key: ARROW-5248 > URL: https://issues.apache.org/jira/browse/ARROW-5248 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Joris Van den Bossche >Priority: Minor > Labels: beginner > > The {{dateutil}} package also provides a set of timezone objects > (https://dateutil.readthedocs.io/en/stable/tz.html) in addition to {{pytz}}. > In pyarrow, we only support pytz timezones (and the stdlib datetime.timezone > fixed offset): > {code} > In [2]: import dateutil.tz > > > In [3]: import pyarrow as pa > > > In [5]: pa.timestamp('us', dateutil.tz.gettz('Europe/Brussels')) > > > ... > ~/miniconda3/envs/dev37/lib/python3.7/site-packages/pyarrow/types.pxi in > pyarrow.lib.tzinfo_to_string() > ValueError: Unable to convert timezone > `tzfile('/usr/share/zoneinfo/Europe/Brussels')` to string > {code} > But pandas also supports dateutil timezones. As a consequence, when having a > pandas DataFrame that uses a dateutil timezone, you get an error when > converting to an arrow table. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-4111) [Python] Create time types from Python sequences of integers
[ https://issues.apache.org/jira/browse/ARROW-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-4111: Labels: beginner (was: ) > [Python] Create time types from Python sequences of integers > > > Key: ARROW-4111 > URL: https://issues.apache.org/jira/browse/ARROW-4111 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: beginner > Fix For: 1.0.0 > > > This works for dates, but not times: > {code} > > traceback > > > > def test_to_pandas_deduplicate_date_time(): > nunique = 100 > repeats = 10 > > unique_values = list(range(nunique)) > > cases = [ > # array type, to_pandas options > ('date32', {'date_as_object': True}), > ('date64', {'date_as_object': True}), > ('time32[ms]', {}), > ('time64[us]', {}) > ] > > for array_type, pandas_options in cases: > > arr = pa.array(unique_values * repeats, type=array_type) > pyarrow/tests/test_convert_pandas.py:2392: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > pyarrow/array.pxi:175: in pyarrow.lib.array > return _sequence_to_array(obj, mask, size, type, pool, from_pandas) > pyarrow/array.pxi:36: in pyarrow.lib._sequence_to_array > check_status(ConvertPySequence(sequence, mask, options, )) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > > raise ArrowInvalid(message) > E pyarrow.lib.ArrowInvalid: ../src/arrow/python/python_to_arrow.cc:1012 : > ../src/arrow/python/iterators.h:70 : Could not convert 0 with type int: > converting to time32 > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-5830) [C++] Stop using memcmp in TensorEquals
[ https://issues.apache.org/jira/browse/ARROW-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-5830: Labels: beginner (was: ) > [C++] Stop using memcmp in TensorEquals > --- > > Key: ARROW-5830 > URL: https://issues.apache.org/jira/browse/ARROW-5830 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Kenta Murata >Priority: Major > Labels: beginner > > Because memcmp is problematic for comparing floating-point values, such as NaNs. -- This message was sent by Atlassian Jira (v8.3.2#803003)
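Why a byte-wise comparison and a value comparison disagree for floating-point data can be shown in a few lines. This is a Python illustration rather than the C++ code; `bytes_equal` plays the role of `memcmp`, and the `nan_equal` flag is a hypothetical option, since the issue does not say which NaN semantics TensorEquals should end up with.

```python
import math
import struct


def bytes_equal(a: float, b: float) -> bool:
    # Stand-in for memcmp: compare the 8-byte IEEE 754 encodings.
    return struct.pack("<d", a) == struct.pack("<d", b)


def values_equal(a: float, b: float, nan_equal: bool = False) -> bool:
    # Value comparison; optionally treat NaN as equal to NaN.
    if nan_equal and math.isnan(a) and math.isnan(b):
        return True
    return a == b


nan = float("nan")
```

With these helpers, `bytes_equal(nan, nan)` is true although `nan == nan` is false, and `bytes_equal(0.0, -0.0)` is false although `0.0 == -0.0` is true, so a byte-wise check can be wrong in both directions.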
[jira] [Updated] (ARROW-2719) [Python/C++] ArrowSchema not hashable
[ https://issues.apache.org/jira/browse/ARROW-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-2719: Labels: beginner (was: ) > [Python/C++] ArrowSchema not hashable > - > > Key: ARROW-2719 > URL: https://issues.apache.org/jira/browse/ARROW-2719 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Reporter: Florian Jetter >Priority: Minor > Labels: beginner > > The arrow schema is immutable and should provide a way of hashing itself. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (ARROW-2652) [C++/Python] Document how to provide information on segfaults
[ https://issues.apache.org/jira/browse/ARROW-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-2652: Labels: beginner (was: ) > [C++/Python] Document how to provide information on segfaults > - > > Key: ARROW-2652 > URL: https://issues.apache.org/jira/browse/ARROW-2652 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Documentation, Python >Reporter: Uwe L. Korn >Priority: Major > Labels: beginner > > We often have users that report segmentation faults in {{pyarrow}}. This will > sadly keep reappearing as we also don't have the magical ability of writing > 100%-bug-free code. Thus we should have a small section in our documentation > on how people can give us the relevant information in the case of a > segmentation fault. Preferably the documentation covers {{gdb}} and {{lldb}}. > They both have similar commands but differ in some minor flags. > For one of the example comments I gave to a user in tickets see > https://github.com/apache/arrow/issues/2089#issuecomment-393477116 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (ARROW-2665) [Python/C++] Add index() method to find first occurrence of Python scalar
[ https://issues.apache.org/jira/browse/ARROW-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-2665: Labels: Analytics beginner (was: Analytics) > [Python/C++] Add index() method to find first occurrence of Python scalar > > > Key: ARROW-2665 > URL: https://issues.apache.org/jira/browse/ARROW-2665 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Uwe L. Korn >Priority: Major > Labels: Analytics, beginner > > Python lists have an {{index(x, start, end)}} method to find the first > occurrence of an element. We should add a method with the same interface > supporting Python scalars on the typical triplet > {{Array/ChunkedArray/Columns}}. > See also > https://docs.python.org/3.6/tutorial/datastructures.html#more-on-lists -- This message was sent by Atlassian Jira (v8.3.2#803003)
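As a rough model of the requested interface, the sketch below searches a chunked sequence represented as a list of Python lists. The function name mirrors `list.index`, but raising `ValueError` when nothing is found and supporting only non-negative `start`/`end` are assumptions; the issue does not specify either.

```python
def index(chunks, value, start=0, end=None):
    """Return the first logical position of `value` in a chunked sequence.

    `chunks` is a list of lists standing in for a ChunkedArray.
    """
    pos = 0
    for chunk in chunks:
        for item in chunk:
            in_range = pos >= start and (end is None or pos < end)
            if in_range and item == value:
                return pos
            pos += 1
    # Assumption: mirror list.index and raise when nothing matches.
    raise ValueError(f"{value!r} is not in array")


chunks = [[1, 2], [3, 2, None]]
first = index(chunks, 2)
after_two = index(chunks, 2, 2)
```

Positions are counted across chunk boundaries, so `after_two` refers to the second chunk's `2` at logical position 3.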
[jira] [Updated] (ARROW-2857) [Python] Expose integration test JSON read/write in Python API
[ https://issues.apache.org/jira/browse/ARROW-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-2857: Labels: beginner (was: ) > [Python] Expose integration test JSON read/write in Python API > -- > > Key: ARROW-2857 > URL: https://issues.apache.org/jira/browse/ARROW-2857 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Wes McKinney >Priority: Major > Labels: beginner > Fix For: 1.0.0 > > > This should be clearly marked to not be used for persistence -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6426) [FlightRPC] Expose gRPC configuration knobs in Flight
lidavidm created ARROW-6426: --- Summary: [FlightRPC] Expose gRPC configuration knobs in Flight Key: ARROW-6426 URL: https://issues.apache.org/jira/browse/ARROW-6426 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Affects Versions: 0.14.1 Reporter: lidavidm Assignee: lidavidm We should not expose gRPC symbols/APIs publicly, but should still provide a way to configure gRPC options as they may be needed in deployments (for instance, we ran into an issue with gRPC keepalives). In Java, this is fortunately solvable with reflection, but this is impossible in C++/Python. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (ARROW-6412) [C++] arrow-flight-test can crash because of port allocation
[ https://issues.apache.org/jira/browse/ARROW-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921408#comment-16921408 ] lidavidm commented on ARROW-6412: - Another way to solve this is to allow binding to port 0, then adding a method to get the actual port. gRPC supports this, and we've already separated Init and Serve in FlightServerBase. > [C++] arrow-flight-test can crash because of port allocation > > > Key: ARROW-6412 > URL: https://issues.apache.org/jira/browse/ARROW-6412 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > I get this error sometimes locally when running the tests in parallel: > {code} > [--] 11 tests from TestFlightClient > [ RUN ] TestFlightClient.ListFlights > E0902 15:13:55.996271678 17281 socket_utils_common_posix.cc:201] check for > SO_REUSEPORT: {"created":"@1567430035.996256600","description":"SO_REUSEPORT > unavailable on compiling > system","file":"../src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":169} > [ OK ] TestFlightClient.ListFlights (17 ms) > [ RUN ] TestFlightClient.GetFlightInfo > E0902 15:13:56.013065793 17281 server_chttp2.cc:40] > {"created":"@1567430036.013032600","description":"No address added out of > total 1 > resolved","file":"../src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":394,"referenced_errors":[{"created":"@1567430036.013029044","description":"Unable > to configure > socket","fd":6,"file":"../src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":217,"referenced_errors":[{"created":"@1567430036.013021880","description":"Address > already in > use","errno":98,"file":"../src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":190,"os_error":"Address > already in use","syscall":"bind"}]}]} > ../src/arrow/flight/flight_test.cc:271: Failure > Failed > 'server->Init(options)' 
failed with Unknown error: Server did not start > properly > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
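The "bind to port 0, then ask which port was assigned" idea from the comment above is independent of gRPC and can be shown with plain sockets. This is a minimal sketch of the technique, not Flight code; `bind_ephemeral` is an illustrative helper name.

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind a listening socket to an OS-assigned port and return both."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 asks the operating system to pick a free port atomically,
    # so two servers started in parallel cannot pick the same one.
    sock.bind((host, 0))
    sock.listen(1)
    port = sock.getsockname()[1]
    return sock, port


first_sock, first_port = bind_ephemeral()
second_sock, second_port = bind_ephemeral()
ports_differ = first_port != second_port
first_sock.close()
second_sock.close()
```

The test would then read the port back from the server after initialization instead of guessing a free one beforehand, which removes the window in which another process can grab it.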
[jira] [Commented] (ARROW-2428) [Python] Add API to map Arrow types (including extension types) to pandas ExtensionArray instances for to_pandas conversions
[ https://issues.apache.org/jira/browse/ARROW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923418#comment-16923418 ] lidavidm commented on ARROW-2428: - Hi Joris, overall I agree with the approach here. It's a little unfortunate that Pandas doesn't have a general column/table metadata mechanism... I agree that we want both a default hook for ExtensionType->Pandas conversions, and a way to override conversions on an individual basis. I think adding a new argument to {{to_pandas}} is easier than maintaining yet another function registry. Similarly, adding a conversion method on {{ExtensionType}} (or maybe that should be a future {{ExtensionArray}} class?) would be preferable to maintaining a registry. If we have something like {{pa.ExtensionType.\_\_pandas_array\_\_}}, should we also have {{pa.ExtensionType.\_\_pandas_dtype\_\_}}? > [Python] Add API to map Arrow types (including extension types) to pandas > ExtensionArray instances for to_pandas conversions > > > Key: ARROW-2428 > URL: https://issues.apache.org/jira/browse/ARROW-2428 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Uwe L. Korn >Priority: Major > Fix For: 1.0.0 > > > With the next release of Pandas, it will be possible to define custom column > types that back a {{pandas.Series}}. Thus we will not be able to cover all > possible column types in the {{to_pandas}} conversion by default as we won't > be aware of all extension arrays. > To enable users to create {{ExtensionArray}} instances from Arrow columns in > the {{to_pandas}} conversion, we should provide a hook in the {{to_pandas}} > call where they can overload the default conversion routines with the ones > that produce their {{ExtensionArray}} instances. 
> This should avoid additional copies in the case where we would nowadays first > convert the Arrow column into a default Pandas column (probably of object > type) and the user would afterwards convert it to a more efficient > {{ExtensionArray}}. This hook here will be especially useful when you build > {{ExtensionArrays}} where the storage is backed by Arrow. > The meta-issue that tracks the implementation inside of Pandas is: > https://github.com/pandas-dev/pandas/issues/19696 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (ARROW-2428) [Python] Add API to map Arrow types (including extension types) to pandas ExtensionArray instances for to_pandas conversions
[ https://issues.apache.org/jira/browse/ARROW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924221#comment-16924221 ] lidavidm commented on ARROW-2428: - It sounds like a new registry isn't needed, but adding parameters to to_pandas would be useful for customizing conversions of built-in types; Joris notes Fletcher would want to use that. > [Python] Add API to map Arrow types (including extension types) to pandas > ExtensionArray instances for to_pandas conversions > > > Key: ARROW-2428 > URL: https://issues.apache.org/jira/browse/ARROW-2428 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Uwe L. Korn >Priority: Major > Fix For: 1.0.0 > > > With the next release of Pandas, it will be possible to define custom column > types that back a {{pandas.Series}}. Thus we will not be able to cover all > possible column types in the {{to_pandas}} conversion by default as we won't > be aware of all extension arrays. > To enable users to create {{ExtensionArray}} instances from Arrow columns in > the {{to_pandas}} conversion, we should provide a hook in the {{to_pandas}} > call where they can overload the default conversion routines with the ones > that produce their {{ExtensionArray}} instances. > This should avoid additional copies in the case where we would nowadays first > convert the Arrow column into a default Pandas column (probably of object > type) and the user would afterwards convert it to a more efficient > {{ExtensionArray}}. This hook here will be especially useful when you build > {{ExtensionArrays}} where the storage is backed by Arrow. > The meta-issue that tracks the implementation inside of Pandas is: > https://github.com/pandas-dev/pandas/issues/19696 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (ARROW-6528) [C++] Spurious Flight test failures (port allocation failure)
[ https://issues.apache.org/jira/browse/ARROW-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927569#comment-16927569 ] lidavidm commented on ARROW-6528: - Oh, I see you just merged it. > [C++] Spurious Flight test failures (port allocation failure) > - > > Key: ARROW-6528 > URL: https://issues.apache.org/jira/browse/ARROW-6528 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > > Seems like our port allocation scheme inside unit tests is still not very > reliable :-/ > https://ci.ursalabs.org/#/builders/71/builds/4147/steps/8/logs/stdio > {code} > [--] 3 tests from TestMetadata > [ RUN ] TestMetadata.DoGet > E0905 12:45:40.322644527 10203 server_chttp2.cc:40] > {"created":"@1567687540.322612245","description":"No address added out of > total 1 > resolved","file":"../src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":394,"referenced_errors":[{"created":"@1567687540.322609844","description":"Unable > to configure > socket","fd":7,"file":"../src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":217,"referenced_errors":[{"created":"@1567687540.322602634","description":"Address > already in > use","errno":98,"file":"../src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":190,"os_error":"Address > already in use","syscall":"bind"}]}]} > ../src/arrow/flight/flight_test.cc:429: Failure > Failed > 'server->Init(options)' failed with Unknown error: Server did not start > properly > /buildbot/AMD64_Conda_Python_3_7/cpp/build-support/run-test.sh: line 97: > 10203 Segmentation fault (core dumped) $TEST_EXECUTABLE "$@" 2>&1 > 10204 Done| $ROOT/build-support/asan_symbolize.py > 10205 Done| ${CXXFILT:-c++filt} > 10206 Done| > $ROOT/build-support/stacktrace_addr2line.pl $TEST_EXECUTABLE > 10207 Done| $pipe_cmd 2>&1 > 10208 Done| tee $LOGFILE > /buildbot/AMD64_Conda_Python_3_7/cpp/build/src/arrow/flight > {code} -- This message was sent by Atlassian Jira 
(v8.3.2#803003)
[jira] [Commented] (ARROW-6528) [C++] Spurious Flight test failures (port allocation failure)
[ https://issues.apache.org/jira/browse/ARROW-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927567#comment-16927567 ] lidavidm commented on ARROW-6528: - [~pitrou] as part of ARROW-6426 ([PR|https://github.com/apache/arrow/pull/5292]) I added a FlightServerBase#port() getter in C++, so we could bind to port 0 instead of racing to find a free port. Want me to pull that out separately? > [C++] Spurious Flight test failures (port allocation failure) > - > > Key: ARROW-6528 > URL: https://issues.apache.org/jira/browse/ARROW-6528 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > > Seems like our port allocation scheme inside unit tests is still not very > reliable :-/ > https://ci.ursalabs.org/#/builders/71/builds/4147/steps/8/logs/stdio > {code} > [--] 3 tests from TestMetadata > [ RUN ] TestMetadata.DoGet > E0905 12:45:40.322644527 10203 server_chttp2.cc:40] > {"created":"@1567687540.322612245","description":"No address added out of > total 1 > resolved","file":"../src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":394,"referenced_errors":[{"created":"@1567687540.322609844","description":"Unable > to configure > socket","fd":7,"file":"../src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":217,"referenced_errors":[{"created":"@1567687540.322602634","description":"Address > already in > use","errno":98,"file":"../src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":190,"os_error":"Address > already in use","syscall":"bind"}]}]} > ../src/arrow/flight/flight_test.cc:429: Failure > Failed > 'server->Init(options)' failed with Unknown error: Server did not start > properly > /buildbot/AMD64_Conda_Python_3_7/cpp/build-support/run-test.sh: line 97: > 10203 Segmentation fault (core dumped) $TEST_EXECUTABLE "$@" 2>&1 > 10204 Done| $ROOT/build-support/asan_symbolize.py > 10205 Done| ${CXXFILT:-c++filt} > 10206 Done| > 
$ROOT/build-support/stacktrace_addr2line.pl $TEST_EXECUTABLE > 10207 Done| $pipe_cmd 2>&1 > 10208 Done| tee $LOGFILE > /buildbot/AMD64_Conda_Python_3_7/cpp/build/src/arrow/flight > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
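The "bind to port 0" idea from the comment above can be illustrated with plain sockets. This is a minimal sketch in Python rather than the C++ Flight code; the helper name `bind_ephemeral` is made up for illustration, and it only shows the operating-system behaviour that a port() getter would rely on.

```python
import socket

def bind_ephemeral(host="127.0.0.1"):
    """Bind a listening TCP socket to port 0 and return (socket, assigned_port).

    Hypothetical helper: asking for port 0 lets the OS choose a free port at
    bind time, so there is no window between "find a free port" and "bind to it".
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen()
    # getsockname() reports the port the OS actually assigned.
    return sock, sock.getsockname()[1]

# Two servers started back to back cannot collide, because each bind is atomic.
server_a, port_a = bind_ephemeral()
server_b, port_b = bind_ephemeral()
print("server A on port", port_a, "and server B on port", port_b)
server_a.close()
server_b.close()
```

A test would then read the assigned port back from the server and pass it to the client, instead of picking a number up front and hoping it is still free.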
[jira] [Commented] (ARROW-6062) [FlightRPC] Allow timeouts on all stream reads
[ https://issues.apache.org/jira/browse/ARROW-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932370#comment-16932370 ] lidavidm commented on ARROW-6062: - I would actually like this to be a timeout per read operation, but this isn't possible unless we implement async APIs (gRPC generally only offers per-call timeouts which we already have). In a long streaming operation, you may not have a bound on how long the entire read will take, but you do have a bound on how long an individual operation will take. > [FlightRPC] Allow timeouts on all stream reads > -- > > Key: ARROW-6062 > URL: https://issues.apache.org/jira/browse/ARROW-6062 > Project: Apache Arrow > Issue Type: Improvement > Components: FlightRPC >Reporter: lidavidm >Priority: Major > Fix For: 1.0.0 > > > Anywhere where we offer reading from a stream in Flight, we need to offer a > timeout. -- This message was sent by Atlassian Jira (v8.3.4#803005)
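To make the distinction in the comment above concrete, here is a rough sketch of a per-read timeout layered over a blocking stream, in Python. It is not Flight or gRPC code: `read_with_per_read_timeout`, `steady` and `stalls` are hypothetical names, and the background thread is a stand-in for the async API the comment says would be needed. Note that the stalled read is abandoned, not cancelled.

```python
import queue
import threading
import time

_DONE = object()  # sentinel marking the end of the stream

def read_with_per_read_timeout(produce, per_read_timeout):
    """Yield items from produce(); raise TimeoutError if one read stalls.

    The whole stream may run for an unbounded time; only the gap between
    consecutive messages is bounded.
    """
    items = queue.Queue()

    def pump():
        for item in produce():
            items.put(item)
        items.put(_DONE)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            item = items.get(timeout=per_read_timeout)
        except queue.Empty:
            raise TimeoutError("no message within %.2fs" % per_read_timeout) from None
        if item is _DONE:
            return
        yield item

def steady():
    # Many short reads: long in total, but each one is fast.
    for i in range(5):
        time.sleep(0.01)
        yield i

def stalls():
    # One read that hangs far longer than the per-read bound.
    yield 0
    time.sleep(1.0)
    yield 1

steady_result = list(read_with_per_read_timeout(steady, per_read_timeout=0.5))

stalled_result = []
timed_out = False
try:
    for value in read_with_per_read_timeout(stalls, per_read_timeout=0.1):
        stalled_result.append(value)
except TimeoutError:
    timed_out = True
```

A per-call deadline would have to be sized for the whole `steady` stream, whereas the per-read bound only needs to cover the slowest single message.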
[jira] [Commented] (ARROW-5722) [Rust] Implement std::fmt::Debug for ListArray, BinaryArray and StructArray
[ https://issues.apache.org/jira/browse/ARROW-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924290#comment-16924290 ] lidavidm commented on ARROW-5722: - [~csun], I have some basic implementations. Printing nested arrays is difficult; I've punted on that for StructArray/ListArray. Really, we need Array to have a Debug trait bound as well - is that acceptable? In the future, we may also want a pretty-printer API to make nested arrays look better (with indentation, etc). > [Rust] Implement std::fmt::Debug for ListArray, BinaryArray and StructArray > --- > > Key: ARROW-5722 > URL: https://issues.apache.org/jira/browse/ARROW-5722 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Chao Sun >Priority: Major > Labels: beginner > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (ARROW-4914) [Rust] Array slice returns incorrect bitmask
[ https://issues.apache.org/jira/browse/ARROW-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924348#comment-16924348 ] lidavidm commented on ARROW-4914: - It looks like this was fixed as part of ARROW-4853, can it be closed? > [Rust] Array slice returns incorrect bitmask > > > Key: ARROW-4914 > URL: https://issues.apache.org/jira/browse/ARROW-4914 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.13.0 >Reporter: Neville Dipale >Priority: Blocker > Labels: beginner > > Slicing arrays changes the offset, length and null count of their array data, > but the bitmask is not changed. > This results in the correct null count, but the array values might be marked > incorrectly as valid/invalid based on the old bitmask positions before the > offset. > To reproduce, create an array with some null values, slice the array, and > then dbg!() it (after downcasting). -- This message was sent by Atlassian Jira (v8.3.2#803003)
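The bug described in the issue can be reproduced in miniature. The following is a sketch in Python, not the Rust implementation; it only assumes Arrow's least-significant-bit validity bitmap layout, and `is_valid` is a hypothetical helper.

```python
def is_valid(bitmap: bytes, index: int) -> bool:
    """Return True if bit `index` is set in an LSB-ordered validity bitmap."""
    return (bitmap[index // 8] >> (index % 8)) & 1 == 1

# Parent array: [10, None, 30, None, 50] -> bits 0, 2 and 4 are set.
bitmap = bytes([0b00010101])

# A slice starting at offset 1 with length 3 should be [None, 30, None].
offset, length = 1, 3

# Buggy lookup: indexes the bitmap from 0, ignoring the slice offset.
without_offset = [is_valid(bitmap, i) for i in range(length)]

# Correct lookup: shifts every index by the slice offset.
with_offset = [is_valid(bitmap, offset + i) for i in range(length)]

print(without_offset)
print(with_offset)
```

The first list marks the wrong positions as valid because it reads the parent's bits 0 to 2 instead of bits 1 to 3.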
[jira] [Created] (ARROW-6074) [FlightRPC] Implement middleware
lidavidm created ARROW-6074: --- Summary: [FlightRPC] Implement middleware Key: ARROW-6074 URL: https://issues.apache.org/jira/browse/ARROW-6074 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Reporter: lidavidm -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-6075) [FlightRPC] Handle uncaught exceptions in middleware
lidavidm created ARROW-6075: --- Summary: [FlightRPC] Handle uncaught exceptions in middleware Key: ARROW-6075 URL: https://issues.apache.org/jira/browse/ARROW-6075 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Reporter: lidavidm -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (ARROW-6165) [Integration] Use multiprocessing to run integration tests on multiple CPU cores
[ https://issues.apache.org/jira/browse/ARROW-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902399#comment-16902399 ] lidavidm commented on ARROW-6165: - We'll also have to find free ports for the Flight tests, as right now they assume a hardcoded port. (Not hard to do, fortunately.) > [Integration] Use multiprocessing to run integration tests on multiple CPU > cores > > > Key: ARROW-6165 > URL: https://issues.apache.org/jira/browse/ARROW-6165 > Project: Apache Arrow > Issue Type: Improvement > Components: Integration >Reporter: Wes McKinney >Priority: Major > > The stdout/stderr will have to be captured appropriately so that the console > output when run in parallel is still readable -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-6062) [FlightRPC] Allow timeouts on all stream reads
lidavidm created ARROW-6062: --- Summary: [FlightRPC] Allow timeouts on all stream reads Key: ARROW-6062 URL: https://issues.apache.org/jira/browse/ARROW-6062 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Reporter: lidavidm Anywhere where we offer reading from a stream in Flight, we need to offer a timeout. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-6064) [FlightRPC] [C++] Clean up IWYU
lidavidm created ARROW-6064: --- Summary: [FlightRPC] [C++] Clean up IWYU Key: ARROW-6064 URL: https://issues.apache.org/jira/browse/ARROW-6064 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC Reporter: lidavidm As reported by Wes https://gist.github.com/wesm/af59c7cc8f35c6fd806b0d041b816da8 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-6063) [FlightRPC] Implement "half-closed" semantics for DoPut
lidavidm created ARROW-6063: --- Summary: [FlightRPC] Implement "half-closed" semantics for DoPut Key: ARROW-6063 URL: https://issues.apache.org/jira/browse/ARROW-6063 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Reporter: lidavidm Both sides on a DoPut should be able to half-close the stream, indicating they will no longer write. This allows a client to indicate that it is done writing data to the server, while still leaving the stream open so it can read metadata responses until the server finishes. Meanwhile, the server would see that the client has finished and be able to stop blocking on reading client messages. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
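The half-close behaviour described above has a direct analogue at the socket level, which may help picture it. This is only an analogy sketched in Python with a socket pair, not the Flight or gRPC API: the client stops writing with `shutdown(SHUT_WR)` but keeps reading until the server has replied.

```python
import socket
import threading

def server(conn, received):
    # Read until the client half-closes; recv() then returns b"".
    while True:
        chunk = conn.recv(1024)
        if not chunk:
            break
        received.append(chunk)
    # The server's write side is still open, so it can send a final response.
    conn.sendall(b"ack:%d" % sum(len(c) for c in received))
    conn.close()

client, server_side = socket.socketpair()
received = []
worker = threading.Thread(target=server, args=(server_side, received))
worker.start()

client.sendall(b"batch-1")
client.sendall(b"batch-2")
client.shutdown(socket.SHUT_WR)  # "I am done writing", but reads stay open

reply = b""
while True:
    chunk = client.recv(1024)
    if not chunk:
        break
    reply += chunk

worker.join()
client.close()
print(reply)
```

The server learns that the client is finished without the stream being torn down, which is the property the issue asks for on DoPut.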
[jira] [Created] (ARROW-6136) [FlightRPC][Java] Don't double-close response stream
lidavidm created ARROW-6136: --- Summary: [FlightRPC][Java] Don't double-close response stream Key: ARROW-6136 URL: https://issues.apache.org/jira/browse/ARROW-6136 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Java Affects Versions: 0.14.1 Reporter: lidavidm Assignee: lidavidm Fix For: 0.15.0 DoPut in Java double-closes the metadata response stream: if the service implementation sends an error down that channel, the Flight implementation will unconditionally try to complete the stream, violating the gRPC semantics (either an error or a completion may be sent, never both). -- This message was sent by Atlassian JIRA (v7.6.14#76016)
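One way to picture the "either an error or a completion, never both" rule is a wrapper that forwards only the first terminal event. This is a sketch in Python with hypothetical class names; it is not the fix applied in the Java code, only an illustration of the invariant.

```python
class TerminateOnceObserver:
    """Forward events to a delegate, but allow at most one terminal event."""

    def __init__(self, delegate):
        self._delegate = delegate
        self._terminated = False

    def on_next(self, value):
        if not self._terminated:
            self._delegate.on_next(value)

    def on_error(self, error):
        if not self._terminated:
            self._terminated = True
            self._delegate.on_error(error)

    def on_completed(self):
        # Ignored if an error (or an earlier completion) was already sent.
        if not self._terminated:
            self._terminated = True
            self._delegate.on_completed()


class RecordingObserver:
    """Test double that records every event it receives."""

    def __init__(self):
        self.events = []

    def on_next(self, value):
        self.events.append(("next", value))

    def on_error(self, error):
        self.events.append(("error", error))

    def on_completed(self):
        self.events.append(("completed", None))


recorder = RecordingObserver()
stream = TerminateOnceObserver(recorder)
stream.on_next("metadata-1")
stream.on_error("boom")   # the service reports a failure
stream.on_completed()     # the framework's unconditional completion is dropped
print(recorder.events)
```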
[jira] [Commented] (ARROW-5971) [Website] Blog post introducing Arrow Flight
[ https://issues.apache.org/jira/browse/ARROW-5971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887246#comment-16887246 ] lidavidm commented on ARROW-5971: - I'd be happy to look over anything. We're also working on a post of our own, though that probably won't come in the near future. It might be interesting to show Python numbers as well - it actually performs better than Java in our tests (don't think I can share actual data though). > [Website] Blog post introducing Arrow Flight > > > Key: ARROW-5971 > URL: https://issues.apache.org/jira/browse/ARROW-5971 > Project: Apache Arrow > Issue Type: New Feature > Components: Website >Reporter: Wes McKinney >Priority: Major > Fix For: 1.0.0 > > > I think it's a good time to be bringing more attention to our work over the > last 12-14 months on Arrow Flight. > I would be OK to draft an initial version of the blog post, and I can > circulate to others for review / edit / comment. If there are particular > benchmarks you would like to see included, contributing code for that would > also be helpful. My plan would be to show tcp throughput on localhost, and > node-to-node throughput on a local gigabit ethernet network. I think the > localhost throughput is important to show that Flight is a tool that you > would want to reach for for faster throughput in high performance networking > (e.g. 10/40 gigabit) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-5978) [FlightRPC] [Java] Integration test client doesn't close buffers
lidavidm created ARROW-5978: --- Summary: [FlightRPC] [Java] Integration test client doesn't close buffers Key: ARROW-5978 URL: https://issues.apache.org/jira/browse/ARROW-5978 Project: Apache Arrow Issue Type: Test Components: FlightRPC, Integration, Java Affects Versions: 0.14.0 Reporter: lidavidm Assignee: lidavidm Fix For: 1.0.0 The integration test client doesn't close any of the clients or free any of the buffers it creates. Trying to do so leads to a leak problem on the dictionary vector case. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-5979) [FlightRPC] Expose (de)serialization of protocol types
lidavidm created ARROW-5979: --- Summary: [FlightRPC] Expose (de)serialization of protocol types Key: ARROW-5979 URL: https://issues.apache.org/jira/browse/ARROW-5979 Project: Apache Arrow Issue Type: New Feature Components: FlightRPC Reporter: lidavidm It would be nice to be able to serialize/deserialize Flight types (e.g. FlightInfo) to/from the binary representations, in order to interoperate with systems that might want to provide (say) Flight tickets or FlightInfo without using the Flight protocol. For instance, you might have a search server that exposes a REST interface and wants to provide FlightInfo objects for Flight clients, without having to listen on a separate port. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (ARROW-5979) [FlightRPC] Expose (de)serialization of protocol types
[ https://issues.apache.org/jira/browse/ARROW-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm reassigned ARROW-5979: --- Assignee: lidavidm > [FlightRPC] Expose (de)serialization of protocol types > -- > > Key: ARROW-5979 > URL: https://issues.apache.org/jira/browse/ARROW-5979 > Project: Apache Arrow > Issue Type: New Feature > Components: FlightRPC >Reporter: lidavidm >Assignee: lidavidm >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > It would be nice to be able to serialize/deserialize Flight types (e.g. > FlightInfo) to/from the binary representations, in order to interoperate with > systems that might want to provide (say) Flight tickets or FlightInfo without > using the Flight protocol. For instance, you might have a search server that > exposes a REST interface and wants to provide FlightInfo objects for Flight > clients, without having to listen on a separate port. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-6017) [FlightRPC] Allow creating Locations with unknown schemes
lidavidm created ARROW-6017: --- Summary: [FlightRPC] Allow creating Locations with unknown schemes Key: ARROW-6017 URL: https://issues.apache.org/jira/browse/ARROW-6017 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Affects Versions: 0.14.0 Reporter: lidavidm Assignee: lidavidm Right now Flight clients error if the server hands them a Location with an unknown scheme. Also, you can't construct locations with non-gRPC schemes. Since Flight will want to support other transports, we should allow unknown schemes up until a client is constructed for them. This would also make it possible for a Flight service to reference non-Flight services in FlightInfo. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
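The proposal amounts to moving scheme validation from construction time to connection time. A minimal sketch in Python, assuming a hypothetical `Location` class and `connect` function (these are not the real Flight classes, and the set of known schemes is an assumption for illustration):

```python
from urllib.parse import urlparse

# Assumed set of schemes this client knows how to connect to.
KNOWN_SCHEMES = {"grpc", "grpc+tcp", "grpc+tls", "grpc+unix"}

class Location:
    """Holds any well-formed URI; the scheme is not checked here."""

    def __init__(self, uri):
        parsed = urlparse(uri)
        if not parsed.scheme:
            raise ValueError("not a URI: %r" % uri)
        self.uri = uri
        self.scheme = parsed.scheme

def connect(location):
    """Reject unknown schemes only when a client is actually requested."""
    if location.scheme not in KNOWN_SCHEMES:
        raise NotImplementedError("no transport for scheme %r" % location.scheme)
    return "connected to " + location.uri

flight = Location("grpc+tcp://localhost:5005")
other = Location("s3://bucket/key")  # constructing this is allowed

connected = connect(flight)
try:
    connect(other)
    rejected = False
except NotImplementedError:
    rejected = True
```

With this split, a FlightInfo can carry the `s3://` location to the client untouched, and the error only appears if the client tries to open it with a transport it does not have.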
[jira] [Updated] (ARROW-5996) [Java] Avoid resource leak in flight service
[ https://issues.apache.org/jira/browse/ARROW-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-5996: Component/s: FlightRPC > [Java] Avoid resource leak in flight service > > > Key: ARROW-5996 > URL: https://issues.apache.org/jira/browse/ARROW-5996 > Project: Apache Arrow > Issue Type: Bug > Components: FlightRPC, Java >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > # In FlightService#doPutCustom, the flight stream must be closed, even if an > exception is thrown during the call of responseObserver.onError > # An exception thrown during the call to acceptPut should not be > swallowed. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (ARROW-5877) [FlightRPC] Document caveats around usage of auth APIs
[ https://issues.apache.org/jira/browse/ARROW-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm reassigned ARROW-5877: --- Assignee: lidavidm > [FlightRPC] Document caveats around usage of auth APIs > -- > > Key: ARROW-5877 > URL: https://issues.apache.org/jira/browse/ARROW-5877 > Project: Apache Arrow > Issue Type: Improvement > Components: FlightRPC >Reporter: lidavidm >Assignee: lidavidm >Priority: Major > > The Flight Handshake method can be insecure, and currently has a surprising > failure mode; we should document these caveats (blocks forever waiting on > client/server; insecure depending on deployment configuration) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5877) [FlightRPC] Fix auth incompatibilities between Python/Java
[ https://issues.apache.org/jira/browse/ARROW-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-5877: Description: It turns out the blocking-forever issue was a combination of problems in Python and Java. We should simply fix the issues. --- The Flight Handshake method can be insecure, and currently has a surprising failure mode; we should document these caveats (blocks forever waiting on client/server; insecure depending on deployment configuration) was:The Flight Handshake method can be insecure, and currently has a surprising failure mode; we should document these caveats (blocks forever waiting on client/server; insecure depending on deployment configuration) Summary: [FlightRPC] Fix auth incompatibilities between Python/Java (was: [FlightRPC] Document caveats around usage of auth APIs) > [FlightRPC] Fix auth incompatibilities between Python/Java > -- > > Key: ARROW-5877 > URL: https://issues.apache.org/jira/browse/ARROW-5877 > Project: Apache Arrow > Issue Type: Improvement > Components: FlightRPC >Reporter: lidavidm >Assignee: lidavidm >Priority: Major > > It turns out the blocking-forever issue was a combination of problems in > Python and Java. We should simply fix the issues. > --- > The Flight Handshake method can be insecure, and currently has a surprising > failure mode; we should document these caveats (blocks forever waiting on > client/server; insecure depending on deployment configuration) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5681) [FlightRPC] Wrap gRPC exceptions/statuses
[ https://issues.apache.org/jira/browse/ARROW-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-5681: Component/s: C++ Summary: [FlightRPC] Wrap gRPC exceptions/statuses (was: [FlightRPC][Java] Wrap gRPC exceptions) > [FlightRPC] Wrap gRPC exceptions/statuses > - > > Key: ARROW-5681 > URL: https://issues.apache.org/jira/browse/ARROW-5681 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, FlightRPC, Java >Reporter: lidavidm >Assignee: lidavidm >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Instead of requiring users to catch/throw StatusRuntimeException in Flight > services/clients, and thereby leaking gRPC details, we should provide our own > set of exceptions and status codes. This way, services can provide proper > error messages and error codes to clients, which can catch the exception and > respond properly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
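The wrapping described in the issue could look roughly like the following. This is a sketch in Python under stated assumptions: `TransportError` stands in for the transport library's exception, and the `Flight*Error` names and the code-to-exception table are hypothetical, not the exception hierarchy that was eventually implemented.

```python
import functools

class TransportError(Exception):
    """Stand-in for a transport-level error carrying a status code (hypothetical)."""

    def __init__(self, code, details):
        super().__init__(details)
        self.code = code
        self.details = details

class FlightError(Exception):
    """Base class for the framework's own errors (hypothetical name)."""

class FlightUnavailableError(FlightError):
    pass

class FlightUnauthenticatedError(FlightError):
    pass

# Assumed mapping from transport status codes to framework exceptions.
_CODE_TO_ERROR = {
    "UNAVAILABLE": FlightUnavailableError,
    "UNAUTHENTICATED": FlightUnauthenticatedError,
}

def wrap_transport_errors(fn):
    """Translate transport errors so callers never see transport types."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except TransportError as exc:
            error_type = _CODE_TO_ERROR.get(exc.code, FlightError)
            # Keep the original as __cause__ for debugging.
            raise error_type(exc.details) from exc
    return wrapper

@wrap_transport_errors
def do_get(ticket):
    # Simulated failure inside the transport layer.
    raise TransportError("UNAVAILABLE", "server for %r is down" % ticket)

try:
    do_get("ticket-1")
except FlightError as exc:
    caught = exc
```

Callers catch `FlightError` subclasses only, so the transport could be swapped without changing their error handling.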
[jira] [Created] (ARROW-6677) [FlightRPC][C++] Document using Flight in C++
lidavidm created ARROW-6677: --- Summary: [FlightRPC][C++] Document using Flight in C++ Key: ARROW-6677 URL: https://issues.apache.org/jira/browse/ARROW-6677 Project: Apache Arrow Issue Type: Bug Components: Documentation, FlightRPC Reporter: lidavidm Assignee: lidavidm Fix For: 1.0.0 Similarly to ARROW-6390 for Python, we should have C++ documentation for Flight. -- This message was sent by Atlassian Jira (v8.3.4#803005)