Re: conversion between pyspark.DataFrame and pyarrow.Table

2020-08-31 Thread Micah Kornfield
Hi Radu, I'm not a Spark expert, but I haven't seen any documentation on direct conversion. You might be better off asking the user@spark or dev@spark mailing lists. Thanks, Micah On Wed, Aug 26, 2020 at 1:46 PM Radu Teodorescu wrote: > Hi, > I noticed that arrow is mentioned as an optional

Re: [DISCUSS] Big Endian support in Arrow

2020-08-31 Thread Sutou Kouhei
Hi, In "Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)" on Sun, 30 Aug 2020 22:11:46 -0700, Jacques Nadeau wrote: > I know I don't want to have to go and debug on a remote BE system if some tests start failing for that platform... We can use

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-31 Thread Fan Liya
Thanks to Kazuaki for the survey and to Micah for starting the discussion. I do not oppose supporting BE. In fact, I am generally optimistic about the performance impact (for Java). IMO, this is going to be a painful road (many byte-order-related problems are tricky to debug), so I hope we can

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-31 Thread Bryan Cutler
I also think this would be a worthwhile addition and help the project expand in more areas. Beyond the Apache Spark optimization use case, having Arrow interoperability with the Python data science stack on BE would be very useful. I have looked at the remaining PRs for Java and they seem pretty

[FlightRPC] Add a "Flight SQL" extension on top of FlightRPC

2020-08-31 Thread Ryan Nicholson
Hello everyone, I would like to propose the following specification to ease adoption of SQL based clients and backends while leveraging data streams for transporting data. This specification entails a series of protobuf messages to be used in the opaque messaging framework payloads to enable

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-31 Thread Wes McKinney
I think it's well within the right of an implementation to reject BE data (or non-native-endian), but if an implementation chooses to implement and maintain the endianness conversions, then it does not seem so bad to me. On Mon, Aug 31, 2020 at 3:33 PM Jacques Nadeau wrote: > > And yes, for
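To make "endianness conversions" concrete, here is a minimal sketch, not Arrow's implementation, of what consuming a big-endian int32 data buffer on a little-endian host involves: each fixed-width value has its bytes swapped. The helper `be_int32_buffer_to_native` is hypothetical and uses only Python's standard `array`, `struct`, and `sys` modules.

```python
import array
import struct
import sys
from typing import List


def be_int32_buffer_to_native(buf: bytes) -> List[int]:
    """Hypothetical helper: decode a buffer of big-endian int32 values
    into Python ints, byte-swapping when the host is little-endian."""
    values = array.array("i")  # C int; this sketch assumes it is 4 bytes
    if values.itemsize != 4:
        raise RuntimeError("this sketch assumes a 4-byte C int")
    values.frombytes(buf)  # reinterprets the raw bytes in host byte order
    if sys.byteorder == "little":
        values.byteswap()  # BE source on an LE host: swap each value
    return values.tolist()


# Simulate a data buffer as a big-endian producer would emit it.
be_buffer = struct.pack(">3i", 1, 2, 258)
print(be_buffer.hex())
print(be_int32_buffer_to_native(be_buffer))
```

The swap costs one pass over every fixed-width buffer, which is one reason an implementation might prefer to reject non-native-endian data rather than maintain the conversion.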

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-31 Thread Jacques Nadeau
And yes, for those of you looking closely, I commented on ARROW-245 when it was committed. I just forgot about it. It looks like I had mostly the same concerns then that I do now :) Now I'm just more worried about format sprawl... On Mon, Aug 31, 2020 at 1:30 PM Jacques Nadeau wrote: > What do

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-31 Thread Jacques Nadeau
> What do you mean? The Endianness field (a Big|Little enum) was added 4 years ago: https://issues.apache.org/jira/browse/ARROW-245 I didn't realize that was done, my bad. Good example of format rot from my pov.

Re: Arrow Dataset API on Ceph

2020-08-31 Thread Ben Kietzman
> as far as we can tell, this filesystem layer is unaware of expressions, record batches, etc. You're correct that the filesystem layer doesn't directly support Expressions. However, the datasets API includes the Partitioning classes, which embed expressions in paths. Depending on what expressions

Re: Arbitrary user-defined metadata in feather

2020-08-31 Thread Neal Richardson
Hi Steve, Key-value metadata is exposed in both Python and R. See https://arrow.apache.org/docs/python/generated/pyarrow.Schema.html?highlight=metadata#pyarrow.Schema.metadata and https://arrow.apache.org/docs/r/articles/arrow.html#r-object-attributes, respectively. Neal On Mon, Aug 31, 2020 at

Arbitrary user-defined metadata in feather

2020-08-31 Thread Sun Yijiang
Is there a way to read/write user-defined metadata in the IPC format? As far as I know, metadata seems to exist only for the data schema. It would be very helpful to expose a key-value metadata interface in the C++/Python and R APIs. Best, Steve

[NIGHTLY] Arrow Build Report for Job nightly-2020-08-31-0

2020-08-31 Thread Crossbow
Arrow Build Report for Job nightly-2020-08-31-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-31-0 Failed Tasks: - conda-osx-clang-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-31-0-azure-conda-osx-clang-py36 -

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-31 Thread Antoine Pitrou
On 31/08/2020 at 07:11, Jacques Nadeau wrote: > I didn't realize that Ishizaki isn't just proposing a BE platform support, he is proposing a new BE version of the format. What do you mean? The Endianness field (a Big|Little enum) was added 4 years ago: