date:20200826

Arrow Dataset API on Ceph

2020-08-26 Thread Ivo Jimenez

Dear Arrow community, We are writing to share our thoughts about designing an Apache Arrow-native storage system leveraging Ceph’s extensibility mechanism as part of the SkyhookDM project and aim for a design that leverages Arrow as much as possible, both on the client API a

Writing parquet to new filesystem API

2020-08-26 Thread Weston Pace

Forgive me if I am missing something obvious but I am unable to write parquet files using the new filesystem API. Here is what I am trying: https://gist.github.com/westonpace/0c5ef01e21a40de5d16608b7f12de80d I receive an error: OSError: Unrecognized filesystem:

Re: Creating filesystems that read local files

2020-08-26 Thread Weston Pace

Ok. I think I have it figured out as: num_rows = 0 dataset = pa.dataset.dataset(short_files, filesystem=subtree_filesystem) for fragment in dataset.get_fragments(): fragment.ensure_complete_metadata() if fragment.row_groups: for row_group in fragment.row_groups: num_ro

conversion between pyspark.DataFrame and pyarrow.Table

2020-08-26 Thread Radu Teodorescu

Hi, I noticed that arrow is mentioned as an optional intermediary format for converting between pandas DFs and spark DFs. Is there a way to explicitly convert a pyarrow Table to a spark DataFrame and the other way around? Absent that, going pyspark->pandas->pyarrow and back works but it’s obviou

Re: Authentication Redesign

2020-08-26 Thread James Duong

Hi everyone, I've updated the PR and responded to comments in the proposal document. The PR now makes handshake optional and sends auth information with every request. The client now needs to supply a CredentialCallOption containing auth information (as a Consumer), which we'll convert to a gRPC C

Re: Creating filesystems that read local files

2020-08-26 Thread Weston Pace

Thanks Joris / Antoine, It appears I will have to learn the new datasets API. I can confirm that SubTreeFileSystem is working for me. In case there is still interest here is the code I had from before reproducing the issue: https://gist.github.com/westonpace/4107c1c492cdd78d611595d43e72964d It
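To illustrate the idea behind a subtree-style filesystem (paths given relative to a base directory and delegated to local file operations), here is a minimal standard-library sketch. It is not pyarrow's `SubTreeFileSystem` implementation; the class name `SubtreeFiles` and its methods are hypothetical and exist only for this example.

```python
from pathlib import Path


class SubtreeFiles:
    """Hypothetical helper: resolve relative paths under one base directory."""

    def __init__(self, base_dir):
        # Resolve once so later containment checks compare canonical paths.
        self.base = Path(base_dir).resolve()

    def _resolve(self, relative_path):
        full = (self.base / relative_path).resolve()
        # Reject anything that would land outside the base directory.
        if full != self.base and self.base not in full.parents:
            raise ValueError(f"{relative_path!r} escapes {self.base}")
        return full

    def write_bytes(self, relative_path, data):
        target = self._resolve(relative_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def read_bytes(self, relative_path):
        return self._resolve(relative_path).read_bytes()
```

A caller would construct `SubtreeFiles("/some/base")` and then work with short paths such as `"data/part-0.bin"`, which is roughly the convenience the thread is after.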

Re: [Rust] Async record batch reader?

2020-08-26 Thread Andy Grove

Hi Max, I have been experimenting with an async record batch reader and was able to get a working version, but I had to use channels to communicate with the parquet reader, which ran on its own thread. I have taken a step back now that I have some experience of this and look forward to working wi

Re: [Rust] Async record batch reader?

2020-08-26 Thread Vertexclique

Hi Max, There is an open issue in the tracker that needs feedback to finalize the overall async interface across the arrow crates. Please check that issue; it mentions sans-IO and several design considerations. IMO we can carry on the async discussion under it. Best, Ma

[Rust] Async record batch reader?

2020-08-26 Thread Max Burke

Out of curiosity, is anyone working on a record batch reader that's async friendly? Wanting to know if it's something I could wait on/help out with, or if it's something we could start working on too. -- -Max

Re: Unexpected keyword argument 'split_blocks'

2020-08-26 Thread Joris Van den Bossche

Hi Jayant, This keyword was introduced in pyarrow 0.16, so you will need to update your installation. For updating, if `conda update pyarrow` indicates it is already installed, you can also try `conda install pyarrow=0.16`. However, it might be you will need to use the conda-forge channel, as I a
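Since the reply ties `split_blocks` to pyarrow 0.16, a small version guard can make the failure mode explicit before calling `to_pandas`. The sketch below uses only the standard library; the helper names `version_tuple` and `supports_split_blocks` are hypothetical, and the 0.16 threshold is taken from the message above.

```python
import re


def version_tuple(version):
    """Return the (major, minor) pair from a version string like '0.15.0'."""
    numbers = re.findall(r"\d+", version)
    return tuple(int(n) for n in numbers[:2])


def supports_split_blocks(version):
    # Threshold from the thread: the keyword was introduced in pyarrow 0.16.
    return version_tuple(version) >= (0, 16)
```

In practice one would pass `pyarrow.__version__` to `supports_split_blocks` and fall back to a plain `table.to_pandas()` call when it returns `False`.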

Re: Questions about S3 options

2020-08-26 Thread Joris Van den Bossche

Hi Weston, Sorry for the late reply. For using S3 in pyarrow, there are indeed 2 options: using the implementation provided by arrow (`pyarrow.fs.S3FileSystem`) or using s3fs which gets wrapped by pyarrow. Note that the wrapper is not actually DaskFileSystem: for the legacy filesystems we use s3fs

Unexpected keyword argument 'split_blocks'

2020-08-26 Thread Jayant Singh

Good Evening, My name is Jayant and I am using pyarrow version 0.15.0 with Python 3.8 on the latest version of macOS Catalina. As per the documentation, I ran the following code to avoid memory doubling: df=table.to_pandas(split_blocks=True, self_

Re: Creating filesystems that read local files

2020-08-26 Thread Joris Van den Bossche

Hi Weston, Currently there are two filesystem interfaces in pyarrow, a legacy one in `pyarrow.filesystem` and a new one in `pyarrow.fs` (see https://issues.apache.org/jira/browse/ARROW-9645 and https://arrow.apache.org/docs/python/filesystems_deprecated.html, docs are still a bit scarce). Based

RE: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-26 Thread Kazuaki Ishizaki

Hi, I waited for comments regarding Java Big-Endian (BE) support during my one-week vacation. Thank you for good suggestions and comments. I already responded to some questions in another mail. This mail addresses the remaining questions: Use cases, holistic strategy for BE support, and testing

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-26 Thread Kazuaki Ishizaki

Hi Micah, Thank you for expanding the scope of Big Endian support in Arrow. I am glad to see this now that I am back from a one-week vacation. I agree with this since we have just seen the kickoff of BE support in Go. Hi Wes, Thank you for your positive comments. We should carefully implement BE su

RE: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-26 Thread Kazuaki Ishizaki

Kazuaki Ishizaki, Ph.D., Senior Technical Staff Member (STSM), IBM Research - Tokyo ACM Distinguished Member - Apache Spark committer - IBM Academy of Technology Member Wes McKinney wrote on 2020/08/26 21:27:49: > From: Wes McKinney > To: dev , Micah Kornfield > Cc: Fan Liya > Date: 2020/08

Re: Creating filesystems that read local files

2020-08-26 Thread Antoine Pitrou

Hi Weston, Can you show the code for your experiment? (or post equivalent code) Regards Antoine. Le 25/08/2020 à 23:38, Weston Pace a écrit : > I created a RelativeFileSystem that extended FileSystem and proxied > calls to a LocalFileSystem instance. This filesystem allowed me to > specify

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-26 Thread Wes McKinney

hi Micah, I agree with your reasoning. If supporting BE in some languages (e.g. Java) is impractical due to performance regressions on LE platforms, then I don't think it's worth it. But if it can be handled at compile time or without runtime overhead, and tested / maintained properly on an ongoin
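As a small illustration of what the byte-order concern amounts to, the sketch below decodes a 32-bit integer from little-endian bytes and shows the byte swap a big-endian consumer would otherwise need. It uses only Python's `struct` module, assumes little-endian input data, and is not taken from any Arrow implementation; the function names are hypothetical.

```python
import struct
import sys


def decode_int32_le(raw):
    """Decode 4 little-endian bytes; '<' makes this independent of the host."""
    return struct.unpack("<i", raw)[0]


def byteswap_int32(raw):
    """Reverse the 4 bytes, turning a little-endian int32 into big-endian."""
    return raw[::-1]


def host_is_big_endian():
    # sys.byteorder is 'little' or 'big' depending on the running platform.
    return sys.byteorder == "big"
```

Specifying the byte order explicitly in the format string is one way to keep the decoding correct on both kinds of hosts; whether the swap costs anything at runtime on little-endian platforms is the question the thread raises.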

[NIGHTLY] Arrow Build Report for Job nightly-2020-08-26-0

2020-08-26 Thread Crossbow

Arrow Build Report for Job nightly-2020-08-26-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-26-0 Failed Tasks: - conda-osx-clang-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-26-0-azure-conda-osx-clang-py36 - cond


19 matches
