Re: How to get "standard" binary columns out of a pyarrow table

2018-02-01 Thread Eli
Can I perhaps assist? If I can get a few more specifics on what needs to be done, I think I can help. I'm OK with Cython, looking at some C++ code, etc. Sent with ProtonMail Secure Email. Original Message On February 1, 2018 3:31 PM, Wes McKinney

Re: How to get "standard" binary columns out of a pyarrow table

2018-02-01 Thread Wes McKinney
I opened https://issues.apache.org/jira/browse/ARROW-2068, which may help. This is an accessible issue for someone in the community to work on; I'm not sure when I'll be able to get to it. Thanks Wes On Thu, Feb 1, 2018 at 8:27 AM, Eli wrote: > Hey Wes, > > I understand

Re: How to get "standard" binary columns out of a pyarrow table

2018-02-01 Thread Eli
Hey Wes, I understand there's another pointer, a definition level pointer, which is basically a null location marker column. Exposing it as well to pick out the nulls would be awesome. The types of interest (to me) are varchars/strings, bools and numbers, just basic primitive types that also

[jira] [Created] (ARROW-2077) [Python] Document on how to use Storefact & Arrow to read Parquet from S3/Azure/...

2018-02-01 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2077: -- Summary: [Python] Document on how to use Storefact & Arrow to read Parquet from S3/Azure/... Key: ARROW-2077 URL: https://issues.apache.org/jira/browse/ARROW-2077

Re: Arrow PR backlog: please help

2018-02-01 Thread Uwe L. Korn
I just went over a lot of open PRs and sadly I wasn't able to reduce the number of open ones significantly. Some of them are making slow progress and it might be worthwhile to jump in in a week; for now I would rather wait and let the initial authors finish them so that they get more involved in the project.

Re: Arrow PR backlog: please help

2018-02-01 Thread Uwe L. Korn
CircleCI requires more permissions than Travis, and Apache Infra doesn't want to grant them. This might be different now that we have the gitbox setup instead of the previous Apache git mirroring. > On 01.02.2018 at 20:08, Phillip Cloud wrote: > > What is the main

[jira] [Created] (ARROW-2076) [Python] Display slowest test durations

2018-02-01 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2076: -- Summary: [Python] Display slowest test durations Key: ARROW-2076 URL: https://issues.apache.org/jira/browse/ARROW-2076 Project: Apache Arrow Issue Type:

Re: Arrow PR backlog: please help

2018-02-01 Thread Phillip Cloud
What is the main barrier to getting CircleCI to work with Apache projects? On Thu, Feb 1, 2018 at 2:03 PM Uwe L. Korn wrote: > I just went over a lot of open PRs and sadly I wasn't able to reduce the > number of open ones significantly. Some of them make slow progress and it >

[jira] [Created] (ARROW-2075) [Python] Add section for integrations with PyTorch, TensorFlow

2018-02-01 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2075: --- Summary: [Python] Add section for integrations with PyTorch, TensorFlow Key: ARROW-2075 URL: https://issues.apache.org/jira/browse/ARROW-2075 Project: Apache Arrow

[jira] [Created] (ARROW-2073) [Python] Create StructArray from sequence of tuples given a known data type

2018-02-01 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2073: - Summary: [Python] Create StructArray from sequence of tuples given a known data type Key: ARROW-2073 URL: https://issues.apache.org/jira/browse/ARROW-2073 Project:

[jira] [Created] (ARROW-2074) [Python] Allow type inference for struct arrays

2018-02-01 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2074: - Summary: [Python] Allow type inference for struct arrays Key: ARROW-2074 URL: https://issues.apache.org/jira/browse/ARROW-2074 Project: Apache Arrow Issue

Arrow PR backlog: please help

2018-02-01 Thread Wes McKinney
Hi folks, We've had a rough couple of weeks in our PR queue due to various CI issues causing a high incidence of build failures: * Package dependency upgrades (Thrift -- this has been fixed) * Failures possibly due to VM setting changes in Travis CI (memory thrashing / VM timeouts, see

Re: Arrow PR backlog: please help

2018-02-01 Thread Phillip Cloud
I'll follow up with them and shoot an email over to see if we can use circle with gitbox repos. On Thu, Feb 1, 2018 at 3:47 PM Wes McKinney wrote: > Does someone want to ask Infra about it? I haven't asked them since we > migrated to GitBox > > On Thu, Feb 1, 2018 at 2:15

Re: Arrow PR backlog: please help

2018-02-01 Thread Phillip Cloud
JIRA-ized: https://issues.apache.org/jira/browse/INFRA-15964 On Thu, Feb 1, 2018 at 3:59 PM Phillip Cloud wrote: > Ok, will do. > > On Thu, Feb 1, 2018 at 3:56 PM Wes McKinney wrote: > >> You'll have to open an INFRA ticket on JIRA >> >> On Thu, Feb 1,

Re: Arrow PR backlog: please help

2018-02-01 Thread Phillip Cloud
Ok, will do. On Thu, Feb 1, 2018 at 3:56 PM Wes McKinney wrote: > You'll have to open an INFRA ticket on JIRA > > On Thu, Feb 1, 2018 at 3:53 PM, Phillip Cloud wrote: > > I'll follow up with them and shoot an email over to see if we can use > > circle

[jira] [Created] (ARROW-2081) Hdfs client isn't fork-safe

2018-02-01 Thread Jim Crist (JIRA)
Jim Crist created ARROW-2081: Summary: Hdfs client isn't fork-safe Key: ARROW-2081 URL: https://issues.apache.org/jira/browse/ARROW-2081 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-2079) Possibly use `_common_metadata` for schema if `_metadata` isn't available

2018-02-01 Thread Jim Crist (JIRA)
Jim Crist created ARROW-2079: Summary: Possibly use `_common_metadata` for schema if `_metadata` isn't available Key: ARROW-2079 URL: https://issues.apache.org/jira/browse/ARROW-2079 Project: Apache

Writing nested parquet data using pyarrow

2018-02-01 Thread Ishaan Joshi
Wes and co., First off, great project! I was able to read the docs and get going in under a day; the APIs are super easy to use. That being said, I'm a tad stuck, and having exhausted my Google-fu, am here to ask for assistance. I want to use pyarrow to write a nested dataset in parquet. The schema is