Re: OversizedAllocationException for pandas_udf in pyspark

2019-03-14 Thread Micah Kornfield
Hi, From the error it looks like this might be some sort of integer overflow, but it is hard to say. Could you try to get a minimal reproduction of the error [1], and open a JIRA issue [2] with it? Thanks, Micah [1] https://stackoverflow.com/help/mcve [2]

[jira] [Created] (ARROW-4887) [GLib] Add garrow_array_count()

2019-03-14 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-4887: --- Summary: [GLib] Add garrow_array_count() Key: ARROW-4887 URL: https://issues.apache.org/jira/browse/ARROW-4887 Project: Apache Arrow Issue Type: New Feature

[jira] [Created] (ARROW-4886) [Rust] Inconsistent behaviour with casting sliced primitive array to list array

2019-03-14 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4886: - Summary: [Rust] Inconsistent behaviour with casting sliced primitive array to list array Key: ARROW-4886 URL: https://issues.apache.org/jira/browse/ARROW-4886

[jira] [Created] (ARROW-4885) [Python] read_csv() can't handle decimal128() columns

2019-03-14 Thread Diego Argueta (JIRA)
Diego Argueta created ARROW-4885: Summary: [Python] read_csv() can't handle decimal128() columns Key: ARROW-4885 URL: https://issues.apache.org/jira/browse/ARROW-4885 Project: Apache Arrow

[jira] [Created] (ARROW-4884) [C++] conda-forge thrift-cpp package not available via pkg-config or cmake

2019-03-14 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4884: --- Summary: [C++] conda-forge thrift-cpp package not available via pkg-config or cmake Key: ARROW-4884 URL: https://issues.apache.org/jira/browse/ARROW-4884 Project:

[jira] [Created] (ARROW-4883) [Python] read_csv() gives mojibake if given file object in text mode

2019-03-14 Thread Diego Argueta (JIRA)
Diego Argueta created ARROW-4883: Summary: [Python] read_csv() gives mojibake if given file object in text mode Key: ARROW-4883 URL: https://issues.apache.org/jira/browse/ARROW-4883 Project: Apache

[jira] [Created] (ARROW-4882) Add "Count" and "Sum" functions

2019-03-14 Thread Yosuke Shiro (JIRA)
Yosuke Shiro created ARROW-4882: --- Summary: Add "Count" and "Sum" functions Key: ARROW-4882 URL: https://issues.apache.org/jira/browse/ARROW-4882 Project: Apache Arrow Issue Type: New Feature

[jira] [Created] (ARROW-4881) [Python] bundle_zlib CMake function still uses ARROW_BUILD_TOOLCHAIN

2019-03-14 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4881: --- Summary: [Python] bundle_zlib CMake function still uses ARROW_BUILD_TOOLCHAIN Key: ARROW-4881 URL: https://issues.apache.org/jira/browse/ARROW-4881 Project: Apache

[jira] [Created] (ARROW-4880) [Python] python/asv-build.sh is probably broken after CMake refactor

2019-03-14 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4880: --- Summary: [Python] python/asv-build.sh is probably broken after CMake refactor Key: ARROW-4880 URL: https://issues.apache.org/jira/browse/ARROW-4880 Project: Apache

[jira] [Created] (ARROW-4879) [C++] cmake can't use conda's flatbuffers

2019-03-14 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4879: Summary: [C++] cmake can't use conda's flatbuffers Key: ARROW-4879 URL: https://issues.apache.org/jira/browse/ARROW-4879 Project: Apache Arrow Issue

[jira] [Created] (ARROW-4878) [C++] ARROW_DEPENDENCY_SOURCE=CONDA does not work properly with MSVC

2019-03-14 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4878: --- Summary: [C++] ARROW_DEPENDENCY_SOURCE=CONDA does not work properly with MSVC Key: ARROW-4878 URL: https://issues.apache.org/jira/browse/ARROW-4878 Project: Apache

[Rust] Table/DataFrame style API

2019-03-14 Thread Andy Grove
Hi, I have a PR open [1] to add a DataFrame/Table style API for building a logical query plan. So far I only added a couple methods to it, but here is a usage example: let t = ctx.table("aggregate_test_100")?; let t2 = t .select_columns(vec!["c1", "c2", "c11"])? .limit(10)?; This builds

Re: Timeline for 0.13 Arrow release

2019-03-14 Thread Wes McKinney
Out of the open / in-progress issues still in the backlog: C++-related: 25 C#: 2 CI-related: 2 Dev tools: 2 Docs: 4 Flight: 3 Packaging: 4 Python: 23 (14 tagged as bugs) Ruby: 1 Rust: 14 I'm going to try to grind out as many issues as I can in the next few days, and at least get a sense of "how

[jira] [Created] (ARROW-4877) [Plasma] CI failure in test_plasma_list

2019-03-14 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-4877: --- Summary: [Plasma] CI failure in test_plasma_list Key: ARROW-4877 URL: https://issues.apache.org/jira/browse/ARROW-4877 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-4876) Port MutableBuffer to csharp

2019-03-14 Thread Prashanth Govindarajan (JIRA)
Prashanth Govindarajan created ARROW-4876: - Summary: Port MutableBuffer to csharp Key: ARROW-4876 URL: https://issues.apache.org/jira/browse/ARROW-4876 Project: Apache Arrow Issue

Re: Passing File Descriptors in the Low-Level API

2019-03-14 Thread Wes McKinney
hi Brian, This is mostly an Arrow platform question so I'm copying the Arrow mailing list. You can open a file from an existing file descriptor with ReadableFile::Open https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.h#L145 The documentation for this function says: "The

RE: Publishing C# NuGet package

2019-03-14 Thread Eric Erhardt
Thanks Wes. I have a PR up for this. https://github.com/apache/arrow/pull/3891 How do I update the wiki page? Is this source controlled somewhere? I assume we want to add a new section after

[jira] [Created] (ARROW-4875) [C++] MSVC Boost warnings after CMake refactor on cmake 3.12

2019-03-14 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4875: --- Summary: [C++] MSVC Boost warnings after CMake refactor on cmake 3.12 Key: ARROW-4875 URL: https://issues.apache.org/jira/browse/ARROW-4875 Project: Apache Arrow

[jira] [Created] (ARROW-4874) Cannot read parquet from encrypted hdfs

2019-03-14 Thread Jesse Lord (JIRA)
Jesse Lord created ARROW-4874: - Summary: Cannot read parquet from encrypted hdfs Key: ARROW-4874 URL: https://issues.apache.org/jira/browse/ARROW-4874 Project: Apache Arrow Issue Type: Bug

Re: Timeline for 0.13 Arrow release

2019-03-14 Thread Krisztián Szűcs
Submitted the packaging builds: https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93=build-452 On Thu, Mar 14, 2019 at 4:19 PM Wes McKinney wrote: > The CMake refactor is merged! Kudos to Uwe for 3+ weeks of hard labor on > this. > > We should run all the packaging tasks and get a

[jira] [Created] (ARROW-4873) [C++] ARROW_DEPENDENCY_SOURCE should not be overridden to CONDA if ARROW_PACKAGE_PREFIX is set by user

2019-03-14 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4873: --- Summary: [C++] ARROW_DEPENDENCY_SOURCE should not be overridden to CONDA if ARROW_PACKAGE_PREFIX is set by user Key: ARROW-4873 URL:

[jira] [Created] (ARROW-4872) [Python] Keep backward compatibility for ParquetDatasetPiece

2019-03-14 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-4872: -- Summary: [Python] Keep backward compatibility for ParquetDatasetPiece Key: ARROW-4872 URL: https://issues.apache.org/jira/browse/ARROW-4872 Project: Apache Arrow

[jira] [Created] (ARROW-4870) ruby gemspec has wrong msys2 dependency listed

2019-03-14 Thread Dominic Sisneros (JIRA)
Dominic Sisneros created ARROW-4870: --- Summary: ruby gemspec has wrong msys2 dependency listed Key: ARROW-4870 URL: https://issues.apache.org/jira/browse/ARROW-4870 Project: Apache Arrow

Re: [Discuss][Java, Non-C++ generally] Support for 64-bit int array lengths?

2019-03-14 Thread Wes McKinney
hi Micah, Given the constraints from Netty in Java, I would say that it makes sense to raise an exception when encountering a Field whose length exceeds 2^31 - 1 (I think there are already some checks, but we can add more during the IPC metadata read pass). With shared memory / zero
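
The check described in the message above can be sketched as follows. This is only an illustration of the idea, not Arrow's actual code: `FieldLengthError` and `check_field_length` are hypothetical names, and the sketch assumes the length has already been read from the IPC metadata as a plain integer.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold


class FieldLengthError(ValueError):
    """Hypothetical error for lengths a 32-bit-indexed reader cannot address."""


def check_field_length(length: int) -> int:
    """Return `length` unchanged if it fits in a signed 32-bit int, else raise."""
    if length < 0:
        raise FieldLengthError(f"negative field length: {length}")
    if length > INT32_MAX:
        raise FieldLengthError(
            f"field length {length} exceeds 2^31 - 1 ({INT32_MAX})"
        )
    return length
```

Failing early in the metadata pass, as suggested, would keep an oversized length from being silently truncated later when it is narrowed to a 32-bit index.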

[jira] [Created] (ARROW-4871) [Flight][Java] Handle large Flight messages

2019-03-14 Thread David Li (JIRA)
David Li created ARROW-4871: --- Summary: [Flight][Java] Handle large Flight messages Key: ARROW-4871 URL: https://issues.apache.org/jira/browse/ARROW-4871 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-4869) [C++] Use of gmock fails in compute/kernels/util-internal-test.cc

2019-03-14 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4869: --- Summary: [C++] Use of gmock fails in compute/kernels/util-internal-test.cc Key: ARROW-4869 URL: https://issues.apache.org/jira/browse/ARROW-4869 Project: Apache Arrow

[jira] [Created] (ARROW-4868) [C++][Gandiva] Build fails with system Boost on Ubuntu Trusty 14.04

2019-03-14 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4868: --- Summary: [C++][Gandiva] Build fails with system Boost on Ubuntu Trusty 14.04 Key: ARROW-4868 URL: https://issues.apache.org/jira/browse/ARROW-4868 Project: Apache

[jira] [Created] (ARROW-4866) [C++] zstd ExternalProject failing on Windows

2019-03-14 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4866: -- Summary: [C++] zstd ExternalProject failing on Windows Key: ARROW-4866 URL: https://issues.apache.org/jira/browse/ARROW-4866 Project: Apache Arrow Issue Type:

Call for presentations - ApacheCon North America

2019-03-14 Thread Zoran Regvart
Hi Apache developers, (apologies if you're receiving this e-mail multiple times on different dev@ lists) I'd like to draw your attention to the call for presentations that is now open for ApacheCon North America 2019 -- marking the 20-year anniversary of ASF; that will be held in Las Vegas this

Re: [Discuss][Java, Non-C++ generally] Support for 64-bit int array lengths?

2019-03-14 Thread Ravindra Pindikura
@Jacques Nadeau would have more background on this. Here's my understanding: On Thu, Mar 14, 2019 at 12:08 PM Micah Kornfield wrote: > I was working on a proof of concept java implementation for LargeList [1] > implementation (64-bit array offsets). Our Java implementation doesn't > appear

[jira] [Created] (ARROW-4865) [Rust] Support casting lists and primitives to lists

2019-03-14 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4865: - Summary: [Rust] Support casting lists and primitives to lists Key: ARROW-4865 URL: https://issues.apache.org/jira/browse/ARROW-4865 Project: Apache Arrow

[Discuss][Java, Non-C++ generally] Support for 64-bit int array lengths?

2019-03-14 Thread Micah Kornfield
I was working on a proof-of-concept Java implementation of LargeList [1] (64-bit array offsets). Our Java implementation doesn't appear to support Vectors/Arrays larger than Integer.MAX_VALUE addressable space. It looks like Message.fbs was updated quite a while ago to support
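
To make the offset-width question concrete, here is a minimal sketch of how list offsets accumulate and when they stop fitting in 32 bits. It is an illustration under stated assumptions, not code from the proof of concept: `build_offsets` and `needs_64_bit_offsets` are hypothetical helpers, and the only input is the length of each list in the column.

```python
INT32_MAX = 2**31 - 1  # same bound as Java's Integer.MAX_VALUE


def build_offsets(list_lengths):
    """Return cumulative offsets: list i spans offsets[i]..offsets[i + 1]."""
    offsets = [0]
    for n in list_lengths:
        offsets.append(offsets[-1] + n)
    return offsets


def needs_64_bit_offsets(list_lengths):
    """True when the final offset no longer fits in a signed 32-bit int."""
    return build_offsets(list_lengths)[-1] > INT32_MAX
```

For example, `build_offsets([2, 0, 3])` gives `[0, 2, 2, 5]`. Two lists of 2**30 elements each push the final offset to 2**31, one past the 32-bit limit, which is the case a 64-bit offset layout is meant to cover.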