Re: [Discuss][Python] Stop publishing universal wheels?

2022-10-27 Thread Uwe L. Korn
Hello, if we have wheels for x86_64 and arm64 individually, I don't see an argument for keeping universal2 ones. x86_64 Macs will probably stay around for a while as Apple is quite good at keeping old hardware updated, and the laptops themselves are pretty solid. Best Uwe On Thu, Oct 27, 2022

Re: [VOTE] Move issue tracking to GitHub Issues

2022-10-27 Thread Uwe L. Korn
+1 On Thu, Oct 27, 2022, at 5:13 PM, Nic wrote: > +1 > > On Thu, 27 Oct 2022 at 14:00, Alenka Frim > wrote: > >> +1 >> >> On Thu, Oct 27, 2022 at 2:36 PM prem sagar gali >> wrote: >> >> > +1 >> > >> > On Thu, Oct 27, 2022 at 7:13 AM Dewey Dunnington >> > wrote: >> > >> > > +1 (non-binding)! >>

Re: [VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project

2024-03-01 Thread Uwe L. Korn
+1 (binding) On Fri, Mar 1, 2024, at 2:37 PM, Andy Grove wrote: > +1 (binding) > > On Fri, Mar 1, 2024 at 6:20 AM Weston Pace wrote: > >> +1 (binding) >> >> On Fri, Mar 1, 2024 at 3:33 AM Andrew Lamb wrote: >> >> > Hello, >> > >> > As we have discussed[1][2] I would like to vote on the proposal

Re: [VOTE] Split Go release process

2024-08-27 Thread Uwe L. Korn
+1 (binding) On Tue, Aug 27, 2024, at 3:04 PM, Joris Van den Bossche wrote: > +1 (binding) > > On Mon, 26 Aug 2024 at 09:56, Antoine Pitrou wrote: >> >> +1 (binding) >> >> Le 26/08/2024 à 04:37, Sutou Kouhei a écrit : >> > Hi, >> > >> > I would like to propose splitting Go release process. >> > >

Re: [DISCUSS] Dropping support for Visual Studio 2015

2021-08-14 Thread Uwe L. Korn
+1 VS2017 should also be compatible with VS2015, so this shouldn't cause any issues for downstream users that link dynamically. > Am 14.08.2021 um 01:56 schrieb Benjamin Kietzman : > > Thanks for commenting, all. I'll open a JIRA/PR to remove support next week. > >> On Tue, Aug 10, 2021, 09

Re: [VOTE] Release Apache Arrow 7.0.0 - RC6

2022-01-25 Thread Uwe L. Korn
Hello all, I sadly get an issue with compiling with GCC 7.5 at the moment as reported in https://issues.apache.org/jira/browse/ARROW-15444 We need this version to support CUDA-enabled and ppc64le builds on conda-forge. Cheers Uwe On Tue, Jan 25, 2022, at 10:35 AM, Krisztián Szűcs wrote: > Than

[C++] Compute: Datum and "ChunkedArray&" inputs

2020-04-07 Thread Uwe L. Korn
Hello all, I'm in the process of changing the implementation of the Take kernel to work on ChunkedArrays without concatenating them into a single Array first. While working on the implementation, I realised that we often switch between Datum and the specific-typed parameters. This works quite

Re: [C++] Compute: Datum and "ChunkedArray&" inputs

2020-04-07 Thread Uwe L. Korn
-types. On Tue, Apr 7, 2020, at 1:00 PM, Uwe L. Korn wrote: > Hello all, > > I'm in the process of changing the implementation of the Take kernel > to work on ChunkedArrays without concatenating them into a single Array > first. While working on the implementation, I rea

Re: [Python] black vs. autopep8

2020-04-09 Thread Uwe L. Korn
The non-configurability of black is one of the strongest arguments I see for black. The codestyle will always be subjective. From previous discussions I know that my personal preference of readability conflicts with that of Antoine and Wes, so will probably others. We have the same issue with us

Re: [VOTE] Release Apache Arrow 0.17.1 - RC1

2020-05-19 Thread Uwe L. Korn
Current status: 1. [done] rebase (not required for a patch release) 2. [done] upload source 3. [done] upload binaries 4. [done|in-pr] update website 5. [done] upload ruby gems 6. [ ] upload js packages 8. [done] upload C# packages 9. [ ] upload rust crates 10. [done] update conda recipes (

Re: Arrow Flight connector for SQL Server

2020-05-21 Thread Uwe L. Korn
Hello Brendan, welcome to the community. In addition to the folks at Dremio, I wanted to make you aware of the Python ODBC client library https://github.com/blue-yonder/turbodbc which provides a high-performance ODBC<->Arrow adapter. It is especially popular with MS SQL Server users as the fas

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-05-26-0

2020-05-26 Thread Uwe L. Korn
The conda builds are failing as we have exceeded the storage available for our conda repository: You currently have 3 public packages and 0 packages that require to be authenticated. Using 10.0 GB of 3.0 GB storage I guess we need something that deletes old builds automatically. On Tue, May 26, 2020

Re: Arrow sync all at 12pm US-Eastern / 16:00 UTC

2020-05-27 Thread Uwe L. Korn
No, we are just talking about removing static libraries from conda-forge that may be (/have been) used as part of the Arrow build. This shouldn't affect any non-conda Arrow users/developers. Cheers, Uwe On Wed, May 27, 2020, at 6:53 PM, Rémi Dettai wrote: > @Uwe: Just a quick question about the

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-05-30-0

2020-05-30 Thread Uwe L. Korn
https://github.com/apache/arrow/pull/7305 should enable us to upload conda packages again. On Sat, May 30, 2020, at 12:10 PM, Crossbow wrote: > > Arrow Build Report for Job nightly-2020-05-30-0 > > All tasks: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0 > >

Re: [DISCUSS] [C++] custom allocator for large objects

2020-06-05 Thread Uwe L. Korn
Hello Rémi, under the hood jemalloc does quite similar things to what you describe. I'm not sure what the offset is in the current version but in earlier releases, it used a different allocation strategy for objects above 4MB. For the initial large allocation, you will see quite some copies as

Re: [DISCUSS] [C++] custom allocator for large objects

2020-06-05 Thread Uwe L. Korn
On Fri, Jun 5, 2020, at 3:13 PM, Rémi Dettai wrote: > Hi Antoine ! > > I would indeed have expected jemalloc to do that (remap the pages) > I have no idea about the performance gain this would provide (if any). > Could be interesting to explore. This would actually be the most interesting thing.

[C++] Kernels with scalar input

2020-06-17 Thread Uwe L. Korn
Hello all, I'm trying to implement a `contains` kernel that takes as an input a StringArray and a scalar string (see https://issues.apache.org/jira/browse/ARROW-9160). I feel confident with the rest of the new Kernels setup but I didn't find an example kernel where we also pass in a scalar att

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

2020-06-22 Thread Uwe L. Korn
With my conda-forge background, I would suggest using clang as a performance baseline, because it's currently the only compiler that works reliably on all platforms. Most Linux distributions are nowadays built with gcc, which also makes a strong argument, but on OSX and Windows the picture is a bit

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

2020-06-23 Thread Uwe L. Korn
FTR: We can use the latest(!) clang for all platforms for conda and wheels. It probably isn't even that complicated a setup. On Mon, Jun 22, 2020, at 5:42 PM, Francois Saint-Jacques wrote: > We should aim to improve the performance of the most widely used > *default* packages, which are p

Re: [VOTE] Increment MetadataVersion in Schema.fbs from V4 to V5 for 1.0.0 release

2020-06-30 Thread Uwe L. Korn
+1 (binding) On Tue, Jun 30, 2020, at 11:11 AM, Neville Dipale wrote: > +1 (non-binding) > > On Tue, 30 Jun 2020 at 06:29, Ben Kietzman wrote: > > > +1 (non binding) > > > > On Tue, Jun 30, 2020, 00:25 Wes McKinney wrote: > > > > > +1 (binding) > > > > > > On Mon, Jun 29, 2020 at 10:49 PM Mica

Re: [VOTE] Permitting unsigned integers for Arrow dictionary indices

2020-06-30 Thread Uwe L. Korn
+1 (binding) On Tue, Jun 30, 2020, at 6:24 AM, Wes McKinney wrote: > +1 (binding) > > On Mon, Jun 29, 2020 at 11:11 PM Ben Kietzman > wrote: > > > > +1 (non binding) > > > > On Mon, Jun 29, 2020, 18:00 Wes McKinney wrote: > > > > > Hi, > > > > > > As discussed on the mailing list [1], it has b

Re: [DISCUSS] Ongoing LZ4 problems with Parquet files

2020-06-30 Thread Uwe L. Korn
I'm also in favor of disabling support for now. Having to deal with broken files or the detection of various incompatible implementations in the long-term will harm more than not supporting LZ4 for a while. Snappy is generally more used than LZ4 in this category as it has been available since th

Re: Developing a C++ Python extension

2020-07-02 Thread Uwe L. Korn
I had so much fun with the wheels in the past, I'm now a happy member of conda-forge core instead :D The good thing first: * The C++ ABI didn't change between the manylinux versions, it is the old one in all cases. So you can mix & match manylinux versions. The sad things: * The manylinuxX standa

Re: Developing a C++ Python extension

2020-07-02 Thread Uwe L. Korn
m all > into our library. Feel free to scrape the perspective repo's cmake > lists and setup.py for details. > > Tim Paine > tim.paine.nyc > > > On Jul 2, 2020, at 10:32, Uwe L. Korn wrote: > > > > I had so much fun with the wheels in the past,

Re: Developing a C++ Python extension

2020-07-02 Thread Uwe L. Korn
work though: import ctypes arrow_python = ctypes.CDLL('libarrow.so', ctypes.RTLD_GLOBAL) libarrow_python = ctypes.CDLL('libarrow_python.so', ctypes.RTLD_GLOBAL) On Thu, Jul 2, 2020, at 4:32 PM, Uwe L. Korn wrote: > I had so much fun with the wheels in the past, I'm no
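
The ctypes calls quoted above load the Arrow shared libraries with RTLD_GLOBAL so that their symbols are visible to extension modules loaded later in the same process. A minimal sketch of that pattern, assuming the libraries can be found on the loader's search path; the helper name `load_global` is hypothetical and not part of pyarrow:

```python
import ctypes


def load_global(names):
    """Load shared libraries with RTLD_GLOBAL, in the given order.

    Returns a dict mapping each name to its CDLL handle, or to None when
    the library could not be found or loaded.
    """
    handles = {}
    for name in names:
        try:
            # RTLD_GLOBAL exports the library's symbols to libraries
            # that are loaded later in the same process.
            handles[name] = ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
        except OSError:
            handles[name] = None
    return handles


# Order matters: libarrow_python.so depends on libarrow.so.
handles = load_global(["libarrow.so", "libarrow_python.so"])
```

Returning None instead of raising keeps the sketch runnable on machines without Arrow installed; real code would more likely let the OSError propagate.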

Re: [DRAFT] Arrow Board Report July 2020

2020-07-08 Thread Uwe L. Korn
Happy with the current version. I think this gives enough input for the board. We have so many things happening that are much better presented in the process of the 1.0 release. On Wed, Jul 8, 2020, at 12:52 AM, Micah Kornfield wrote: > Worth mentioning the website work? > > On Tue, Jul 7, 2020

Re: Introducing Cylon

2020-07-22 Thread Uwe L. Korn
Hello Niranda, cool to see this. Feel free to open a PR to add it to the Powered By list on https://arrow.apache.org/powered_by/ Cheers Uwe On Tue, Jul 21, 2020, at 8:03 PM, Niranda Perera wrote: > Hi all, > > We would like to introduce Cylon to the Arrow community. It is an > open-source, lea

Re: [VOTE] Release Apache Arrow 1.0.0 - RC2

2020-07-24 Thread Uwe L. Korn
1. [done] rebase master 2. [done] upload source 3. [kszucs] upload binaries 4. [ ] update website 5. [ ] upload ruby gems 6. [ ] upload js packages 8. [ ] upload C# packages 9. [andygrove] upload rust crates 10. [uwe] update conda recipes 11. [kszucs] upload wheels to pypi 12. [ ] update ho

Re: Closing Plasma issues?

2020-09-07 Thread Uwe L. Korn
If we do that, we should be clear about it and remove the code. Shipping Plasma as part of the release without maintaining it like the other parts of the Arrow libraries seems inconsistent and will just annoy users who find a partly unusable component. Cheers Uwe On Mon, Sep 7, 2020, at

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-09-27-0

2020-09-27 Thread Uwe L. Korn
I'm working on a fix for the conda failures in https://github.com/apache/arrow/pull/8282 On Sun, Sep 27, 2020, at 12:20 PM, Crossbow wrote: > > Arrow Build Report for Job nightly-2020-09-27-0 > > All tasks: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-27-0 > > Fa

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-10-02-0

2020-10-02 Thread Uwe L. Korn
conda-*-aarch64 hit the 1h time limit on drone.io, probably not easy to fix. On Fri, Oct 2, 2020, at 12:23 PM, Crossbow wrote: > > Arrow Build Report for Job nightly-2020-10-02-0 > > All tasks: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-02-0 > > Failed Tasks: >

Re: [VOTE] Accept donation of Julia implementation for Apache Arrow

2020-10-14 Thread Uwe L. Korn
+1 (binding) On Wed, Oct 14, 2020, at 3:58 PM, Andy Grove wrote: > +1 (binding) > > On Tue, Oct 13, 2020 at 8:26 PM Fan Liya wrote: > > > +1 (non-binding) > > > > Best, > > Liya Fan > > > > > > On Wed, Oct 14, 2020 at 9:02 AM Sutou Kouhei wrote: > > > > > +1 (binding) > > > > > > In > > > "

Re: [C++] Arrow to ORC type conversion

2020-10-18 Thread Uwe L. Korn
This sounds reasonable from an Arrow perspective, you might want to CC the ORC list as well or ask someone there to co-review your work in the adapter. Uwe > Am 18.10.2020 um 17:24 schrieb Ying Zhou : > > Hi, > > I’m developing the adapter that converts Arrow Arrays, ChunkedArrays, > RecordBa

Re: [VOTE] Release Apache Arrow 2.0.0 - RC2

2020-10-19 Thread Uwe L. Korn
Trying to verify on macOS but run into the following two issues: * The default S3 region is „eu-central-1“ for me despite setting LANG=C * llvm@10 is not available for homebrew anymore, see also https://github.com/Homebrew/homebrew-core/pull/62798#issuecomment-711606370

Re: [VOTE] Release Apache Arrow 2.0.0 - RC2

2020-10-19 Thread Uwe L. Korn
s to address this with Homebrew though and re-add the llvm@10 package. This isn't a change in policy and I guess that it may suffice to add the new Arrow release to homebrew to get llvm@10 re-added. Uwe > > Neal > > On Mon, Oct 19, 2020 at 6:04 AM Uwe L. Korn wrote: > >

Re: [VOTE] Release Apache Arrow 2.0.0 - RC2

2020-10-19 Thread Uwe L. Korn
+0 from my side, I see no big issues. I was able to verify the wheels, the source verification fails due to the llvm package issues on brew; thus I'm not able to +1 this time. Uwe On Mon, Oct 19, 2020, at 7:38 PM, Krisztián Szűcs wrote: > On Mon, Oct 19, 2020 at 5:32 PM Uwe L. Kor

Re: [VOTE] Release Apache Arrow 2.0.0 - RC2

2020-10-21 Thread Uwe L. Korn
> > 1. [done] rebase master > > > >> > > > 2. [done] upload source > > > >> > > > 3. [done] upload binaries > > > >> > > > 4. [kszucs] update website > > > >> > > > 5. [done] upload

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-11-05-0

2020-11-05 Thread Uwe L. Korn
Taking care of the failing conda-win jobs in https://issues.apache.org/jira/browse/ARROW-10502 On Thu, Nov 5, 2020, at 11:14 AM, Crossbow wrote: > > Arrow Build Report for Job nightly-2020-11-05-0 > > All tasks: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-11-05-0 >

Re: Development with C++ and Cython APIs in Arrow

2020-11-06 Thread Uwe L. Korn
Hello Vibhatha, the best is to set a relative RPATH on the libraries. An example for this can be seen in the turbodbc sources: https://github.com/blue-yonder/turbodbc/blob/80a29a7edfbdabf12410af01c0c0ae74bfc3aab4/setup.py#L186-L189 Cheers Uwe On Tue, Nov 3, 2020, at 11:44 PM, Vibhatha Abeykoon
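
The linked turbodbc `setup.py` is not reproduced here. As a rough sketch of what setting a relative RPATH can look like, assuming a GCC- or Clang-style linker and assuming the Arrow libraries end up in the same directory as the built extension module (the function name `relative_rpath_args` is hypothetical):

```python
import sys


def relative_rpath_args(platform=sys.platform):
    """Return linker arguments that set an RPATH relative to the built module."""
    if platform == "darwin":
        # @loader_path expands to the directory of the binary doing the loading.
        return ["-Wl,-rpath,@loader_path"]
    if platform.startswith("linux"):
        # $ORIGIN expands to the directory containing the shared object.
        return ["-Wl,-rpath,$ORIGIN"]
    # Windows has no RPATH; DLLs are resolved through other mechanisms.
    return []
```

The returned list would typically be passed as `extra_link_args` to a setuptools `Extension`. If the libraries live elsewhere, the path after `$ORIGIN` or `@loader_path` has to be extended accordingly.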

Re: Development with C++ and Cython APIs in Arrow

2020-11-06 Thread Uwe L. Korn
Vibhatha Abeykoon wrote: > > > Hello Uwe, > > > > Nice example. I will follow this. > > > > With Regards, > > Vibhatha Abeykoon > > > > > > On Fri, Nov 6, 2020 at 9:36 AM Uwe L. Korn wrote: > > > >> Hello Vibhatha, > >

Re: [Governance] [Proposal] Stop force-pushing to PRs after release?

2020-11-25 Thread Uwe L. Korn
Hello Jorge, I know from the past on the Python/C++ side, we needed to do this for a lot of contributors to enable them to work with their branches/PRs again as they were overwhelmed with the complexity of these rebases. Personally, I wouldn't like to spend much time on whether we should rebase

Re: ursa-labs/crossbow on travis-ci.com is disabled

2020-11-26 Thread Uwe L. Korn
Also note that drone.io supports linux-arm64 which we use in conda-forge for this architecture and is already setup in crossbow (although we had issues with branches not being seen). On Thu, Nov 26, 2020, at 1:31 AM, Jeroen Ooms wrote: > On Wed, Nov 25, 2020 at 10:54 PM Sutou Kouhei wrote: > >

Re: Removing Python 3.5 support

2020-11-26 Thread Uwe L. Korn
+1 from my side too On Thu, Nov 26, 2020, at 1:04 PM, Joris Van den Bossche wrote: > +1 on dropping Python 3.5 > > On Thu, 26 Nov 2020 at 12:26, Antoine Pitrou wrote: > > > > > Hello, > > > > Python 3.5 is not supported upstream, neither by the CPython development > > team nor by third-party pr

Incompatability of all existing pyarrow releases with the next NumPy release

2020-12-04 Thread Uwe L. Korn
Hello all, Today the Kartothek CI turned quite red in https://github.com/JDASoftwareGroup/kartothek/pull/383 / https://github.com/JDASoftwareGroup/kartothek/pull/383/checks?check_run_id=1497941813 as the new NumPy 1.20rc1 was pulled in. It simply broke all pyarrow<->NumPy interop as now dtype

Re: Incompatability of all existing pyarrow releases with the next NumPy release

2020-12-04 Thread Uwe L. Korn
Still, the PR is so trivial that we should merge it. I'm not up to date on the status of the 2.0.1 release, but this would be an essential patch for that. On Fri, Dec 4, 2020, at 9:22 PM, Antoine Pitrou wrote: > > > Le 04/12/2020 à 21:11, Uwe L. Korn a écrit : > > Hell

Re: [VOTE] Release Apache Arrow 3.0.0 - RC2

2021-01-25 Thread Uwe L. Korn
+1 (binding) Verified C++, Python and Rust on the Apple M1 (natively!) and all works. I had to do some slight modifications to the verification script but they are independent of the source tarball: https://github.com/apache/arrow/pull/9315 Cheers Uwe On Fri, Jan 22, 2021, at 4:59 PM, Neal Ric

Re: [RESULT] [VOTE] Release Apache Arrow 3.0.0 - RC2

2021-01-27 Thread Uwe L. Korn
1. [done] rebase master 2. [done] upload source 3. [done] upload binaries 4. [done] update website 5. [done] upload ruby gems 6. [done] upload js packages 8. [done] upload C# packages 9. [done] upload rust crates 10. [done] update conda recipes 11. [done] upload wheels/sdist to pypi 12. [ ]

Re: [C++] Private implementations and virtual interfaces

2019-07-27 Thread Uwe L. Korn
PIMPL is something I would trade a bit of performance for, as it brings ABI stability. This will help us make Arrow usage in third-party code much simpler. Simple updates when an API was only extended but the ABI is intact are a great relief on the Arrow consumer side. I know that

Re: [C++] Private implementations and virtual interfaces

2019-07-28 Thread Uwe L. Korn
ney a écrit : > > On Sat, Jul 27, 2019 at 4:38 PM Uwe L. Korn wrote: > >> > >> The PIMPL is a thing I would trade a bit of performance as it brings ABI > >> stability. This is something that will help us making Arrow usage in > >> thirdparty code much simple

Re: [VOTE] Adopt FORMAT and LIBRARY SemVer-based version schemes for Arrow 1.0.0 and beyond

2019-07-31 Thread Uwe L. Korn
+1 from me. I really like the separate versions Uwe On Tue, Jul 30, 2019, at 2:21 PM, Antoine Pitrou wrote: > > +1 from me. > > Regards > > Antoine. > > > > On Fri, 26 Jul 2019 14:33:30 -0500 > Wes McKinney wrote: > > hello, > > > > As discussed on the mailing list thread [1], Micah Korn

Re: Building on Arrow CUDA

2019-07-31 Thread Uwe L. Korn
Hello Paul, you might want to look into https://github.com/conda-forge/conda-forge.github.io/issues/687 where CUDA support on conda-forge is discussed. I'm not up to date on this anymore, but reading the whole issue should give you the current level of support. Once this is solved, adding cuda sup

Re: Trouble building on Mac OS Mojave

2019-08-31 Thread Uwe L. Korn
Hello Chris, as a contributor, it is often simpler to use conda to construct a local development environment as outlined in https://arrow.apache.org/docs/developers/python.html#using-conda This is the typical environment most contributors work in. Even when not using conda as a package/environm

Re: Parquet to Arrow in Java

2019-09-04 Thread Uwe L. Korn
Hello, You may want to interact with the Apache Iceberg community here. They are currently working on similar things: https://lists.apache.org/thread.html/3bb4f89a0b37f474cf67915f91326fa845afa597bdd2463c98a2c8b9@%3Cdev.iceberg.apache.org%3E I'm not involved in this, just reading both mailing lists and t

Re: [PROPOSAL] Consolidate Arrow's CI configuration

2019-09-05 Thread Uwe L. Korn
Hello Krisztián, I like this proposal. CI coverage and response time are crucial for the health of the project. In general I like the consolidation and local reproducibility of the builds. Some questions I wanted to ask to make sure I understand your proposal correctly (hopefully they a

Re: [PROPOSAL] Consolidate Arrow's CI configuration

2019-09-05 Thread Uwe L. Korn
Hello Krisztián, > Am 05.09.2019 um 14:22 schrieb Krisztián Szűcs : > >> * The build configuration is automatically updated on a merge to master? >> > Not yet, but this can be automatized too with buildbot itself. This is something I would actually like to have before getting rid of the Travi

Re: [DISCUSS][C++] Rethinking our current C++ shared library (.so / .dll) approach

2019-09-17 Thread Uwe L. Korn
Hello, I'm actually against this proposal. My main concern is at the moment that Arrow C++/Python grows to a really heavy tool where you always have to bring along all baggage even when you're only using a small part of it. This is a problem which makes it harder to use Arrow in projects becau

Re: [DISCUSS] Changing C++ build system default options to produce more barebones builds

2019-09-17 Thread Uwe L. Korn
Hello, I can think of two other alternatives that make it more visible what Arrow core is and what the optional components are: * Error out when no component is selected instead of building just the core Arrow. Here we could add an explanatory message that lists all components and for each comp

Re: [DISCUSS] Changing C++ build system default options to produce more barebones builds

2019-09-18 Thread Uwe L. Korn
> > This is also a lot of work, but could also potentially benefit the > developer experience because we can make unit tests depend on individual > compilable units instead of all of libarrow. There are trade-offs here as > well in terms of public API coverage. > > On Tue, Sep

Re: Build issues on macOS [newbie]

2019-09-19 Thread Uwe L. Korn
Hello Tarek, this error message is normally the one you get when CONDA_BUILD_SYSROOT doesn't point to your 10.9 SDK. Please delete your build folder again and do `export CONDA_BUILD_SYSROOT=..` immediately before running cmake. Running e.g. a conda install will sadly reset this variable to some

Re: [DISCUSS] C-level in-process array protocol

2019-09-19 Thread Uwe L. Korn
Hello, I like this proposal as it will make interfacing inside a process between various Arrow supports much easier. I'm a bit critical though of using a string as the format representation as one needs to parse it correctly. Couldn't we use the enums we already have and reimplement them as C-d

Collecting Arrow critique and our roadmap on that

2019-09-19 Thread Uwe L. Korn
Hello, there have been a lot of public discussions lately with some mentions of actually informed, valid critique of things in the Arrow project. From my perspective, these things include "there is no STL-native C++ Arrow API", "the base build requires too many dependencies", "the pyarrow packa

Re: Collecting Arrow critique and our roadmap on that

2019-09-23 Thread Uwe L. Korn
ted to a roadmap on the confluence wiki that > > should be folded in as appropriate too. > > > > Neal > > > > On Thu, Sep 19, 2019 at 10:26 AM Uwe L. Korn wrote: > > > > > > Hello, > > > > > > there has been a lot of public discussio

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-08 Thread Uwe L. Korn
I'm not sure what qualifies for "board attention" but it seems that CI is a critical problem in Apache projects, not just Arrow. Should we raise that? Uwe On Tue, Oct 8, 2019, at 12:00 AM, Wes McKinney wrote: > Here is a start for our Q3 board report > > ## Description: > The mission of Apache

Re: [DISCUSS] C-level in-process array protocol

2019-10-08 Thread Uwe L. Korn
I'm not sure whether flatbuffers is actually an issue in the end but keeping it out of the C-API definitely simplifies it a bit adoption-wise. I don't think, though, that using protobuf would make a difference here. In general, I really like the C-interface work as sadly C-APIs are still the

Re: [DISCUSS] Reviewing Arrow commit/code review policy

2019-10-14 Thread Uwe L. Korn
Hello all, I also think we should stay with CTR for the moment. If we wanted to enforce RTC or at least a bit better notification for reviewers of certain parts of Arrow, we could setup a CODEOWNERS file[1] to add experts of a certain file/folder as a reviewer on PRs on Github. Cheers Uwe [1]

Re: Adding stronger warnings about pre-production Arrow IPC implementations (C#, Rust)

2019-11-22 Thread Uwe L. Korn
Hello Wes, what about adding an implementation status (table) to the README of every language? Things like "Supports Arrow File Format", "Supports Arrow Stream Format", "Passes IPC integration tests", "Supports Flight" are things that are interesting to users and show how far an implementation

[Python] Exposing compute kernels

2019-12-17 Thread Uwe L. Korn
Hello all, we have developed quite some compute kernels in C++ nowadays and I would like to call them from Python. We could expose the kernels on the Array/ChunkedArray classes themselves or as standalone functions (or as both). What would be the preferred way? Also exposing them as standalone

Re: [VOTE] Release Apache Arrow 0.16.0 - RC2

2020-02-05 Thread Uwe L. Korn
I'm failing to verify C++ on macOS as it seems that we nowadays pull all dependencies from the system. Is there a known way to build & test on OSX with the script and use conda for the requirements? Otherwise I probably need to invest time in creating such a way. Cheers Uwe On Wed, Feb 5, 2020, at

Re: Proposal to use Black for automatic formatting of Python code

2020-03-27 Thread Uwe L. Korn
I'm also very much in favor of this. For the black / cython support, I think the current state is reflected in https://github.com/pablogsal/black/tree/cython. On Fri, Mar 27, 2020, at 4:40 AM, Micah Kornfield wrote: > +1 from me as well. > > On Thursday, March 26, 2020, Neal Richardson > wrote

Re: [VOTE] Release Apache Arrow 0.11.0 (RC1)

2018-10-04 Thread Uwe L. Korn
for > > fixing this .asc problem? > > > > > > Thanks, > > -- > > kou > > > > In <1538639225.4190225.1530323248.542da...@webmail.messagingengine.com> > > "Re: [VOTE] Release Apache Arrow 0.11.0 (RC1)" on Thu, 04 Oct 2018 >

Re: Petastorm: PyArrow based library for Tensorflow, PyTorch and others...

2018-10-05 Thread Uwe L. Korn
Hello Yevgeni, this looks interesting. Can you make a PR to https://github.com/apache/arrow so that Petastorm is listed on https://arrow.apache.org/powered_by/ ? I browsed a bit through your code. As far as I can see your approach is to store a set of Parquet files in a directory with a

Re: [JIRA] -ARROW-1780 - JDBC Adapter - resolved.

2018-10-05 Thread Uwe L. Korn
Hello Atul, sorry for the long turnaround time. I finally had the time to spin up the code from Python. I simply did some tests with a table of New York Taxi trip data and Apache Drill. Using the bundled JDBC driver and JayDeBeAPI, the default for accessing JDBC from Python, it took 11 minutes

Re: [VOTE] Release Apache Arrow 0.11.0 (RC1)

2018-10-08 Thread Uwe L. Korn
+1 (binding) I'm quite uncomfortable with the number of breakages but I think that at this size of the project it is inevitable that we will still have some minor problems in the release. On Mon, Oct 8, 2018, at 6:39 AM, Bryan Cutler wrote: > +1 (non-binding) > > I ran tests for C++, Pyth

Re: parquet-column_scanner-test failure

2018-10-11 Thread Uwe L. Korn
Hello Tanveer, your attachment did not come through as attachments are not allowed on the mailing list. Can you post it somewhere? Uwe On Thu, Oct 11, 2018, at 12:33 PM, Tanveer Ahmad - EWI wrote: > Hi, > > I enabled following flags and got error in the attachment (parquet- > column_scanner-t

Re: parquet-column_scanner-test failure

2018-10-11 Thread Uwe L. Korn
ering > EEMCS, TU Delft, The Netherlands > ____________ > From: Uwe L. Korn [uw...@xhochy.com] > Sent: Thursday, October 11, 2018 2:43 PM > To: dev@arrow.apache.org > Subject: Re: parquet-column_scanner-test failure > > Hello Tanveer, > > your attachment did

Re: [DRAFT] Apache Arrow board report October 2018

2018-10-11 Thread Uwe L. Korn
You could also mention that we are about to receive a C# donation. Otherwise this looks good. Uwe On Thu, Oct 11, 2018, at 6:05 PM, Wes McKinney wrote: > ## Description: > > Apache Arrow is a cross-language development platform for in-memory data. It > specifies a standardized language-independ

Re: [VOTE] Accept donation of Arrow C# .NET implementation

2018-10-15 Thread Uwe L. Korn
+1 On Mon, Oct 15, 2018, at 5:27 PM, Wes McKinney wrote: > hi folks, > > Individuals from Feyen Zylstra LLC have developed a C# implementation > of Apache Arrow and are proposing to donate it to the Apache project, > as discussed on the mailing list > > https://github.com/feyenzylstra/apache-arr

Re: [VOTE] Accept donation of Ruby bindings to Parquet GLib

2018-10-18 Thread Uwe L. Korn
+1 > Am 18.10.2018 um 22:59 schrieb Wes McKinney : > > hello, > > Kouhei Sutou is proposing to donate Ruby bindings to the Parquet GLib > library, which was received as a donation in September. This Ruby > library was originally developed at > > https://github.com/red-data-tools/red-parquet/ >

Re: Making a bugfix 0.11.1 release

2018-10-20 Thread Uwe L. Korn
I have triggered the wheel builds on my crossbow repo with build-25, feel free to use them. Uwe On Sat, Oct 20, 2018, at 3:52 PM, Wes McKinney wrote: > I'm having problems with Crossbow. I am going to try a few things > (going through the setup process "from scratch" -- new tokens, new > local r

Re: [VOTE] Release Apache Arrow 0.11.1 (RC0)

2018-10-21 Thread Uwe L. Korn
+1 (binding) Ran the verification script on OSX; I had the same Plasma failures in Python as in the 0.11 vote and thus don't consider them critical. On Sun, Oct 21, 2018, at 11:15 PM, Krisztián Szűcs wrote: > I can't run the verification script right now, but I've followed the > changes, and it's

Re: [RESULT] [VOTE] Release Apache Arrow 0.11.1 (RC0)

2018-10-23 Thread Uwe L. Korn
I'll take care of > * Upload the new wheels to PyPI > * Update the conda packages Uwe

Re: Encoding options (delta, rle, ...) in pyarrow bindings

2018-11-02 Thread Uwe L. Korn
Hello Sebastian, currently you can only switch between plain and dictionary-encoding-combined-with-run-length encoding using the `use_dictionary` flag on https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table . Other encodings are yet only im

Re: Encoding options (delta, rle, ...) in pyarrow bindings

2018-11-02 Thread Uwe L. Korn
r Delta encoding in the Arrow columnar format. I suspect > > this will eventually be added as it can be quite important to improve > > in-memory query execution performance. > > > > Wes > > > > On Fri, Nov 2, 2018, 2:18 PM Uwe L. Korn > > > > Hello S

Re: Help by following "parquet" and "pyarrow" tags on StackOverflow

2018-11-06 Thread Uwe L. Korn
We also have an `apache-arrow` tag on StackOverflow. I was only following this and not pyarrow. Note that you can set up email notifications for these tags at https://stackexchange.com/filters Cheers Uwe On Tue, Nov 6, 2018, at 10:06 AM, Wes McKinney wrote: > hi folks, > > We are getting a lot

Re: Creating Buffer directly from pointer/length

2018-11-08 Thread Uwe L. Korn
Hello Randy, you are looking for https://arrow.apache.org/docs/python/generated/pyarrow.foreign_buffer.html#pyarrow.foreign_buffer This takes an address, size and a Python object for having a reference on the object. In your case the last one can be None. Note that this will not do a copy and

Re: [ANNOUNCE] New Arrow committers: Romain François, Sebastien Binet, Yosuke Shiro

2018-11-08 Thread Uwe L. Korn
Welcome to all of you! On Thu, Nov 8, 2018, at 8:56 PM, Wes McKinney wrote: > On behalf of the Arrow PMC, I'm happy to announce that Romain > François, Sebastien Binet, and Yosuke Shiro have been invited to be > committers on the project. > > Welcome, and thanks for your contributions!

Re: [ANNOUNCE] New Arrow PMC member: Krisztián Szűcs

2018-11-08 Thread Uwe L. Korn
Congratulations Krisztián! On Thu, Nov 8, 2018, at 9:56 PM, Philipp Moritz wrote: > Congrats and welcome Krisztián! > > On Thu, Nov 8, 2018 at 11:48 AM Wes McKinney wrote: > > > The Project Management Committee (PMC) for Apache Arrow has invited > > Krisztián Szűcs to become a PMC member and we

Re: [VOTE] Accept donation of Rust Parquet implementation

2018-12-01 Thread Uwe L. Korn
+1, nice to see this joining the Apache community Uwe > On 01.12.2018 at 10:16, Antoine Pitrou wrote: > > >> On 01/12/2018 at 00:50, Wes McKinney wrote: >> >> This vote is to determine if the Arrow PMC is in favor of accepting >> this donation. If the vote passes, the PMC and the authors

Re: Reviewing PRs (was: Re: Arrow sync call)

2018-12-19 Thread Uwe L. Korn
+1, I would also like to see them in Sphinx. Uwe > On 19.12.2018 at 11:13, Antoine Pitrou wrote: > > > We should decide where we want to put developer docs. > > I would favour putting them in the Sphinx docs, personally. > > Regards > > Antoine. > > >> On 19/12/2018 at 02:20, Wes McKinne

Re: How to append to parquet file periodically and read intermediate data - pyarrow.lib.ArrowIOError: Invalid parquet file. Corrupt footer.

2018-12-19 Thread Uwe L. Korn
Hello Darren, you're out of luck here. Parquet files are immutable and meant for batch writes. Once they're written you cannot modify them anymore. To load them, you need to know their metadata which is in the footer. The footer is always at the end of the file and written once you call close.
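
To make the footer point above concrete, here is a small standard-library sketch that checks whether a file looks like a finished Parquet file. It relies on the file layout (a `PAR1` marker at both ends, with a 4-byte little-endian footer length just before the trailing marker) and assumes an unencrypted file. The function name `has_complete_footer` is invented for this example, and the check is purely structural: it does not parse the metadata itself.

```python
import os
import struct

MAGIC = b"PAR1"  # marker at both ends of an unencrypted Parquet file


def has_complete_footer(path):
    """Structural check only: magic bytes plus a plausible footer length."""
    size = os.path.getsize(path)
    # Leading magic + 4-byte footer length + trailing magic = 12 bytes minimum.
    if size < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)
        footer_len_bytes, tail = f.read(4), f.read(4)
    if head != MAGIC or tail != MAGIC:
        return False
    (footer_len,) = struct.unpack("<I", footer_len_bytes)
    # The footer cannot be larger than the file that contains it.
    return footer_len + 12 <= size
```

A file that is still being written has no trailing marker yet, so a reader polling a directory could use a check like this to skip files whose writer has not called close.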

Re: How to append to parquet file periodically and read intermediate data - pyarrow.lib.ArrowIOError: Invalid parquet file. Corrupt footer.

2018-12-19 Thread Uwe L. Korn
> >>> what Uwe suggests is usually the way to go, your active process writes to a >>> new file every time. Then you have a parallel process/thread that does >>> compaction of smaller files in the background such that you don't have too >>> many files. >>>

Re: C++ documentation overhaul

2018-12-27 Thread Uwe L. Korn
I also see this problem. This is due to the underlying filesystem on macOS being case-insensitive. The fix is to make your file system case-sensitive (this is possible but takes a while). We have two generated files, pyarrow.array.rst and pyarrow.Array.rst. For me the latter is the one that relia
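
The clash described above can be spotted without touching the filesystem. The sketch below groups file names that differ only in case; the helper name `case_collisions` and the extra sample name `pyarrow.Table.rst` are assumptions added for illustration.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that would map to one file on a case-insensitive filesystem."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the keys that more than one distinct spelling maps to.
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


generated = ["pyarrow.array.rst", "pyarrow.Array.rst", "pyarrow.Table.rst"]
print(case_collisions(generated))
```

Running something like this over the generated file list would flag the two names before one of them silently overwrites the other.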

Move arrow-site.git to gitbox

2019-01-03 Thread Uwe L. Korn
Hello, as requested per the mail from ASF infra, I would like to move the arrow-site git repo to gitbox. This is the repo used for the distribution of the rendered version of the website. Some +1s or a point why we should consider alternatives would help to bring this forward. If there is conse

Re: Enabling an installation path for Arrow R users

2019-01-03 Thread Uwe L. Korn
We probably need to support both conda-forge and CRAN. As a first shot, conda-forge will be much easier to set up as we should have a better build toolchain available there and this could also then be used in the multilanguage scenario demos really well. From my experience, the usage of conda in

Re: Building arrow using Xcode on Mac OS

2019-01-04 Thread Uwe L. Korn
Hello Hatem, I don't know of anyone that has used Xcode to build Arrow yet. We're normally using `-GNinja` or the default make generator to build it. As I have a Mac, I'll have a look at this but "cmake -G Xcode" is not running for me at the moment. To help us debug this, can you open a JIRA on

Re: Arrow Rust roadmapping [was Re: [Gandiva] Representing logical query plans in protobuf]

2019-01-06 Thread Uwe L. Korn
Hello Andy, one thing that came up in past discussions, and that also opened me up a bit to the parquet-cpp merge, is that merging code into a repo doesn't mean that it will always reside there. Apache has the infrastructure and guidelines to split a part of a project into a separate one. This i

Re: [Rust] crate versions and release process

2019-01-06 Thread Uwe L. Korn
This is definitely possible for Apache projects. Currently we still have two releases: "Arrow without JS" and "Arrow JS". We can have separate release votes for security and small fixes for subcrates. There are mainly two things that "limit us": 1. We still need to do the release votes (there a

Re: RecordBatchFile with no batches, Error: Pyarrow.lib.ArrowInvalid: File is smaller than indicated metadata size.

2019-01-09 Thread Uwe L. Korn
Hello Ryan, for CentOS and pip, I would recommend using the Docker scripts that we use to build the manylinux1 compatible wheels (the ones we also upload to PyPI): https://github.com/apache/arrow/tree/master/python/manylinux1 They will bootstrap an isolated environment in Docker that is indepe

Re: Compiling Arrow for RaspberryPi

2019-01-09 Thread Uwe L. Korn
Hello Suvayu, for arrow-cpp it is definitely possible to cross-compile on the desktop as it uses standard CMake for the build. There are a lot of guides available for doing cross compilation with CMake. This may work but I would expect that in some places we're probably not passing all flags t
