Re: [DISCUSS][Format][Flight] Ordered data support

2023-04-27 Thread David Li
>> > >> > in this case. >> > >> > Here is a use case I think: >> > >> > A system has time series data. Each node in the system has >> > data for one day. If a client requests "SELECT * FROM data >> > WHERE server = 'server

Re: [DISCUSS][Format][Flight] Ordered data support

2023-04-27 Thread Weston Pace
> > > Here is a use case I think: > > > > A system has time series data. Each node in the system has > > data for one day. If a client requests "SELECT * FROM data > > WHERE server = 'server1' ORDER BY created_at DESC", the > > system returns the following

Re: [DISCUSS][Format][Flight] Ordered data support

2023-04-27 Thread David Li
day. If a client requests "SELECT * FROM data > WHERE server = 'server1' ORDER BY created_at DESC", the > system returns the following: > > Endpoint 20230428: (DATA_FOR_2023_04_28) > Endpoint 20230427: (DATA_FOR_2023_04_27) > Endpoint 20230426: (DATA_FOR_2023_04_26) >

[RESULT][VOTE][RUST][DataFusion] Release DataFusion Python Bindings 23.0.0 RC2

2023-04-27 Thread Andy Grove
Resending with RESULT subject line On Thu, Apr 27, 2023 at 7:33 PM Andy Grove wrote: > The vote passes with three binding +1 votes. Thanks, everyone. > > Source release: > > > https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-python-23.0.0/ > > Wheels: > >

Re: [VOTE][RUST][DataFusion] Release DataFusion Python Bindings 23.0.0 RC2

2023-04-27 Thread Andy Grove
The vote passes with three binding +1 votes. Thanks, everyone. Source release: https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-python-23.0.0/ Wheels: https://pypi.org/project/datafusion/ Thanks, Andy. On Thu, Apr 27, 2023 at 9:18 AM Andrew Lamb wrote: > +1 (binding)

Re: [DISCUSS][Format][Flight] Ordered data support

2023-04-27 Thread Sutou Kouhei
se case I think: A system has time series data. Each node in the system has data for one day. If a client requests "SELECT * FROM data WHERE server = 'server1' ORDER BY created_at DESC", the system returns the following: Endpoint 20230428: (DATA_FOR_2023_04_28) Endpoint 20230427: (DATA

Re: [VOTE] Release Apache Arrow 12.0.0 - RC0

2023-04-27 Thread Sutou Kouhei
Hi, Thanks for sharing the log. libcrypto.so isn't related to the segmentation fault. It's only involved in showing the backtrace. > perl: error while loading shared libraries: libcrypt.so.1: > cannot open shared object file: No such file or directory This happens at

Re: [VOTE] Release Apache Arrow 12.0.0 - RC0

2023-04-27 Thread Sutou Kouhei
Hi, Could you share the full log of the errors? Thanks, -- kou In "Re: [VOTE] Release Apache Arrow 12.0.0 - RC0" on Thu, 27 Apr 2023 14:31:55 -0700, Will Jones wrote: > Hi Raul, > > It might be worth creating a new RC that fixes more of the test issues, > even if they shouldn't be

Re: [VOTE] Release Apache Arrow 12.0.0 - RC0

2023-04-27 Thread Sutou Kouhei
Hi, I don't think https://github.com/apache/arrow/issues/35321 is a blocker for 12.0.0 RC0 because our fix https://github.com/apache/arrow/pull/35324 just skips these tests only for pandas 2.0.1. If it's a blocker of verification, can we skip these tests by adding some pytest arguments in our

Re: [DISCUSS][Format][Flight] Ordered data support

2023-04-27 Thread Weston Pace
So this would be a case where multiple "endpoints" are acting as a single "stream of batches"? Or am I misunderstanding? What're some scenarios where that would be done? When would it be preferred for the client to merge the endpoints instead of the client's user? On Thu, Apr 27, 2023, 3:22 PM

Re: [DISCUSS][Format][Flight] Ordered data support

2023-04-27 Thread David Li
The server would have to report these as multiple endpoints in all your examples. (There's nothing saying a particular location can only appear once, or that "Endpoint 2" has to come after "Endpoint 1" for the DESC example.) The flag tells the client if it can fetch data in parallel without
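A minimal sketch of the client behaviour under discussion, assuming a boolean "ordered" flag accompanies the endpoint list. `Endpoint`, `fetch_endpoint`, and `read_all` are hypothetical names for illustration, not part of any Flight API, and the in-memory rows stand in for a real fetch. When the flag is set, the client may still fetch concurrently but must concatenate results in the order the endpoints were listed; when it is not set, any order is acceptable.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Hypothetical stand-in for one endpoint returned by the server."""
    name: str
    rows: list


def fetch_endpoint(endpoint):
    """Pretend to fetch all rows from one endpoint (assumed helper)."""
    return list(endpoint.rows)


def read_all(endpoints, ordered):
    """Fetch every endpoint concurrently and flatten the results.

    If `ordered` is true, rows are concatenated in endpoint-list order.
    Otherwise rows are concatenated in whatever order fetches complete.
    """
    with ThreadPoolExecutor(max_workers=4) as pool:
        if ordered:
            # Executor.map runs calls concurrently but yields results
            # in the order of the input iterable.
            chunks = list(pool.map(fetch_endpoint, endpoints))
        else:
            futures = [pool.submit(fetch_endpoint, ep) for ep in endpoints]
            chunks = [f.result() for f in as_completed(futures)]
    return [row for chunk in chunks for row in chunk]


endpoints = [
    Endpoint("20230428", ["2023-04-28T23:00", "2023-04-28T01:00"]),
    Endpoint("20230427", ["2023-04-27T23:00", "2023-04-27T01:00"]),
    Endpoint("20230426", ["2023-04-26T23:00", "2023-04-26T01:00"]),
]
ordered_rows = read_all(endpoints, ordered=True)
unordered_rows = read_all(endpoints, ordered=False)
```

With the DESC example from the thread, `ordered_rows` keeps the newest day first because the endpoint list itself carries the ordering; `unordered_rows` contains the same rows with no guarantee about their sequence.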

Re: [VOTE] Release Apache Arrow 12.0.0 - RC0

2023-04-27 Thread Jacob Wujciak
I have uploaded the log [1] for the run using conda with gandiva active. It looks like there is an issue with libcrypt.so causing these tests to segfault. 1: https://gist.github.com/assignUser/cba0a13875de9d6a4f31000f585244f0 On Thu, Apr 27, 2023 at 11:32 PM Will Jones wrote: > Hi Raul, > > It

Re: [VOTE] Release Apache Arrow 12.0.0 - RC0

2023-04-27 Thread Will Jones
Hi Raul, It might be worth creating a new RC that fixes more of the test issues, even if they shouldn't be blockers. I've run the release script a few different times, and after 1.5 hours (is that a normal runtime for verification?) I get various test failures. So far the errors are in the

Re: Arrow community meeting April 26 at 16:00 UTC

2023-04-27 Thread Ian Cook
Below is a summary of the notes from yesterday's meeting: Attendees: - Ian Cook - Raúl Cumplido - Xuwei Fu - Will Jones - Bryce Mecum - Rok Mihevc - Sri Nadukudy - Matthew Topol Discussion: Arrow 12.0.0 release - RC0 has been proposed [1] - There were a lot of CI failures at the time of the

Re: [VOTE][RUST][DataFusion] Release DataFusion Python Bindings 23.0.0 RC2

2023-04-27 Thread Andrew Lamb
+1 (binding) x86_64 mac Andrew On Tue, Apr 25, 2023 at 2:52 AM L. C. Hsieh wrote: > +1 (binding) > > Verified on M1 Mac. > > Thanks Andy. > > On Mon, Apr 24, 2023 at 6:13 PM Andy Grove wrote: > > > > Hi, > > > > I would like to propose a release of Apache Arrow DataFusion Python > > Bindings,

Re: [VOTE] Release Apache Arrow 12.0.0 - RC0

2023-04-27 Thread Raúl Cumplido
Hi, The vote for the RC has been open for 5 days. I will wait until tomorrow; if no more +1 votes are cast, I understand that the issue related to the pandas failures (https://github.com/apache/arrow/issues/35321) is causing verification to fail and we require a new RC with the above fix. Let

Re: [DISCUSS][Format][Flight] Ordered data support

2023-04-27 Thread Andrew Lamb
I wonder if we have considered simply removing the statement "There is no ordering defined on endpoints. Hence, if the returned data has an ordering, it should be returned in a single endpoint." and replacing it with something that says "the relative ordering of data from different endpoints is

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-04-27 Thread Andrew Lamb
My apologies, I did not see the thread [1] for some reason [1] https://lists.apache.org/thread/r28rw5n39jwtvn08oljl09d4q2c1ysvb On Thu, Apr 27, 2023 at 10:32 AM Andrew Lamb wrote: > Felipe, thank you for bringing this up. > > Another approach that is sometimes used in database engines (like

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-04-27 Thread Andrew Lamb
Felipe, thank you for bringing this up. Another approach that is sometimes used in database engines (like DuckDB) and is often called selection vectors, is to store another bitmask that says which elements in the array should be "selected" and which are ignored and functions like a view. For
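A minimal sketch of the selection-vector idea described above, assuming the simplest possible representation: a plain Python list of values plus a parallel list of booleans as the mask. `selected_view` and `combine_masks` are hypothetical helpers for illustration; they do not reflect how DuckDB or Arrow implement this.

```python
def selected_view(values, mask):
    """Lazily yield only the values whose mask entry is true.

    The underlying values are never copied or reordered; the mask
    alone decides which elements are visible.
    """
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    return (value for value, keep in zip(values, mask) if keep)


def combine_masks(first, second):
    """Apply a second filter by AND-ing masks, leaving values untouched."""
    if len(first) != len(second):
        raise ValueError("masks must have the same length")
    return [a and b for a, b in zip(first, second)]


values = [10, 20, 30, 40]
mask = [True, False, True, True]
visible = list(selected_view(values, mask))

# A further filter only produces a new mask over the same values.
narrower = combine_masks(mask, [True, True, False, True])
visible_after_second_filter = list(selected_view(values, narrower))
```

The point of the design is that successive filters touch only the mask, so the value buffer can be shared between the original array and every filtered view of it.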

Re: [VOTE] Release Apache Arrow 12.0.0 - RC0

2023-04-27 Thread Sutou Kouhei
Hi, I tried this on a manjarolinux/base Docker image. I think that this is a problem of the Arch Linux's llvm package. LLVMExports.cmake in the package doesn't provide the LLVMX86CodeGen target: # grep add_library /usr/lib/cmake/llvm/LLVMExports.cmake add_library(LLVMDemangle STATIC IMPORTED)

Re: [VOTE] Release Apache Arrow 12.0.0 - RC0

2023-04-27 Thread Raúl Cumplido
Hi Jacob, Could you share the log of the error to try and understand what is causing the test failures? On Thu, 27 Apr 2023, 0:29, Jacob Wujciak wrote: > I was able to build with USE_CONDA=1 but that produced a long list of test > failures, which seems concerning? > 17 -

[RESULT][VOTE] Formalize how to change format

2023-04-27 Thread Sutou Kouhei
Hi, The vote carries with 6 +1 binding votes, 1 +1 non-binding vote and no -1 votes. I'll merge https://github.com/apache/arrow/pull/35174 . Thanks, -- kou In <20230424.103259.664806138128874521@clear-code.com> "[VOTE] Formalize how to change format" on Mon, 24 Apr 2023 10:32:59 +0900