Hello Arrow devs,
Just a quick note. To answer one of my earlier questions:
> 1. Is this array type currently only used in Velox? (not DuckDB like some
> of the other new types?) What evidence do we have that it will become used
> outside of Velox?
>
This type is also used by DuckDB. Found
(Admittedly, the PR title of [1] doesn't reflect that only the scalar
aggregate UDF is implemented and not the hash one - that is an oversight on
my part - sorry.)
On Tue, Jun 13, 2023 at 3:51 PM Li Jin wrote:
> Thanks Weston.
>
> I think I found what you pointed out to me before which is this bit
Thanks Weston.
I think I found what you pointed out to me before which is this bit of code:
https://github.com/apache/arrow/blob/main/cpp/src/arrow/dataset/partition.cc#L118
I will see if I can adapt this to be used in a streaming situation.
> I know you recently added [1] and I'm maybe a little
Are you looking for something in C++ or Python? We have a thing called the
"grouper" (arrow::compute::Grouper in arrow/compute/row/grouper.h) which
(if memory serves) is the heart of the functionality in C++. It would be
nice to add some python bindings for this functionality as this ask comes
Welcome Jie!
On Tue, Jun 13, 2023, at 10:25, Weston Pace wrote:
> Congratulations
>
> On Tue, Jun 13, 2023, 1:28 AM Joris Van den Bossche <
> jorisvandenboss...@gmail.com> wrote:
>
>> Congratulations!
>>
>> On Mon, 12 Jun 2023 at 22:00, Raúl Cumplido
>> wrote:
>> >
>> > Congratulations Jie!!!
>>
Hi,
I am trying to write a function that takes a stream of record batches
(where the last column is group id), and produces k record batches, where
record batches k_i contain all the rows with group id == i.
Pseudocode is something like:
def group_rows(batches, k) -> array[RecordBatch] {
Gotcha - If there is no penalty from RecordBatch<->StructArray then I am
happy with the current approach - thanks!
For Spencer's question, the reason I use StructArray is that the kernel
interfaces I am interested in use the Array interface instead of
RecordBatch, so StructArray is easier
Congratulations
On Tue, Jun 13, 2023, 1:28 AM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:
> Congratulations!
>
> On Mon, 12 Jun 2023 at 22:00, Raúl Cumplido
> wrote:
> >
> > Congratulations Jie!!!
> >
> > On Mon, Jun 12, 2023, 20:35, Matt Topol
> wrote:
> >
> > > Congrats
Hi,
I've had an issue with the post-11-bump-versions.sh script. For patch
releases the script fails unless the `BUMP_DEB_PACKAGE_NAMES=0` flag is
used.
This is not documented, and I had to run several retries locally to
understand what the issue was.
The problem is that this script commits and
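For reference, the workaround looks like the following. Only the script name and the `BUMP_DEB_PACKAGE_NAMES=0` flag come from the report above; the script location and the version arguments are my assumptions and should be checked against the release docs:

```shell
# Patch release: skip bumping the .deb package names, otherwise the
# script fails (flag per the report; path and arguments assumed).
BUMP_DEB_PACKAGE_NAMES=0 dev/release/post-11-bump-versions.sh 12.0.1 12.0.2
```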
Thanks Nic for helping me with uploading sources and adding the
release to the Apache Reporter System.
This is the current status of the post-release tasks:
- [done] Update the released milestone Date and set to "Closed" on GitHub
- [done] Merge changes on release branch to maintenance branch
Congratulations!
On Mon, 12 Jun 2023 at 22:00, Raúl Cumplido wrote:
>
> Congratulations Jie!!!
>
> On Mon, Jun 12, 2023, 20:35, Matt Topol wrote:
>
> > Congrats Jie!
> >
> > On Sun, Jun 11, 2023 at 9:20 AM Andrew Lamb wrote:
> >
> > > The Project Management Committee (PMC) for Apache Arrow
On Mon, 12 Jun 2023 at 21:30, Jerald Alex wrote:
>
> hi Weston,
>
> Thank you so much for taking the time to respond. Really appreciate it.
>
> I'm using parquet files. So would it be possible to elaborate on the
> below? I cannot seem to find any documentation for ParquetFileFragment.
>
> "there
I think your original code roundtripping through RecordBatch
(`pa.RecordBatch.from_pandas(df).to_struct_array()`) is the best
option at the moment. The RecordBatch<->StructArray part is a cheap
(zero-copy) conversion, and by using RecordBatch.from_pandas, you can
rely on all pandas<->arrow
Hi,
Thanks everyone.
The vote passed with 3 binding +1 votes, 3 non-binding +1 votes, and no
-1 votes.
I will start the post release tasks for 12.0.1 [1].
Thanks,
Raúl
[1] https://arrow.apache.org/docs/dev/developers/release.html#post-release-tasks
On Tue, Jun 13, 2023 at