I think "inheritance" and "composition" are more of a concern for
implementations than for the spec (I could be wrong here).
So it seems that it would be sufficient to write the HLLSKETCH's canonical
definition as "this is an extension of the JSON logical type and supports
all the same storage
+1 (binding)
On Tue, Apr 30, 2024 at 7:53 AM Rok Mihevc wrote:
> Thanks for all the reviews and comments! I've included the big-endian
> requirement so the proposed language is now as below.
> I'll leave the vote open until after the May holiday.
>
> Rok
>
> UUID
>
>
> * Extension name:
+1 (binding)
I agree we should be explicit about RFC-8259.
On Mon, Apr 29, 2024 at 4:46 PM David Li wrote:
> +1 (binding)
>
> assuming we explicitly state RFC-8259
>
> On Tue, Apr 30, 2024, at 08:02, Matt Topol wrote:
> > +1 (binding)
> >
> > On Mon, Apr 29, 2024 at 5:36 PM Ian Cook wrote:
> >
ld
> > > it error, or should it create a table with a single column?
> >
> > Presumably it should just error? I can see this being ambiguous if there
> > were an API that dynamically returned either a table or a column based on
> > the input shape (where befo
> *As per the Apache Parquet community, Parquet V2 is not final yet, so it is not
> official. They are advising not to use Parquet V2 for writing (though code
> is available).*
This would be news to me. Parquet releases are listed (by the parquet
community) at [1]
The vote to release parquet 2.10
I tend to agree with Dewey. Using run-end-encoding to represent a scalar
is clever and would keep the C Data Interface more compact. Also, a struct
array is a superset of a record batch (assuming the metadata is kept in the
schema). Consumers should always be able to deserialize into a struct
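The compactness argument can be made concrete with a minimal C++ sketch. It assumes Arrow's REE convention that `run_ends[i]` is the exclusive logical end of run `i` (int32 here). A scalar broadcast to N rows needs exactly one run end and one value, however large N is. `RunEndEncoded` and `BroadcastScalar` are hypothetical names for illustration, not Arrow C++ APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal run-end-encoded column: run_ends[i] is the exclusive logical end of
// run i, and values[i] is the value repeated throughout that run.
template <typename T>
struct RunEndEncoded {
  std::vector<int32_t> run_ends;
  std::vector<T> values;

  int32_t length() const { return run_ends.empty() ? 0 : run_ends.back(); }

  // Binary-search for the first run whose end is past logical row i.
  const T& at(int32_t i) const {
    assert(i >= 0 && i < length());
    auto it = std::upper_bound(run_ends.begin(), run_ends.end(), i);
    return values[static_cast<std::size_t>(it - run_ends.begin())];
  }
};

// A scalar "broadcast" to `length` rows is a single run: O(1) storage.
template <typename T>
RunEndEncoded<T> BroadcastScalar(const T& value, int32_t length) {
  assert(length > 0);
  return RunEndEncoded<T>{{length}, {value}};
}
```

Random access costs one binary search over `run_ends`. For a single-run scalar, that search is effectively constant time.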
> people generally find use in Arrow schemas independently of concrete data.
This makes sense. I think we do want to encourage use of Arrow as a "type
system" even if there is no data involved. And, given that we cannot
easily change a field's data type property to "optional" it makes sense to
> may want an Other type to signal that it would fail if asked to provide
> particular columns.
I interpret "would fail" to mean we are still speaking in some kind of
"planning stage" and not yet actually creating arrays. So I don't know
that this needs to be a data type. In other words, I see
Congratulations!
On Thu, Apr 11, 2024 at 9:12 AM wish maple wrote:
> Congrats!
>
> Best,
> Xuwei Fu
>
> On Thu, Apr 11, 2024 at 23:22, Kevin Gurney wrote:
>
> > Congratulations, Sarah!! Well deserved!
> >
> > From: Jacob Wujciak
> > Sent: Thursday, April 11, 2024 11:14 AM
>
> Probably major versions should match between C++ and PyArrow, but I guess
> we could have diverging minor and patch versions. Or at least patch
> versions given that
> a new minor version is usually cut for bug fixes too.
I believe even this would be difficult. Stable ABIs are very finicky in
Forgot link:
[1]
https://developer.mozilla.org/en-US/docs/WebAssembly/JavaScript_interface/Memory
On Tue, Apr 2, 2024 at 11:38 AM Weston Pace wrote:
> Thanks for taking the time to address my concerns.
>
> > I've split the S3/HTTP URI flight pieces out into a separate document and
than a markdown PR for the Arrow documentation as I could
> > > more visually express things without a preview of the rendered
> markdown.
> > If
> > > it would get people to be more likely to vote on this, I can write up
> the
> > > documentation markd
Wouldn't support for ADT require expressing more than 1 type id per
record? In other words, if `put` has type id 1, `delete` has type id 2,
and `erase` has type id 3 then there is no way to express something is (for
example) both type id 1 and type id 3 because you can only have one type id
per
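The one-tag-per-slot constraint can be sketched as follows. This is a hypothetical, standalone illustration, not the Arrow union API. A union-style column stores a single `int8` type id per slot, so "put and erase" cannot be represented. A per-record bitmask can hold several kinds at once; it is shown here only as an assumed point of comparison, not as something proposed in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Kinds from the example above: put = 1, delete = 2, erase = 3.
enum Kind : int8_t { kPut = 1, kDelete = 2, kErase = 3 };

// Union-style layout: exactly one type id per slot, so each record is a put,
// a delete, or an erase, but never a combination of them.
struct UnionLikeColumn {
  std::vector<int8_t> type_ids;
};

// Hypothetical alternative: a bitmask per record where bit k is set when the
// record has kind k. This can express "put and erase" for a single record.
struct FlagsColumn {
  std::vector<uint8_t> flags;

  void Append(std::initializer_list<Kind> kinds) {
    uint8_t mask = 0;
    for (Kind k : kinds) mask |= static_cast<uint8_t>(1u << k);
    flags.push_back(mask);
  }

  bool Has(std::size_t row, Kind k) const {
    assert(row < flags.size());
    return ((flags[row] >> k) & 1) != 0;
  }
};
```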
Congratulations Joel!
On Mon, Apr 1, 2024 at 1:16 PM Bryce Mecum wrote:
> Congrats, Joel!
>
> On Mon, Apr 1, 2024 at 6:59 AM Matt Topol wrote:
> >
> > On behalf of the Arrow PMC, I'm happy to announce that Joel Lubinitsky
> has
> > accepted an invitation to become a committer on Apache Arrow.
Thank you for bringing this up. I'm in favor of this. I think there are
several motivations but the main ones are:
1. Decoupling the versions will allow components to have no release, or
only a minor release, when there are no breaking changes
2. We do have some vote fatigue, I think, and we
I'm sorry for the very late reply. Until yesterday I had no real concept
of what this was talking about and so I had stayed out.
I'm +0 only because it isn't clear what we are voting on. There is a Word
doc with no implementation or PR. I think there could be an implementation
/ PR. For
> I don't think there is currently a direct equivalent to
> `FlightRecordBatchStream` in the arrow javascript library, but you should
> be able to combine the data header + body and then read it using the
> `fromIPC` functions since it's just the Arrow IPC format
The RecordBatchReader[1] _should_
Congratulations!
On Sun, Mar 17, 2024, 8:01 PM Jacob Wujciak wrote:
> Congrats, well deserved!
>
> On Mon, Mar 18, 2024, 03:24, Nic Crane wrote:
>
> > On behalf of the Arrow PMC, I'm happy to announce that Bryce Mecum has
> > accepted an invitation to become a committer on Apache Arrow.
Felipe's points are good.
I don't know that you need to adapt the entire ADBC; it sort of depends on
what you're after. I see what you've got right now as more of an SQL
abstraction layer. For example, similar to things like [1][2][3] (though 3
is more of an ORM). If you like the SQL interface
+1 (binding)
On Fri, Mar 1, 2024 at 3:33 AM Andrew Lamb wrote:
> Hello,
>
> As we have discussed[1][2] I would like to vote on the proposal to
> create a new Apache Top Level Project for DataFusion. The text of the
> proposed resolution and background document is copy/pasted below
>
> If the
Congrats!
On Fri, Feb 16, 2024 at 3:07 AM Raúl Cumplido wrote:
> Congratulations!!
>
> On Fri, Feb 16, 2024 at 12:02, Daniël Heres
> wrote:
> >
> > Congratulations!
> >
> > On Fri, Feb 16, 2024, 11:33 Metehan Yıldırım <
> metehan.yildi...@synnada.ai>
> > wrote:
> >
> > > Congrats!
> > >
+1. There have been a few times I've attempted to run the verification
scripts. They have failed, but I was pretty confident it was a problem
with my environment interacting with the verification script and not a problem
in the software itself, and I didn't take the time to debug the verification
I agree engines can use their own strategy. Requiring explicit casts is
probably ok as long as it is well documented, but I think I lean slightly
towards implicitly falling back to the storage type. I do think
people still shy away from extension types. Adding the extension type to
an
at least 72 hours.
> > >
> > > [ ] +1
> > > [ ] +0
> > > [ ] -1 Keep Flight SQL experimental because...
> > >
> > > On Fri, Dec 8, 2023, at 13:37, Weston Pace wrote:
> > >> +1
> > >>
> > >> On Fri,
+1
On Fri, Dec 8, 2023 at 10:33 AM Micah Kornfield
wrote:
> +1
>
> On Fri, Dec 8, 2023 at 10:29 AM Andrew Lamb wrote:
>
> > I agree it is time to "promote" ArrowFlightSQL to the same level as other
> > standards in Arrow
> >
> > Now that it is used widely (we use and count on it too at
Congratulations Felipe!
On Thu, Dec 7, 2023 at 8:38 AM wish maple wrote:
> Congrats Felipe!!!
>
> Best,
> Xuwei Fu
>
> On Thu, Dec 7, 2023 at 23:42, Benjamin Kietzman wrote:
>
> > On behalf of the Arrow PMC, I'm happy to announce that Felipe Oliveira
> > Carvalho
> > has accepted an invitation to become a
Congrats Andy!
On Mon, Nov 27, 2023, 7:31 PM wish maple wrote:
> Congrats Andy!
>
> Best,
> Xuwei Fu
>
> On Mon, Nov 27, 2023 at 20:47, Andrew Lamb wrote:
>
> > I am pleased to announce that the Arrow Project has a new PMC chair and
> VP
> > as per our tradition of rotating the chair once a year. I have
Congratulations James
On Fri, Nov 17, 2023 at 6:07 AM Metehan Yıldırım <
metehan.yildi...@synnada.ai> wrote:
> Congratulations!
>
> On Thu, Nov 16, 2023 at 10:45 AM Sutou Kouhei wrote:
>
> > On behalf of the Arrow PMC, I'm happy to announce that James Duong
> > has accepted an invitation to
Congratulations Raúl!
On Mon, Nov 13, 2023 at 1:34 PM Ben Harkins
wrote:
> Congrats, Raúl!!
>
> On Mon, Nov 13, 2023 at 4:30 PM Bryce Mecum wrote:
>
> > Congrats, Raúl!
> >
> > On Mon, Nov 13, 2023 at 10:28 AM Andrew Lamb
> > wrote:
> > >
> > > The Project Management Committee (PMC) for
+1 for the original proposal as well.
---
The (minor) problem I see with flags is that there isn't much point to this
feature if you are gating on a flag. I'm assuming the goal is what Dewey
originally mentioned which is making buffer calculations easier. However,
if you're gating the feature
Is this buffer lengths buffer only present if the array type is Utf8View?
Or are you suggesting that other types might want to adopt this as well?
On Thu, Oct 26, 2023 at 10:00 AM Dewey Dunnington
wrote:
> > I expect C code to not be much longer than this :-)
>
> nanoarrow's
Congratulations Xuwei!
On Mon, Oct 23, 2023 at 3:38 AM wish maple wrote:
> Thanks kou and every nice person in arrow community!
>
> I've learned a lot during learning and contribution to arrow and
> parquet. Thanks for everyone's help.
> Hope we can bring more fancy features in the future!
>
>
> Of course, what I'm really asking for is to see how Lance would compare
> ;-)
> P.S. The second paper [2] also talks about ML workloads (in Section 5.8)
> and GPU performance (in Section 5.9). It also cites Lance as one of the
> future formats (in Section 5.6.2).
Disclaimer: I work for LanceDB
Congratulations Jon!
On Sun, Oct 15, 2023, 1:51 PM Neal Richardson
wrote:
> Congratulations!
>
> On Sun, Oct 15, 2023 at 1:35 PM Bryce Mecum wrote:
>
> > Congratulations, Jon!
> >
> > On Sat, Oct 14, 2023 at 9:24 AM Andrew Lamb
> wrote:
> > >
> > > The Project Management Committee (PMC) for
Congratulations!
On Sun, Oct 15, 2023, 8:51 AM Gang Wu wrote:
> Congrats!
>
> On Sun, Oct 15, 2023 at 10:49 PM David Li wrote:
>
> > Congrats & welcome Curt!
> >
> > On Sun, Oct 15, 2023, at 09:03, wish maple wrote:
> > > Congratulations!
> > >
> > > On Sun, Oct 15, 2023 at 20:48, Raúl Cumplido wrote:
> >
> I feel the broader question here is what is Arrow's intended use case -
> interchange or execution
The line between interchange and execution is not always clear. For
example, I think we would like Arrow to be considered as a standard for UDF
libraries.
On Fri, Oct 6, 2023 at 7:34 AM Mark
In other languages I have seen this called "async local"[1][2][3]. I'm not
aware of any C++ implementations. Folly's fibers claim to have fiber-local
variables[4], but I can't find the actual code to use them. I can't seem to
find any reference to the concept in Boost.Asio or cppcoro.
I've also
+1
Thanks to all for the discussion and thanks to Ben for all of the great
work.
On Mon, Aug 21, 2023 at 9:16 AM wish maple wrote:
> +1 (non-binding)
>
> It would help a lot when processing UTF-8 related data!
>
> Xuwei
>
> On Tue, Aug 22, 2023 at 00:11, Andrew Lamb wrote:
>
> > +1
> >
> > This is a great
> But I can't figure out how to express "select struct field 0 from field 2
> of the original table where field 2 is a struct column"
>
> Any idea how the substrait message should look like for the above?
I believe it would be:
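```
"expression": {
  "selection": {
    "direct_reference": {
      "struct_field": {
        "field": 2,
        "child": {
          "struct_field": {
            "field": 0
          }
        }
      }
    },
    "root_reference": {}
  }
}
```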
> I would welcome a draft PR showcasing the changes necessary in the IPC
> format definition, and in the C Data Interface specification (no need to
> actually implement them for now :-)).
I've proposed something at [1].
> One sketch of an idea: define sets of types that we can call “kinds”**
>
8/16, The system works fine. CPU is about
> > 100%. like 2.1.1
> > 2.2.2 for bucket_size to 32, the bug comes back. CPU halts at 550%.
> >
> > 2.3 io_thread_count to 8
> > 2.3.1 for bucket_size to 16, it fails somehow. After transferring
> > done, the memory accu
well, to 800%.
> 1. Sometimes the writing queue can catch up: CPU goes down after
> memory has accumulated, the writing speed recovers, and memory goes back to
> normal.
> 2. Sometimes, it can't. IOBPS goes down sharply, and CPU never goes
> down after that.
>
> How many io th
You'll need to measure more, but the bottleneck for writes is
usually going to be the disk itself. Unfortunately, standard OS buffered
I/O has some pretty negative behaviors in this case. First I'll describe
what I generally see happen (the last time I profiled this was a while back
but
ery helpful explanation.
>
> On Tue, Jul 25, 2023 at 6:41 PM Weston Pace wrote:
>
> > 1) As a rule of thumb I would probably prefer `async_scheduler`. It's
> more
> > feature rich and simpler to use and is meant to handle "long running"
> tasks
> > (e.g.
above it is probably ok to assume an implicit
ordering in many cases).
On Wed, Jul 26, 2023 at 8:18 AM Weston Pace wrote:
> > I think the key problem is that the input stream is unordered. The
> > input stream is an ArrowArrayStream imported from the Python side, and
like to have a discussion on the dataset scanner: does it produce a
> > stable sequence of record batches (as an implicit ordering) when the
> > underlying storage is not changed? In my situation, the downstream
> > executor may crash, and then it would request to continue from an
> > intermediate
1) As a rule of thumb I would probably prefer `async_scheduler`. It's more
feature-rich and simpler to use, and it is meant to handle "long-running" tasks
(e.g. 10s-100s of ms or more).
The scheduler is a bit more complex and is intended for very fine-grained
scheduling. It's currently only used in
> Reading the source code of exec_plan.cc, DeclarationToReader called
> DeclarationToRecordBatchGenerator, which ignores the sequence_output
> parameter in SinkNodeOptions, also, it calls validate which should
> fail if the SinkNodeOptions honors the sequence_output. Then it seems
> that
> Also, I don't understand why there are two versions of the hash table
> ("hashing32" and "hashing64" apparently). What's the rationale? How is
> the user meant to choose between them? Say a Substrait plan is being
> executed: which hashing variant is chosen and why?
It's not user-configurable.
Yes, those are the two main approaches to hashing in the code base that I
am aware of as well. I haven't seen any concrete comparison or
benchmarks between the two. If collisions between NA and 0 are a problem,
it would probably be ok to tweak the hash value of NA to something unique.
I
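Here is a minimal sketch of that tweak, assuming nullable int64 keys: nulls get a fixed sentinel hash instead of being hashed as 0. The mixer (the splitmix64 finalizer) and the sentinel constant are arbitrary illustrative choices, not what Arrow's hashing32/hashing64 code uses. A hash table typically still compares keys when hashes match, so this change avoids needless probing rather than fixing wrong results.

```cpp
#include <cstdint>
#include <optional>

// Illustrative sentinel hash reserved for null keys (arbitrary constant).
constexpr uint64_t kNullHash = 0x9E3779B97F4A7C15ULL;

// splitmix64 finalizer: a simple, well-known 64-bit mixer. Note it maps 0 to 0.
uint64_t HashInt64(int64_t v) {
  uint64_t x = static_cast<uint64_t>(v);
  x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
  x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
  return x ^ (x >> 31);
}

// Nulls get their own hash rather than being hashed as if they were 0,
// so null and 0 no longer share a hash value.
uint64_t HashNullable(const std::optional<int64_t>& v) {
  return v.has_value() ? HashInt64(*v) : kNullHash;
}
```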
> I may be missing something, but why copy to *out_values++ instead of
> *out_values and add 32 to out_values afterwards? Otherwise I agree this is
> the way to go.
I agree with Jin. You should probably be incrementing `out` by 32 each
time `VisitValue` is called.
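The fix under discussion can be sketched as follows. The sketch assumes `out` is a `uint8_t*` into a raw output buffer and that each value is 32 bytes wide, for example a 256-bit decimal; the thread does not say which type is involved. `Value32` and `WriteValues` are hypothetical names. With a byte pointer, `*out++ = x` writes one byte and advances one byte, so the whole value must be copied and the pointer then advanced by 32.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical 32-byte fixed-width value.
using Value32 = std::array<uint8_t, 32>;

// `out` is a byte pointer, so `*out++ = ...` would store one byte and move one
// byte. Instead, copy the full 32 bytes and then advance `out` by 32.
void WriteValues(const std::vector<Value32>& values, uint8_t* out) {
  assert(out != nullptr || values.empty());
  for (const Value32& v : values) {
    std::memcpy(out, v.data(), v.size());
    out += v.size();  // next value starts 32 bytes later
  }
}
```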
On Mon, Jul 17, 2023 at 6:38
at this sort of interoperability is what makes Arrow so
> compelling and something we should work very hard to preserve. This is
> the crux of my concern with standardising alternative layouts. I
> definitely hope that with time Arrow will penetrate deeper into these
> engines, perhaps in a si
Yes, that is correct.
What Substrait calls "groupings" is what is often referred to in SQL as
"grouping sets". These allow you to compute the same aggregates but group
by different criteria. Two very common ways of creating grouping sets are
"group by cube" and "group by rollup". Snowflake's
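To show what those two shorthands expand to, here is a standalone C++ sketch that lists the grouping sets for `ROLLUP` (every prefix of the key list) and `CUBE` (every subset). It uses plain column names as stand-ins for Substrait grouping expressions. `Rollup` and `Cube` are hypothetical helpers, not Substrait or Acero APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using GroupingSet = std::vector<std::string>;

// GROUP BY ROLLUP(a, b, c) => (a, b, c), (a, b), (a), ()
std::vector<GroupingSet> Rollup(const std::vector<std::string>& keys) {
  std::vector<GroupingSet> sets;
  for (std::size_t n = keys.size() + 1; n-- > 0;) {
    sets.emplace_back(keys.begin(), keys.begin() + n);  // prefix of length n
  }
  return sets;
}

// GROUP BY CUBE(a, b, c) => every subset of {a, b, c} (2^3 = 8 sets)
std::vector<GroupingSet> Cube(const std::vector<std::string>& keys) {
  assert(keys.size() < 32);  // keep the bitmask below in range
  std::vector<GroupingSet> sets;
  const std::size_t total = std::size_t{1} << keys.size();
  for (std::size_t mask = total; mask-- > 0;) {
    GroupingSet set;
    for (std::size_t i = 0; i < keys.size(); ++i) {
      if (mask & (std::size_t{1} << i)) set.push_back(keys[i]);
    }
    sets.push_back(set);
  }
  return sets;
}
```

Each resulting set corresponds to one entry in the list of groupings, and the aggregates are computed once per set. ROLLUP over n keys yields n + 1 sets, while CUBE yields 2^n.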
s on to my major concern with this proposal, that it adds
> >> complexity and cognitive load to the specification and implementations,
> >> whilst not meaningfully improving the performance of the operators that
> I
> >> commonly encounter as performance bottle
I agree the experiment isn't working very well. I've been meaning to
change my listing from `compute` to `acero` for a while. I'd be +1 for
just removing it though.
On Tue, Jul 4, 2023, 6:44 AM Dewey Dunnington
wrote:
> Just a note that for me, the main problem is that I get automatic
>
Congratulations Kevin!
On Mon, Jul 3, 2023 at 5:18 PM Sutou Kouhei wrote:
> On behalf of the Arrow PMC, I'm happy to announce that Kevin Gurney
> has accepted an invitation to become a committer on Apache
> Arrow. Welcome, and thank you for your contributions!
>
> --
> kou
>
> is this overflow considered a bug? Or is large exec batch something that
> should be avoided?
This is not a bug and it is something that should be avoided.
Some of the hash-join internals expect small batches. I actually thought
the limit was 32Ki and not 64Ki because I think there may be some
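If a producer may emit very large batches, one way to stay under that limit is to slice them before they reach the join. The sketch below assumes a 32Ki-row cap; the thread is unsure whether the real limit is 32Ki or 64Ki, so 32Ki is the conservative choice. `kMaxBatchRows` and `ChunkRanges` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical cap on rows per batch fed to the hash join.
constexpr int64_t kMaxBatchRows = 32 * 1024;

// Returns (offset, length) pairs covering [0, num_rows) in chunks of at most
// max_rows rows each.
std::vector<std::pair<int64_t, int64_t>> ChunkRanges(
    int64_t num_rows, int64_t max_rows = kMaxBatchRows) {
  assert(num_rows >= 0 && max_rows > 0);
  std::vector<std::pair<int64_t, int64_t>> ranges;
  for (int64_t offset = 0; offset < num_rows; offset += max_rows) {
    ranges.emplace_back(offset, std::min(max_rows, num_rows - offset));
  }
  return ranges;
}
```

Each `(offset, length)` pair could then be passed to something like `RecordBatch::Slice(offset, length)` in Arrow C++, which produces a zero-copy view of the rows.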
Is your use case to operate on a batch of graphs? For example, do you have
hundreds or thousands of graphs that you need to run these algorithms on at
once?
Or is your use case to operate on a single large graph? If it's the
single-graph case then how many nodes do you have?
If it's one graph
We do this quite a bit in the Arrow<->Parquet bridge if IIUC. There are
macros defined like this:
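```
#define BEGIN_PARQUET_CATCH_EXCEPTIONS try {
#define END_PARQUET_CATCH_EXCEPTIONS                   \
  }                                                    \
  catch (const ::parquet::ParquetStatusException& e) { \
    return e.status();                                 \
  }                                                    \
  catch (const ::parquet::ParquetException& e) {       \
    return ::arrow::Status::IOError(e.what());         \
  }
```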
>> 2. For StringView and ArrayView, if the parent has `validity = false`.
>> If they have `validity = true`, their offsets might point to an invalid
>> position
>I have no idea, but I hope not. Ben Kietzman might want to answer more
>precisely here.
I think, for view arrays, the offsets
I agree with Antoine but I get easily confused by "valid, as in
structurally correct" and "valid, as in not null" so I want to make sure I
understand:
> The child of a nested
> array should be valid itself, independently of the parent's validity
> bitmap.
A child must be "structurally correct"
Thanks for reaching out. This sounds like a useful tool and I'm happy to
hear about more development around establishing supply chain awareness.
However, Arrow is an Apache Software Foundation project and, as such, we don't manage
all of the details of our GitHub repository. Some of these (including, I
> The trouble is that Dataset was not designed to serve as a
> general-purpose unmaterialized dataframe. For example, the PyArrow
> Dataset constructor [5] exposes options for specifying a list of
> source files and a partitioning scheme, which are irrelevant for many
> of the applications that
Congrats Dewey!
On Fri, Jun 23, 2023 at 9:00 AM Antoine Pitrou wrote:
>
> Welcome to the PMC Dewey!
>
>
> On 23/06/2023 at 16:59, Joris Van den Bossche wrote:
> > Congrats Dewey!
> >
> > On Fri, 23 Jun 2023 at 16:54, Jacob Wujciak-Jens
> > wrote:
> >>
> >> Well deserved! Congratulations
One small difference seems to be that Close is idempotent and Cancel is not.
> void cancel()
> throws SQLException
>
> Cancels this Statement object if both the DBMS and driver support
> aborting an SQL statement. This method can be used by one thread to cancel
> a statement that is being
Those goals are somewhat compatible. Sasha can probably correct me if I
get this wrong but my understanding is that the minibatch is just large
enough to ensure reliable vectorized execution. It is used in some
innermost critical sections to both keep the working set small (fit in L1)
and
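A minimal sketch of the minibatch pattern described above follows. It assumes a minibatch of 1024 rows; the constant and the `DotProduct` function are hypothetical and chosen only for illustration. The batch is walked in fixed-size chunks, each stage runs a tight loop over one chunk, and a small scratch buffer is reused so the working set stays small.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minibatch size: small enough that per-minibatch scratch data
// stays cache-resident, large enough for tight, vectorizable loops.
constexpr int64_t kMiniBatchLength = 1024;

// Sums x[i] * y[i] over a (possibly large) batch, one minibatch at a time,
// reusing a fixed-size scratch buffer for each chunk.
int64_t DotProduct(const std::vector<int32_t>& x,
                   const std::vector<int32_t>& y) {
  assert(x.size() == y.size());
  const int64_t n = static_cast<int64_t>(x.size());
  int64_t scratch[kMiniBatchLength];  // working set for one minibatch
  int64_t total = 0;
  for (int64_t start = 0; start < n; start += kMiniBatchLength) {
    const int64_t len = std::min(kMiniBatchLength, n - start);
    // Stage 1: simple, branch-free loop over the minibatch.
    for (int64_t i = 0; i < len; ++i) {
      scratch[i] = int64_t{x[start + i]} * y[start + i];
    }
    // Stage 2: consume the staged results while they are still in cache.
    for (int64_t i = 0; i < len; ++i) total += scratch[i];
  }
  return total;
}
```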
Before I say anything else I'll say that I am in favor of this new layout.
There is some existing literature on the idea (e.g. umbra) and your
benchmarks show some nice improvements.
Compared to some of the other layouts we've discussed recently (REE, list
view), I do think this layout is more
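For readers unfamiliar with the Umbra-style layout, here is a simplified C++ sketch of a 16-byte view slot. It assumes the Utf8View convention: a 4-byte length followed either by up to 12 inline bytes, or by a 4-byte prefix, a 4-byte buffer index, and a 4-byte offset. The sketch uses a single out-of-line buffer (buffer index always 0) and ignores nulls. `StringViewSlot` and `StringViewColumn` are hypothetical names, not Arrow C++ types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// 16-byte view: length, then either inline bytes (<= 12) or
// prefix[4] + buffer_index (int32) + offset (int32).
struct StringViewSlot {
  int32_t length;
  char data[12];
};
static_assert(sizeof(StringViewSlot) == 16, "views are 16 bytes");

struct StringViewColumn {
  std::vector<StringViewSlot> views;
  std::string data_buffer;  // the single out-of-line buffer (index 0)

  void Append(std::string_view s) {
    StringViewSlot v{};
    v.length = static_cast<int32_t>(s.size());
    if (s.size() <= 12) {
      s.copy(v.data, s.size());  // short string: stored fully inline
    } else {
      const int32_t buffer_index = 0;
      const int32_t offset = static_cast<int32_t>(data_buffer.size());
      s.copy(v.data, 4);  // 4-byte prefix for fast comparisons
      std::memcpy(v.data + 4, &buffer_index, 4);
      std::memcpy(v.data + 8, &offset, 4);
      data_buffer.append(s.data(), s.size());
    }
    views.push_back(v);
  }

  std::string_view Get(std::size_t i) const {
    assert(i < views.size());
    const StringViewSlot& v = views[i];
    const auto len = static_cast<std::size_t>(v.length);
    if (len <= 12) return std::string_view(v.data, len);
    int32_t offset;
    std::memcpy(&offset, v.data + 8, 4);
    return std::string_view(data_buffer.data() + offset, len);
  }
};
```

Short strings never touch the data buffer. For longer strings, the inline prefix lets comparisons reject most mismatches without dereferencing the buffer.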
Congratulations Ben!
On Tue, Jun 20, 2023 at 7:38 AM Jacob Quinn wrote:
> Yay! Congrats Ben! Love to see more Julia folks here!
>
> -Jacob
>
> On Tue, Jun 20, 2023 at 4:15 AM Andrew Lamb wrote:
>
> > The Project Management Committee (PMC) for Apache Arrow has invited
> > Ben Baumgold, to
Note that you can ask pyarrow how much memory it thinks it is using with
the pyarrow.total_allocated_bytes[1] function. This can be very useful for
tracking memory leaks.
I see that memory-profiler now has support for different backends. Sadly,
it doesn't look like you can register a custom
and adds an extra buffer containing sizes. For symmetry
> >> with the List and LargeList types (FixedSizeList not included), I'm
> >> going to propose we add a LargeListView. That is not part of the
> >> draft implementation yet, but seems like an obvious thing to have
>
> >> On Sat, May 27, 2023 at 7:44 PM Micah Kornfield
> >> wrote:
> >>>
> >>> This sounds reasonable to me but my main concern is, I'm not sure there
> >> is
> >>> a great mechanism to enforce canonical layouts don't somehow become
> >> de
Are you looking for something in C++ or Python? We have a thing called the
"grouper" (arrow::compute::Grouper in arrow/compute/row/grouper.h) which
(if memory serves) is the heart of the functionality in C++. It would be
nice to add some Python bindings for this functionality as this ask comes
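Roughly, a grouper assigns each distinct key a dense group id and maps every input row to its key's id. Aggregations then use those ids as indices. The standalone C++ sketch below shows that idea for single string keys. `SimpleGrouper` is a hypothetical class; the real `arrow::compute::Grouper` works on Arrow arrays with multiple key columns and has a different API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

class SimpleGrouper {
 public:
  // Returns one group id per input key; new keys get the next dense id.
  std::vector<uint32_t> Consume(const std::vector<std::string>& keys) {
    std::vector<uint32_t> ids;
    ids.reserve(keys.size());
    for (const std::string& key : keys) {
      auto [it, inserted] =
          key_to_id_.emplace(key, static_cast<uint32_t>(uniques_.size()));
      if (inserted) uniques_.push_back(key);
      ids.push_back(it->second);
    }
    assert(uniques_.size() == key_to_id_.size());
    return ids;
  }

  // Distinct keys in group-id order (uniques()[id] is the key for `id`).
  const std::vector<std::string>& uniques() const { return uniques_; }

 private:
  std::unordered_map<std::string, uint32_t> key_to_id_;
  std::vector<std::string> uniques_;
};
```

Ids persist across calls, so batches can be consumed one at a time and still share a single id space.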
Congratulations
On Tue, Jun 13, 2023, 1:28 AM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:
> Congratulations!
>
> On Mon, 12 Jun 2023 at 22:00, Raúl Cumplido
> wrote:
> >
> > Congratulations Jie!!!
> >
> > On Mon, Jun 12, 2023, 20:35, Matt Topol
> wrote:
> >
> > > Congrats
> I would like to know if it is possible to skip a specific set of
> batches, for example, the first 10 batches, and read from the 11th batch.
This sort of API does not exist today. You can skip files by making a
smaller dataset with fewer files (and I think, with parquet, there may even
be a
Congratulations!
On Thu, Jun 8, 2023, 5:36 PM Mehmet Ozan Kabak wrote:
> Thanks everybody. Looking to collaborate further!
>
> > On Jun 8, 2023, at 9:52 AM, Matt Topol wrote:
> >
> > Congrats! Welcome Ozan!
> >
> > On Thu, Jun 8, 2023 at 8:53 AM Raúl Cumplido
> wrote:
> >
> >> Congratulations
> such details should be discussed in a separate thread, but I
> raise this here just to point out that it implies an expansion in the
> scope of what Arrow interfaces can do.
>
> On Tue, Jun 6, 2023 at 6:17 PM Weston Pace wrote:
> >
> > From Micah:
> >
> > >
> > I think it might be worth rethinking binding a
> > layout into the schema versus having a different concept of encoding (and
> > changing some of the corresponding data structures).
> >
> >
> > On Mon, May 22, 2023 at 10:37 AM Weston Pace
> wrote:
> >
same time the page index was read. That's how I'm implementing
> with Lance, and how I plan to implement with Delta Lake. But if you can't
> do that, then filtering with an anti-join makes sense. You wouldn't want to
> include those in a plan.
>
> On Fri, Jun 2, 2023 at 7:38 AM Weston
The simplest way to do this sort of paging today would be to create
multiple files and then you could read as few or as many files as you want.
This approach also works regardless of format.
With parquet/orc you can create multiple row groups / stripes within a
single file, and then partition
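Here is a minimal sketch of how a paged read could be mapped onto row groups. It assumes the row count of each row group is known up front; in Parquet, these counts are in the file footer. `RowGroupSlice` and `PlanPage` are hypothetical. The idea is to read only the overlapping row groups and then slice off the rows that fall outside the page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct RowGroupSlice {
  int row_group;   // which row group to read
  int64_t offset;  // first row to keep within that row group
  int64_t length;  // number of rows to keep from it
};

// Given the row count of each row group, find the row groups (and the slice
// of each) that cover rows [start, start + count) of the file.
std::vector<RowGroupSlice> PlanPage(const std::vector<int64_t>& row_group_rows,
                                    int64_t start, int64_t count) {
  assert(start >= 0 && count >= 0);
  std::vector<RowGroupSlice> plan;
  const int64_t end = start + count;
  int64_t group_start = 0;
  for (int rg = 0;
       rg < static_cast<int>(row_group_rows.size()) && group_start < end;
       ++rg) {
    const int64_t group_end = group_start + row_group_rows[rg];
    const int64_t lo = std::max(start, group_start);
    const int64_t hi = std::min(end, group_end);
    if (hi > lo) plan.push_back({rg, lo - group_start, hi - lo});
    group_start = group_end;
  }
  return plan;
}
```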
Also, for clarity, I do agree with Gang that these are both valuable
features in their own right. A mask makes a lot of sense for page indices.
On Fri, Jun 2, 2023 at 7:36 AM Weston Pace wrote:
> > then I think the incremental cost of adding the
> > positional deletes to the mask
> > implementation. Table formats (e.g. Apache Iceberg and
> > Delta) require the knowledge of row index to finalize row deletion. It
> > would be trivial to natively support row index from the file reader.
> >
> > Best,
> > Gang
> >
> > On Fri, Jun 2, 2023 at
I agree that having a row_index is a good approach. I'm not sure a mask
would be the ideal solution for Iceberg (though it is a reasonable feature
in its own right) because I think position-based deletes, in Iceberg, are
still done using an anti-join and not a filter.
That being said, we
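To illustrate the anti-join approach, here is a standalone sort-merge anti-join sketch. It assumes each scanned batch carries a `row_index` column (a row's position within its file) in ascending order, and that the delete positions for that file are also sorted ascending. As I understand the Iceberg spec, position delete files are written sorted by file and position, which makes a merge natural. `MergeAntiJoin` is a hypothetical helper that returns the in-batch indices of surviving rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sort-merge anti-join: keep rows of a batch whose row_index (position in the
// data file) does not appear in the sorted list of deleted positions.
// Returns the indices, within the batch, of the surviving rows.
std::vector<std::size_t> MergeAntiJoin(const std::vector<int64_t>& row_index,
                                       const std::vector<int64_t>& deleted) {
  assert(std::is_sorted(row_index.begin(), row_index.end()));
  assert(std::is_sorted(deleted.begin(), deleted.end()));
  std::vector<std::size_t> kept;
  std::size_t d = 0;
  for (std::size_t i = 0; i < row_index.size(); ++i) {
    // Skip delete positions that are behind the current row.
    while (d < deleted.size() && deleted[d] < row_index[i]) ++d;
    if (d == deleted.size() || deleted[d] != row_index[i]) kept.push_back(i);
  }
  return kept;
}
```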
ow
> > > >>> settled
> > > >>>>>> upon, I'm not yet convinced it is sufficiently better to
> > incentivise
> > > >>>>>> broad ecosystem adoption.
> > > >>>>>>
Regrettably, 12.0.0 had a significant performance regression (I'll take the
blame for not thinking through all the use cases), most easily exposed when
writing datasets from pandas / numpy data, which is being addressed in
[1]. I believe this to be a fairly common use case and it may warrant a
Congratulations!
On Mon, May 15, 2023 at 6:34 AM Rok Mihevc wrote:
> Congrats Gang!
>
> Rok
>
> On Mon, May 15, 2023 at 3:33 PM Sutou Kouhei wrote:
>
> > On behalf of the Arrow PMC, I'm happy to announce that Gang
> > Wu has accepted an invitation to become a committer on
> > Apache Arrow.
>
>
>
>
> Original message
>
> From: "Weston Pace" <weston.p...@gmail.com>
>
>
I think there are perhaps various things being discussed here:
* Reusing large blocks of memory
I don't think the memory pools actually provide this kind of reuse (e.g.
they aren't like "connection pools" or "thread pools"). I'm pretty sure,
when you allocate a new buffer on a pool, it always
You're right that the default is delete/free. However, the important bit
is that it needs to be the correct delete/free. The error you described
originates from the fact that the final application has two copies of the
CRT and thus two copies of delete/free. Since shared_ptr/unique_ptr picks
I'm not very familiar with Windows. However, I read through [1] and that
matches your description.
I suppose I thought that a shared_ptr / unique_ptr would not have this
problem. I believe these smart pointers store / template a deleter as part
of their implementation. This seems to be
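The relevant `std::shared_ptr` behavior can be sketched in a single program. The deleter is chosen where the pointer is constructed and stored in the control block, so whichever code later drops the last reference still runs that original deleter. `MakeWidgetInModuleA` and the counter are hypothetical stand-ins for a factory exported from one DLL; a single executable cannot actually demonstrate two CRTs.

```cpp
#include <memory>

// Hypothetical counter standing in for "module A's deallocator was used".
int g_module_a_frees = 0;

struct Widget {
  int value = 0;
};

// Imagine this factory is compiled into module A (e.g. a DLL with its own
// CRT). The lambda deleter is captured here, at construction time, and stored
// in the shared_ptr's control block alongside the reference counts.
std::shared_ptr<Widget> MakeWidgetInModuleA() {
  return std::shared_ptr<Widget>(new Widget{}, [](Widget* w) {
    ++g_module_a_frees;  // record that A's deleter ran
    delete w;            // this `delete` is compiled in module A
  });
}
```

By contrast, as far as I understand, `std::unique_ptr<T>` with the default deleter calls `delete` from whichever code destroys it. On its own, it therefore does not route deallocation back to the allocating module unless a custom deleter type is supplied.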
Congratulations!
On Thu, May 11, 2023 at 4:28 AM vin jake wrote:
> Congratulations Marco!
>
> On Thu, May 11, 2023 at 7:18 AM Andrew Lamb wrote:
>
> > On behalf of the Arrow PMC, I'm happy to announce that Marco Neumann
> > has accepted an invitation to become a committer on Apache
> > Arrow.
We allow arrays to have a shorter length than their buffers. Is it also
legal for a struct array to have a shorter length than its child arrays?
For example, in C++, I can create this today by slicing a struct array:
```
std::shared_ptr<arrow::StructArray> my_array =
    std::dynamic_pointer_cast<arrow::StructArray>(array);
```
Congratulations!
On Wed, May 3, 2023 at 10:47 AM Raúl Cumplido
wrote:
> Congratulations Matt!
>
> On Wed, May 3, 2023, 19:44, vin jake wrote:
>
> > Congratulations, Matt!
> >
> > On Thu, May 4, 2023 at 01:42, Felipe Oliveira Carvalho wrote:
> >
> > > Congratulations, Matt!
> > >
> > > On Wed, 3 May 2023
No, struct array is not naturally castable to map. It's not something that
can be done zero-copy and I don't think anyone has encountered this need
before. Let me make sure I understand.
The goal is to go from a type of STRUCT,
where every key in the struct has the same type, to a MAP, where
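This standalone sketch shows why the conversion is not zero-copy. It assumes a struct whose fields are all int64 and ignores nulls. Every row becomes one map entry per field, so the conversion must materialize key strings per row, transpose values from field-major to entry order, and build a new offsets buffer. `SameTypedStruct`, `MapColumn`, and `StructToMap` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Columnar struct whose fields all share one value type (int64 here):
// field_names[f] names field f, and fields[f][row] is that field's value.
struct SameTypedStruct {
  std::vector<std::string> field_names;
  std::vector<std::vector<int64_t>> fields;
  int64_t length = 0;
};

// Map layout: entries for row r live in [offsets[r], offsets[r + 1]) of the
// keys/items children.
struct MapColumn {
  std::vector<int32_t> offsets{0};
  std::vector<std::string> keys;
  std::vector<int64_t> items;
};

MapColumn StructToMap(const SameTypedStruct& s) {
  assert(s.field_names.size() == s.fields.size());
  MapColumn m;
  for (int64_t row = 0; row < s.length; ++row) {
    for (std::size_t f = 0; f < s.fields.size(); ++f) {
      m.keys.push_back(s.field_names[f]);  // key copied once per row
      m.items.push_back(s.fields[f][row]);  // value moved to entry order
    }
    m.offsets.push_back(static_cast<int32_t>(m.keys.size()));
  }
  return m;
}
```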
Thank you both for the extra information. Acero couldn't actually merge
the streams today; I was thinking more of DataFusion and Velox, which would
often want to keep the streams separate, especially if there was some kind
of filtering or transformation that could be applied before applying a
So this would be a case where multiple "endpoints" are acting as a single
"stream of batches"? Or am I misunderstanding?
What're some scenarios where that would be done? When would it be
preferred for the client to merge the endpoints instead of the client's
user?
On Thu, Apr 27, 2023, 3:22 PM
ssing much more efficient. Is this understanding correct?
> >
> > [1]
> >
> https://arrow.apache.org/docs/format/Columnar.html#variable-size-list-layout
> > [2]
> >
> https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding
For context, there was some discussion on this back in [1]. At that time
this was called "sequence view", but I do not like that name. However,
"array-view array" is a little confusing. Given this is similar to list, can
we go with "list-view array"?
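For comparison with list, here is a minimal sketch of the list-view idea, assuming int32 offsets and sizes. Each element stores its own offset and size into the child array, so elements can appear out of order or share child values. A plain offsets buffer cannot express either of these. `ListViewInt32` is a hypothetical type, not the Arrow C++ class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// List-view: element i is values[offsets[i], offsets[i] + sizes[i]).
// A regular list instead derives element i from offsets[i] and offsets[i + 1].
struct ListViewInt32 {
  std::vector<int32_t> offsets;
  std::vector<int32_t> sizes;
  std::vector<int32_t> values;  // child array

  std::vector<int32_t> Get(std::size_t i) const {
    assert(i < offsets.size() && offsets.size() == sizes.size());
    assert(offsets[i] >= 0 && sizes[i] >= 0 &&
           offsets[i] + sizes[i] <= static_cast<int32_t>(values.size()));
    auto begin = values.begin() + offsets[i];
    return std::vector<int32_t>(begin, begin + sizes[i]);
  }
};
```

In the example below, element 0 starts after elements 1 and 2 in the child array, and elements 1 and 2 overlap.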
> Thanks for the introduction. I'd be interested
d) or
> >>> if additional support is going to be "as-needed". Note that I have a
> >>> minimal understanding of how "large" substrait is and what proportion
> of
> >> it
> >>> is already supported by
> >>> Acero.
Sorry, I meant:
I am *now* a solid +1
On Mon, Apr 10, 2023 at 1:26 PM Weston Pace wrote:
> I am not a solid +1 and I can see the usefulness. Matt and I spoke on
> this externally and I think Matt has written a great summary. There were a
> few more points that came up in the d
> opposed to just referring to the dlpack enum and treating this as
> an opaque integer if that would be preferable. I definitely agree with the
> difficulties in vendoring/repeating the dlpack enum values here and
> ensuring it stays up to date. Does anyone else have strong feelings one way
>