I've been thinking about how to encode statistics on Arrow arrays and
how to keep the set of statistics known by both producers and
consumers (i.e. standardized).
The statistics array(s) could be a
map<
// the column index or null if the statistics refer to whole table or batch
column:
+1 (non-binding)
On Wed, 29 May 2024 at 11:30 Micah Kornfield wrote:
> +1 (non-binding for Parquet, Binding for Arrow if that makes a difference)
>
>
>
> On Wed, May 29, 2024 at 7:15 AM Rok Mihevc wrote:
>
> > # sending this to both dev@arrow and dev@parquet
> >
> > Hi all,
> >
> > Following
I want to +1 what Dewey is saying here and add some comments.
Sutou Kouhei wrote:
> ADBC may be a bit larger to use only for transmitting statistics. ADBC has
> statistics related APIs but it has more other APIs.
It's impossible to keep the responsibility of communication protocols
cleanly
Great news. Congratulations Dane!
On Tue, May 7, 2024 at 7:57 PM Vibhatha Abeykoon wrote:
>
> Congratulations Dane!!!
>
> Vibhatha Abeykoon
>
>
> On Wed, May 8, 2024 at 4:02 AM Jacob Wujciak wrote:
>
> > Congrats!
> >
> > On Tue, May 7, 2024 at 23:19, Bryce Mecum wrote:
> >
> > >
Isn't that easily decodable from the UUID data itself?
If you allow the version to be specified as metadata, you now have to
validate and make sure it's consistent with the version encoded in the
contents of the UUID column. And UUID versions are more of a concern
for UUID generation than
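
A minimal sketch of the point above, assuming the RFC 4122 byte layout: the version sits in the high nibble of byte 6, so a consumer can read it from the 16 stored bytes without any separate metadata. The helper name `uuid_version` is hypothetical and only for illustration.

```python
import uuid


def uuid_version(raw: bytes) -> int:
    """Return the version nibble stored inside a 16-byte UUID value."""
    if len(raw) != 16:
        raise ValueError("a UUID value is exactly 16 bytes")
    # RFC 4122 layout: the version is the high 4 bits of byte 6.
    return raw[6] >> 4


if __name__ == "__main__":
    # A random (version 4) UUID, as it would be stored in a 16-byte column.
    print(uuid_version(uuid.uuid4().bytes))
```

If the version were also carried as type metadata, every writer would have to keep the two in agreement, which is the validation burden the message describes.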
The OP used UUID as an example. Would that be enough, or is the request for
a flexible mechanism that allows the creation of one-off nominal types for
very specific use-cases?
—
Felipe
On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou wrote:
>
> Yes, JSON and UUID are obvious candidates for new
Algebraic Data Types (Sums and Products) are very abstract. This means
they don't fully specify a concrete/physical layout [1]: different
physical layouts can match the same algebraic definition. As an
in-memory data format specification, Arrow doesn't and shouldn't
rigidly specify concretization
Two comments:
——
Since this library is analogous to things like ADBC, ODBC, and JDBC, it’s
more of a “driver” than a “connector”. This might make your life easier
when explaining what it does.
It’s not a black and white thing, but “connector” might imply networking to
some people.
I believe
> I have found Twitter an extremely effective way for an open-source
project to communicate with the “exo-community” — people who are interested
in the project but not so invested that they join the email list. An open
source project needs to perform pretty much all of the functions of a
> wrote:
> > > >
> > > > > Congratulations, Felipe!
> > > > > ________
> > > > > From: Daniël Heres
> > > > > Sent: Thursday, December 7, 2023 2:59 PM
> > > > > To: dev@arrow.apache.org
> >
Congratulations! Well deserved.
On Mon, Nov 13, 2023 at 5:16 PM Neal Richardson
wrote:
> Congratulations!
>
> On Mon, Nov 13, 2023 at 3:10 PM Matt Topol wrote:
>
> > Congratulations Raul!!
> >
> > On Mon, Nov 13, 2023, 3:09 PM Antoine Pitrou wrote:
> >
> > >
> > > Welcome Raul, we're glad to
Congratulations Xuwei!
—
Felipe
On Mon, 23 Oct 2023 at 10:26 Vibhatha Abeykoon wrote:
> Congratulations Xuwei!
>
> On Mon, Oct 23, 2023 at 6:38 PM Weston Pace wrote:
>
> > Congratulations Xuwei!
> >
> > On Mon, Oct 23, 2023 at 3:38 AM wish maple
> wrote:
> >
> > > Thanks kou and every nice
+1
On Wed, Oct 18, 2023 at 2:49 PM Dewey Dunnington
wrote:
> +1!
>
> On Wed, Oct 18, 2023 at 2:14 PM Matt Topol wrote:
> >
> > +1
> >
> > On Wed, Oct 18, 2023 at 1:05 PM Antoine Pitrou
> wrote:
> >
> > > +1
> > >
> > > On 18/10/2023 at 19:02, Benjamin Kietzman wrote:
> > > > Hello all,
> >
It’s not the best since the format is really focused on in-memory
representation and direct computation, but you can do it:
https://arrow.apache.org/docs/python/feather.html
—
Felipe
On Tue, 17 Oct 2023 at 23:26 Nara wrote:
> Hi,
>
> Is it a good idea to use Apache Arrow as a file format?
The Zulip is
https://ursalabs.zulipchat.com/
On Tue, Oct 17, 2023 at 9:55 PM Will Jones wrote:
> Hi Curt,
>
> I think the most visible place for now would be creating an issue for
> discussion.
>
> In the future, if you and some others want to have a place to discuss C#
> development, you
> >
> > > But I also reiterate my plea that these existing parsers get fixed so
> as
> > > to entirely validate the format string instead of stopping early.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> >
Hello,
I'm writing to propose "+vl" and "+vL" as format strings for list-view and
large list-view arrays passing through the Arrow C data interface [1].
The previous proposal was considered a bad idea because existing parsers of
these format strings might be looking at only the first `l` (or
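
To make the concern concrete, here is a rough sketch of a parser that only inspects the first character after the `+`. Both functions are hypothetical and are not taken from any Arrow implementation; the only facts assumed are that "+l" and "+L" already denote list and large list in the C data interface, and that "+lv"/"+vl" are the two spellings discussed in this thread.

```python
def lenient_kind(fmt: str) -> str:
    """Hypothetical parser that stops after the first nested-type character."""
    if fmt.startswith("+l"):
        return "list"
    if fmt.startswith("+L"):
        return "large_list"
    raise ValueError(f"unsupported format string: {fmt!r}")


def strict_kind(fmt: str) -> str:
    """Hypothetical parser that validates the whole format string."""
    kinds = {
        "+l": "list",
        "+L": "large_list",
        "+vl": "list_view",
        "+vL": "large_list_view",
    }
    if fmt not in kinds:
        raise ValueError(f"unsupported format string: {fmt!r}")
    return kinds[fmt]


if __name__ == "__main__":
    # The earlier "+lv" spelling is silently mistaken for a plain list.
    print(lenient_kind("+lv"))
    # The "+vl" spelling is recognized by a parser that knows about it.
    print(strict_kind("+vl"))
```

With the "+vl" spelling, the lenient parser raises an error instead of importing a list-view array as a list, so an outdated consumer fails loudly rather than misreading the buffers.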
1 for +vl and
> +vL.
>
> On Thu, Oct 5, 2023 at 6:40 PM Felipe Oliveira Carvalho
> wrote:
> >
> > > Union format strings share enough properties that having them in the
> > > same switch case doesn't result in additional complexity...lists and
> > > list
character version (i.e.,
> >> maybe +v and +V)? A single-character version is (slightly) easier to
> >> parse in C.
> >>
> >> On Thu, Oct 5, 2023 at 2:00 PM Felipe Oliveira Carvalho
> >> wrote:
> >>>
> >>> Hello,
> >>
parse the format string are already rather
> unwieldy...it would be a nice quality-of-life improvement (although by
> no means a required one) to use a separate character.
>
> On Thu, Oct 5, 2023 at 3:34 PM Felipe Oliveira Carvalho
> wrote:
> >
> > This mailing
where this discussion may have occurred...is there a reason
> that +lv and +Lv were chosen over a single-character version (i.e.,
> maybe +v and +V)? A single-character version is (slightly) easier to
> parse in C.
>
> On Thu, Oct 5, 2023 at 2:00 PM Felipe Oliveira Carvalho
> wrote:
>
Hello,
I'm writing to propose "+lv" and "+Lv" as format strings for list-view and
large list-view arrays passing through the Arrow C data interface [1].
The vote will be open for at least 72 hours.
[ ] +1 - I'm in favor of this new C Data Format string
[ ] +0
[ ] -1 - I'm against adding this
> > There'll probably be some minor comments to the format PR, but those
> > >> > don't deter from accepting these new layouts into the standard.
> > >> >
> > >> > Regards
> > >> >
> > >> > Antoine.
> > >> >
sues as [1]?
>
> Kind Regards,
>
> Raphael Taylor-Davies
>
> [1]: https://lists.apache.org/thread/l8t1vj5x1wdf75mdw3wfjvnxrfy5xomy
>
> On 29/09/2023 13:09, Felipe Oliveira Carvalho wrote:
> > Hello,
> >
> > I'd like to propose adding ListView and LargeListVie
Hello,
I'd like to propose adding ListView and LargeListView arrays to the Arrow
format.
Previous discussion in [1][2], columnar format description and flatbuffers
changes in [3].
There are implementations available in both C++ [4] and Go [5]. I'm working
on the integration tests which I will
My take here is that Ben did an excellent job in hiding the fact that C++
has two variations of the format without leaking the pointer version via
the interfaces through which Arrow arrays are communicated to other
implementations.
As things stand right now, there is no zero-copy transfer of
> (a) stays pretty stable throughout the scan (stays < 1G), (b) keeps
increasing during the scan (looks linear to the number of files scanned).
I wouldn't take this to mean there is a memory leak, but rather that the
memory allocator is not paging out virtual memory that has been allocated
throughout the scan.
Could you
I marked the C++ implementation PR ready for review today and will soon be
working on the Go implementation.
https://github.com/apache/arrow/pull/35345
Note that, unlike Velox's ArrayVector, the Arrow implementation
(ListView) also features a 64-bit version (LargeListView) to be
+1 (non-binding)
—
Felipe
On Fri, 18 Aug 2023 at 18:48 Jacob Wujciak-Jens
wrote:
> +1 (non-binding)
>
> On Fri, Aug 18, 2023 at 6:04 PM L. C. Hsieh wrote:
>
> > +1 (binding)
> >
> > On Fri, Aug 18, 2023 at 5:53 AM Neal Richardson
> > wrote:
> > >
> > > +1
> > >
> > > Thanks all for the
Hello,
I'm writing to inform you that I'm proposing "+r" as the format string for
run-end encoded arrays passing through the Arrow C data interface [1].
Feel free to also discuss in the linked PR with the changes to bridge.cc
and reference docs.
[1]
ave
> multiple physical layouts. I agree. E.g. variable size list<32>,
variable
> size list<64>, and REE are the physical layouts that, combined with the
> logical type "string", give you "string", "large string", and
"ree"
>
> [1
A major difficulty in making the Arrow array types open for extension [1]
is that as soon as we define (a) a universal representation* or (b) an
abstract interface, we close the door to vectorization. (a) prevents
having new vectorization-friendly formats and (b) limits the implementation
of new
int8(), int16()… all return the same shared_ptr that gets
inc-ref’d on every "creation".
But any code taking type pointers shouldn't assume it comes from `static`
storage. All uses of a non-owning TypeHolder should be based on something
else ensuring the shared_ptr is alive while the TypeHolder
Values in the `offsets` Buffer of a ListArray can’t be left undefined
because the length of a valid entry before a NULL entry is the offset
associated with that NULL entry minus the previous offset.
The ListViewArray format I’m working on doesn’t have that restriction
because all the information
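
A small sketch of the difference, using plain Python lists in place of Arrow buffers. The helper names are hypothetical. The list-view half assumes each entry is described by its own offset and size, which is how I read the truncated sentence above; what the final format requires of the offset and size at a null slot is not covered by this excerpt.

```python
def list_entry(values, offsets, i):
    """ListArray-style lookup: entry i ends where entry i + 1 begins."""
    return values[offsets[i]:offsets[i + 1]]


def list_view_entry(values, offsets, sizes, validity, i):
    """ListView-style lookup: entry i carries its own offset and size."""
    if not validity[i]:
        return None
    start = offsets[i]
    return values[start:start + sizes[i]]


if __name__ == "__main__":
    values = list("abcde")

    # Logical data: [["a", "b"], null, ["c", "d", "e"]]
    good_offsets = [0, 2, 2, 5]
    print(list_entry(values, good_offsets, 0))

    # If the offset tied to the null entry were left arbitrary, the valid
    # entry before it would change length.
    bad_offsets = [0, 4, 2, 5]
    print(list_entry(values, bad_offsets, 0))

    # In the list-view sketch the null slot's offset and size are never read.
    view_offsets = [0, 3, 2]
    view_sizes = [2, 1, 3]
    validity = [True, False, True]
    print([list_view_entry(values, view_offsets, view_sizes, validity, i)
           for i in range(3)])
```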
inherently wrong with it, and if it ain't broke we
> really shouldn't be trying to fix it.
>
> Kind Regards,
>
> Raphael Taylor-Davies
>
> On 14 June 2023 17:52:52 BST, Felipe Oliveira Carvalho
> wrote:
>
> General approach to alternative formats aside,
ort
> ListView aspires to, such an addition could require non trivial changes to
> many / all of those implementations (and the APIs they expose).
>
> Andrew
>
> On Wed, Jun 14, 2023 at 12:53 PM Felipe Oliveira Carvalho <
> felipe...@gmail.com> wrote:
>
> > General a
>
> On Wed, Jun 14, 2023 at 2:07 AM Antoine Pitrou wrote:
>
> >
> > I agree that ListView cannot be an extension type, given that it
> > features a new layout, and therefore cannot reasonably be backed by an
> > existing storage type (AFAICT).
> >
> > Also, I'm very lu
> > worried that it might undermine the perception that the Arrow
> > > format
> > > > is
> > > > > > > stable. I think it might be worth thinking about "soft
> deprecating"
> > > > the
> > > > > > old
> > > >
+1 for me.
The C structs are clean and leave good room for extension.
--
Felipe
On Thu, May 25, 2023 at 12:04 PM David Li wrote:
> +1 for me.
>
> (Heads up: on the PR, there was some discussion since the last email and
> the meaning of 'experimental' was clarified.)
>
> On Tue, May 23, 2023,
Have you considered using fixed-length binary values for these?
Crypto algorithms might logically be defined in terms of mathematical
operations on integers, but their efficient implementation tends to feature
inlined operations at the machine word level instead of generic add, div,
mod, mul
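
As a rough illustration of the suggestion, assuming "these" are large fixed-size integers such as 256-bit values (the excerpt does not say): the value is stored as a 32-byte big-endian fixed-length binary and then read back as four 64-bit machine words. The helper names and the 32-byte width are assumptions made for this sketch.

```python
import struct


def to_fixed_binary(n: int, width: int = 32) -> bytes:
    """Encode a non-negative integer as a fixed-width big-endian byte string."""
    return n.to_bytes(width, "big")


def words64(raw: bytes) -> tuple:
    """Split a fixed-width value into 64-bit words, most significant first."""
    if len(raw) % 8 != 0:
        raise ValueError("width must be a multiple of 8 bytes")
    return struct.unpack(f">{len(raw) // 8}Q", raw)


if __name__ == "__main__":
    raw = to_fixed_binary(2**64 + 5)
    print(len(raw), words64(raw))
    # The encoding round-trips back to the original integer.
    print(int.from_bytes(raw, "big") == 2**64 + 5)
```

Every value has the same width, so a column of them is one contiguous buffer that word-level code can walk directly.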
ple,
> operations
> >> that slice these containers can be implemented in a zero-copy manner by
> >> just rearranging the lengths/offsets indices, without ever touching the
> >> larger internal buffers. This is a similar motivation as for StringView
> >> (think
including compute kernels? Or are they likely to
> just
> > convert this type to ListArray at import boundaries?
> >
> > Because if it turns out to be the latter, then we might as well ask Velox
> > to export this type as ListArray and save the rest of the ecosystem some
> >
> I am actually trying to switch to arrow_static.lib.
Perhaps the issue is arrow_static.lib being linked with a static crt that's
not the one you are using in your project?
On Fri, May 12, 2023 at 3:13 PM Arkadiy Vertleyb (BLOOMBERG/ 120 PARK) <
avertl...@bloomberg.net> wrote:
> This is not
> > >>
> > >> On Tue, Apr 25, 2023 at 3:13 PM Will Jones
> > wrote:
> > >>
> > >>> Hi Felipe,
> > >>>
> > >>> Thanks for the introduction. I'd be interested to hear about the
> > >>> applications Velox h
Congratulations, Matt!
On Wed, 3 May 2023 at 14:37 Andrew Lamb wrote:
> The Project Management Committee (PMC) for Apache Arrow has invited
> Matt Topol (zeroshade) to become a PMC member and we are pleased to
> announce
> that Matt has accepted.
>
> Congratulations and welcome!
>
After Weston's suggestion above, I've renamed files and classes in my WIP
implementation:
ArrayView -> ListView
On Wed, Apr 26, 2023 at 11:08 AM Ian Cook wrote:
> +1 to what Weston and Joris suggested regarding the name. "ListView"
> seems like the best name to use for this layout in Arrow.
>
Hi folks,
I would like to start a public discussion on the inclusion of a new array
format to Arrow — array-view array. The name is also up for debate.
This format is inspired by Velox's ArrayVector format [1]. Logically, this
array represents an array of arrays. Each element is an array-view
+1 for "pull request title *and* description".
Being able to read descriptions without leaving the editor is handy.
Keeping that information tracked in the repo means we don’t depend on
GitHub to reconstruct the history of the project.
On Tue, 31 Jan 2023 at 06:43 Antoine Pitrou wrote:
>
> +1