For what it's worth, DuckDB accesses Arrow data via IPC in an extension, then
exports to the C data interface to call into code in its core.
Also, assumptions about when query optimization occurs relative to data access
potentially break down in scenarios involving: views, distributed tables,
with more information and thoughts in the meantime.
[1]: https://arxiv.org/pdf/2304.05028.pdf
Sent from Proton Mail for iOS
On Sat, Mar 23, 2024 at 05:23, Andrei Lazăr lazarandrei...@gmail.com
wrote: Hi Aldrin, thanks for taking the time to reply to my email!
In my understanding, compression on Par
Hello!
I don't do much with compression, so I could be wrong, but I assume a
compression algorithm spans the whole column and areas of large variance
generally benefit less from the compression, but the encoding still provides
benefits across separate areas (e.g. separate row groups).
My
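A minimal sketch of the intuition above, assuming zlib as a stand-in codec and
plain byte strings as stand-ins for row groups (neither is how a Parquet writer
is actually implemented). Each chunk is compressed on its own, so a low-variance
chunk still compresses well even when another chunk barely compresses at all:

```python
import random
import zlib


def compress_chunks(chunks):
    """Compress each chunk independently (hypothetical per-row-group codec)."""
    return [zlib.compress(chunk) for chunk in chunks]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical "row groups": one with no variance, one with high variance.
low_variance = bytes([7]) * 4096
high_variance = bytes(rng.getrandbits(8) for _ in range(4096))

sizes = [len(c) for c in compress_chunks([low_variance, high_variance])]
ratios = [size / 4096 for size in sizes]  # compressed size / original size
```

Comparing `ratios[0]` with `ratios[1]` shows how much less the high-variance
chunk benefits.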
# --
# Aldrin
https://github.com/drin/
https://gitlab.com/octalene
https://keybase.io/octalene
On Thursday, March 14th, 2024 at 09:10, Jayjeet Chakraborty
wrote:
> Hi Ben, I am willing to help out with the refactor too !
>
> On Wed, Mar 13, 2024 at 9:25 PM Aldrin
I am interested in helping to refactor!
-Aldrin
On Wed, Mar 13, 2024 at 08:54, Benjamin Kietzman bengil...@gmail.com
wrote: Skyhook [1] enables efficient predicate and projection pushdown from
Arrow Dataset to a Ceph storage cluster. This is very cool
functionality, but it's tightly coupled
Hello!
For an Array of mixed types, you can use a DenseUnion [1] or SparseUnion type
[2].
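As a rough illustration of the dense union idea (plain Python lists, not the
Arrow API), each slot carries a type id that selects a child array and an
offset that selects a row in that child. The helper name and the assumption
that a type id equals the child index are both hypothetical simplifications:

```python
def dense_union_get(type_ids, offsets, children, i):
    """Look up slot i: type_ids picks the child, offsets picks the row in it."""
    # Assumption: type id == child index (Arrow allows a separate mapping).
    return children[type_ids[i]][offsets[i]]


# Logical array: [1, "a", 2, "b"] with child 0 = ints, child 1 = strings.
children = [[1, 2], ["a", "b"]]
type_ids = [0, 1, 0, 1]
offsets = [0, 0, 1, 1]

values = [
    dense_union_get(type_ids, offsets, children, i)
    for i in range(len(type_ids))
]
```

A sparse union drops the offsets and instead makes every child as long as the
union itself, so slot `i` is read from `children[type_ids[i]][i]`.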
For modeling as rows instead of columns, the short answer is "no" but you could
store the pivot/rotation of the table (columns represent rows) or you can use
something like a StructArray [3]. The data in
Maybe it would be valuable to more explicitly define "moving back into
DataFusion project".
I assumed it meant absorbing into the datafusion repo, but it occurs to me that
may not be the case. Then, how would sqlparser-rs be "moved"?
# ---
feedback.
I glanced at the document before but I'll go through again to see if there is
anything I can comment on.
# --
# Aldrin
https://github.com/drin/
https://gitlab.com/octalene
https://keybase.io/octalene
On Tuesday, February 27th, 2024 at 17:43, Paul
/datafusion/latest/datafusion/execution/context/struct.SessionContext.html#method.read_csv
[4]:
https://arrow.apache.org/datafusion/library-user-guide/custom-table-providers.html
# --
# Aldrin
https://github.com/drin/
https://gitlab.com/octalene
https://keybase.io
implementations since ChunkedArray is not part of
the specification, though I am optimistic that if you pass ChunkedArray to a
different implementation then the C++ implementation could consolidate it as a
single Array.
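To make "consolidate it as a single Array" concrete, here is a minimal sketch
using the standard-library `array` module as a stand-in for Arrow arrays. The
`consolidate` helper is hypothetical and only shows the copy-into-one-buffer
idea, not what any Arrow implementation does internally:

```python
from array import array


def consolidate(chunks):
    """Copy a list of same-typed chunks into one contiguous array."""
    if not chunks:
        raise ValueError("need at least one chunk")
    typecode = chunks[0].typecode
    out = array(typecode)
    for chunk in chunks:
        if chunk.typecode != typecode:
            raise TypeError("all chunks must share one type")
        out.extend(chunk)  # copies the chunk's values onto the end
    return out


# A "chunked" column of 64-bit integers split across three chunks.
chunked = [array("q", [1, 2]), array("q", [3]), array("q", [4, 5, 6])]
single = consolidate(chunked)
```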
# --
# Aldrin
https://github.com/drin
#_CPPv4N5arrow16TableBatchReaderE
[8]: https://arrow.apache.org/docs/cpp/compute.html#selections
# --
# Aldrin
https://github.com/drin/
https://gitlab.com/octalene
https://keybase.io/octalene
On Wednesday, November 22nd, 2023 at 10:58, Jacek Pliszka
wrote:
> Hi!
>
> I t
try the unsubscribe link at [1].
[1]: https://arrow.apache.org/community/
Sent from Proton Mail for iOS
On Thu, Oct 19, 2023 at 23:41, Richard Haven wrote:
UNSUBSCRIBEBAJARSEANFOSGRIFIADОТПИШИHLOKOMELA
On Thu, Oct 19, 2023 at 9:56 AM Antoine Pitrou wrote:
>> Hello
And the first paper's reference of arrow (in the references section) lists 2022 as the date of last access.
Sent from Proton Mail for iOS
On Thu, Oct 19, 2023 at 18:51, Aldrin <octalene@pm.me.INVALID> wrote:
For context, that second referenced paper has Wes McKinney as a co-auth
For context, that second referenced paper has Wes McKinney as a co-author, so they were much better positioned to say "the right things."
Sent from Proton Mail for iOS
On Thu, Oct 19, 2023 at 18:38, Jin Shang wrote:
Honestly I don't understand why this VLDB paper [1]
convert any type to a raw pointer I assume that internal representations are not problematic. But, even so, perhaps those benchmarks can be reused to do the comparison (if that helps reduce the amount of work to be done for Ben).
-Aldrin
Sent from Proton Mail for iOS
On Wed, Sep 27, 2023 at 15:12
Oh wait, I see now that you're incrementing with a uint8_t*. That could be fine for your own use, but you might want to make sure it aligns with the type of your output (Int64Array vs Int32Array).
Sent from Proton Mail for iOS
On Mon, Jul 17, 2023 at 06:20, Aldrin <octalene@pm.me.INVA
Hi Wenbo,
An ArraySpan is like an ArrayData but does not own the data, so the ColumnarFormat doc that Jon shared is relevant for both.
In the case of a binary format, the output ArraySpan must have at least 2 buffers: the offsets and the contiguous binary data (values). If the output of your UDF
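The offsets-plus-values pairing described above can be sketched in plain Python
(not the Arrow C++ API; `encode_binary` and `get_value` are hypothetical helper
names, and the validity bitmap is left out for brevity). Slot `i` occupies the
bytes between `offsets[i]` and `offsets[i + 1]` in the contiguous data buffer:

```python
def encode_binary(values):
    """Pack byte strings into (offsets, data), one offset per slot plus one."""
    offsets = [0]
    data = bytearray()
    for value in values:
        data += value
        offsets.append(len(data))  # end of this slot, start of the next
    return offsets, bytes(data)


def get_value(offsets, data, i):
    """Slot i spans data[offsets[i]:offsets[i + 1]]."""
    return data[offsets[i]:offsets[i + 1]]


offsets, data = encode_binary([b"ab", b"", b"cde"])
```

Note that an empty value simply repeats the previous offset, so the offsets
buffer always has one more entry than the number of slots.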
without having to prove out the benefits for libraries that
>use a different tech stack (e.g. rust vs C++ vs go).
[1]:
https://docs.google.com/presentation/d/1EiBgwtoYW6ADTxFc9iRs8KLPV0st0GZqmGy40Uz8jPk/edit?usp=sharing
# ------
# Aldrin
https://github.com/dri
adjacency
lists or if you're using a more normalized relational format.
Thanks!
# ------
# Aldrin
https://github.com/drin/
https://gitlab.com/octalene
https://keybase.io/octalene
I don't feel like this representation is necessarily a detail of the query engine, but I am also not sure why this representation would have to be converted to a non-view format when serializing. Could you clarify that? My impression is that this representation could be used for persistence or
itself is working or if there's something in
your configuration that's wrong.
I can show more direct examples once I update my environment.
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Fri, Apr 7, 2023 at 7:34 AM Haocheng Liu wrote:
> Hi,
>
> I'm new to arrow development and
a draft PR? In general, I agree with the direction of the discussion otherwise.
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Fri, Mar 31, 2023 at 7:49 AM Will Jones wrote:
> > Also good to know: contributors apparently can't re-open PRs if it was
> > closed by
Congrats Will!!
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Mon, Mar 13, 2023 at 11:13 AM Dewey Dunnington
wrote:
> Congrats, Will!
>
> On Mon, Mar 13, 2023 at 3:07 PM Matt Topol wrote:
> >
> > Congrats Will!
> >
> > On Mon, Mar 13, 2023, 2
as valuable (should be prioritized) or if additional support is going to be
"as-needed". Note that I have a minimal understanding of how "large" substrait
is and what proportion of it is already supported by Acero.
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Thu, Mar
]:
https://arrow.apache.org/docs/python/generated/pyarrow.Field.html#pyarrow.Field.with_metadata
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Wed, Feb 15, 2023 at 2:52 PM Li Jin wrote:
> Oh thanks that could be a workaround! I thought pa tables are supposed to
> be imm
out, your main concern should probably be protocol compatibility. If you will
have control of the client side of communications, then I think there are
minimal concerns other than how you design what a Ticket or FlightInfo contains.
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Fri
especially while Arrow is still growing. In addition, if I want to contribute
to Arrow, I would also need to interact with the lower-level API at some point
and I wouldn't necessarily want to start with trying to contribute code before
using it in my own project(s).
Aldrin Montana
Computer Scienc
awesome, congrats!
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Tue, Sep 6, 2022 at 6:10 AM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:
> Congrats Weston! It is great to have you on the team!
>
> On Tue, 6 Sept 2022 at 06:10, Weston Pace wrote:
e "IPC" is necessary, but it does push the intent into the name
(unless it's
actually a misnomer).
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Tue, Aug 30, 2022 at 8:29 PM Micah Kornfield
wrote:
> I think one source of ambiguity for Arrow files, at least for me, is
/presentation/d/1Nollf087CRhMmEAWcwfudIizIhF-ttPRGgaqmuXtSBQ/edit#slide=id.g12c2952ca0d_0_67
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Wed, Aug 31, 2022 at 10:29 AM Jayjeet Chakraborty <
jayjeetchakrabort...@gmail.com> wrote:
> Thanks a lot for your reply, Niranda a
I don't have any pointers, but just wanted to mention that I am going to
try and figure this out quite a bit in the next week. I can try to create
some relevant cookbook recipes as I plod along.
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Wed, Aug 17, 2022 at 9:15 AM Li Jin
ooh, that seems like a good idea to me. I'd be happy to follow that style.
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Wed, Aug 10, 2022 at 4:21 PM Sasha Krassovsky
wrote:
> Hi everyone,
> I've recently had quite a few pain points while debugging due to the use of
>
oh, perfect. I'll just link the JIRAs. Thanks Kou!
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Mon, Jul 25, 2022 at 1:53 PM Sutou Kouhei wrote:
> Hi,
>
> https://issues.apache.org/jira/browse/ARROW-17092 may be
> related.
>
> Thanks,
> --
> kou
>
://arrow.apache.org/docs/format/Columnar.html#ipc-file-format
[3]: https://arrow.apache.org/docs/cpp/ipc.html
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Fri, Jul 22, 2022 at 2:46 PM Will Jones wrote:
> FYI It looks like there is active work to change the Python [1] and R
sorry, I meant "...especially *for* the rust community if they are just
using IPC directly for file formats."
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Fri, Jul 22, 2022 at 11:14 AM Aldrin wrote:
> I always assumed IPC was when it was in memory, fea
since V2.
I'm not sure if a feather V3 would ever diverge from IPC format or if
feather adds anything that's more filesystem friendly (versus other storage
system interfaces) or makes filesystem performance more predictable.
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Fri, Jul 22
table.html#_CPPv4N5arrow17ConcatenateTablesERKNSt6vectorINSt10shared_ptrI5Table24ConcatenateTablesOptionsP10MemoryPool
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Wed, Jun 29, 2022 at 9:53 AM L Ait wrote:
> Hi,
>
> I would like to be added to the mailing list and would like it if there is
> some dedicated forum to ask some questions.
>
> I would lik
done.
[1]: https://arrow.apache.org/docs/cpp/compute.html#invoking-functions
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Wed, Jun 22, 2022 at 12:34 PM Murali S wrote:
> Hi ,
>
> I was wondering if it is possible to add a C++ Function to the Compute
> Functi
instructions? I think a little bit more context about what you know
and what you're trying to do could also help others who know more about
this function (and vectorization in Arrow in general) to chime in.
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Thu, Jun 23, 2022 at 12:41
"C++" can be inserted ("A C++ compute...")
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Thu, May 19, 2022 at 6:07 PM Will Jones wrote:
> >
> > A relatively obscure name at least makes it easy to search for. I guess
> > we'll want to w
in that vein, I feel like you could also say that "ACE" has an "an" prefix
to deflect the connotation of primacy:
- An Arrow Compute Engine
- An Arrow C++ Compute Engine
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Mon, May 9, 2022 at 2:12 PM Ian Cook
[1]:
https://github.com/apache/arrow/blob/apache-arrow-7.0.0/cpp/src/parquet/properties.h#L556
[2]:
https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset7Scanner7ToTableEv
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Mon, Apr 25, 2022 at 3:05 AM 1057445597 <1057445...@q
lob/apache-arrow-7.0.0/cpp/src/arrow/ipc/writer.cc#L644
[3]:
https://github.com/apache/arrow/blob/apache-arrow-7.0.0/cpp/src/arrow/ipc/writer.cc#L665
[4]:
https://github.com/apache/arrow/blob/apache-arrow-7.0.0/cpp/src/arrow/ipc/writer.cc#L1253
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
Thanks for the response! I'll try that out. It didn't occur to me that
archlinux might be building the static libraries yet not installing them
(and/or removing them).
I'll check a few things and report back here what works.
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Fri, Feb
ON=ON \
-DARROW_SIMD_LEVEL=AVX2 \
-DARROW_USE_GLOG=ON \
-DARROW_WITH_BROTLI=ON \
-DPARQUET_REQUIRE_ENCRYPTION=ON
make -C build
Thank you for any help you can offer!
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
I think you just sign up:
https://issues.apache.org/jira/secure/Dashboard.jspa
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Wed, Dec 22, 2021 at 9:08 PM Dulvin Witharane wrote:
> Hi,
>
> I would love to have access to JIRA. Please enroll me or let me know the
>
t of time parsing metadata and
> much less time actually reading data.
Thanks!
> --
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
>
> How about trying GitHub issues and/or discussion in a
> specified period without deprecating user@? e.g. between
> 6.0.0 release and 7.0.0 release.
Oooh, I like this idea.
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Mon, Oct 4, 2021 at 7:11 PM Sutou Kouhei
degree, though, the ease of searching should mitigate this if people are
properly cross-referencing as appropriate. But, I'm not entirely sure what this
would be problematic for.
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Wed, Sep 29, 2021 at 11:16 AM Micah Kornfield
wrote
tion is prohibited. If you are not the
> intended recipient, please contact the sender by reply email and destroy
> all copies of the original message. Thank you.
>
--
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
Congrats David! Thanks for the contributions to documentation, it's pretty
awesome. :)
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Tue, Jun 22, 2021 at 10:55 AM Daniël Heres wrote:
> Congrats to you!
>
> On Tue, Jun 22, 2021, 19:42 Eduardo Ponce wrote:
>
>
art of the
interface for efficiency
- Arrow certainly has a data format, but that format is the crux of the
interface (IMO). However, it also makes using other formats easy (via
filesystem API and parquet reader/writers, etc.). So, focusing on the data
format seems unnecessary in such a terse d
I very much enjoy the new theme
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Tue, May 4, 2021 at 11:47 PM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:
> Thanks, I am happy that people like it!
> It's a slightly customized version of the pydata-
This is great, thanks!
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Fri, Mar 12, 2021 at 11:39 AM Andrew Lamb wrote:
> Here are links to the content, should anyone be interested:
>
> Query Engine Design and the Rust-Based DataFusion in Apache Arrow
> reco
Great, thanks for the responses! That all makes sense :)
On Thu, Mar 11, 2021 at 1:29 PM Benjamin Kietzman
wrote:
> Hi Aldrin,
>
> We don't have a unified repository for design docs that I'm aware of.
> Governance-wise only JIRA and the mailing lists are canonical, but
> IIU
to navigate to a google drive or a page
enumerating the various documents.
Thank you!
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Thu, Mar 11, 2021 at 10:07 AM Benjamin Kietzman
wrote:
> Hi,
>
> This is not yet implemented but it is on the roadmap for the near future:
t; OR
description ~ "expression")
Specifically, I'm interested in C++ rather than python (though, I suppose
pyarrow documentation can help with the C++ documentation?).
I wanted to ping here in case anyone has materials to gather, and also in
case anyone knows of materials I've missed.
Thanks!
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
in (or completed) consolidating or
expanding documentation on the compute and dataset/expression APIs and how
they interact, etc.?
Thanks!
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Mon, Nov 30, 2020 at 7:40 AM Wes McKinney wrote:
> One objective of the precompiled kernels proj
Aldrin created ARROW-2683:
-
Summary: Resource Warning (Unclosed File) when using
pyarrow.parquet.read_table()
Key: ARROW-2683
URL: https://issues.apache.org/jira/browse/ARROW-2683
Project: Apache Arrow