It's not clear if this will actually hit your use case or not.
Specifically, low overhead means different things to different people.
Also, my suggestion is not a database -- it is an analytics engine. The
persistence part of the problem is off the table at this time, too. I
wanted to mention it up
> I'm not sure anyone is actively working on RLE or other encoding schemes
> at the moment.
>
> -David
>
> On Mon, Nov 8, 2021, at 13:19, Nate Bauernfeind wrote:
> > I've written up the ColumnBag proposal addressing items 1 and 2 on the
> > list. I'm open
ng existing users of RecordBatch to rather different behavior.
> > >
> > > For #3, a different thread was discussing some of the points there - it
> > sounds like it may be possible to relax from map to
> > map.
> > >
> > > -David
> > >
>
Meeting notes for arrow-sync on 09/15/2021.
Attendees:
- Nate Bauernfeind
- Nic Crane
- Alenka Frim
- Rok Mihevc
- Niranda Perera
There was no discussion this week; all attendees were here to lurk, listen,
and be-a-fly-on-the-wall.
See you in two weeks.
--
On Wed, Sep 15, 2021 at 8:08 AM
HTTP (and HTTP/2) traffic is sent over TCP. You might need to be more
specific, or possibly do some more research on your end
Which Arrow Flight client are you using in your test? Java? C++? Which
version? Can you provide a simple gRPC server/client example that shows up
in Wireshark as you expec
need a
> > > > vote to update this since it is changing files in the format dir.
> > > >
> > > > I did check the Java implementation quickly and even in the initial
> > > > version, the schema is IPC-encapsulated[1].
> > > >
> > >
Wes suggested that there may be enough new ideas that it makes sense
to evolve past the existing structures rather than to bolt on new
functionality. I would like to learn what requirements exist should new
structures be adopted, and if applicable, would like to turn this into a
full POC prop
In flight.proto [1] it states that the encoded bytes are as described in
the flatbuffer schema.
```
/*
 * Wrap the result of a getSchema call
 */
message SchemaResult {
  // schema of the dataset as described in Schema.fbs::Schema.
  bytes schema = 1;
}
```
However, both this schema and the schem
n it is empty" is a
> > feature, then we may not want to allocate space for those nodes given
> that
> > the record batch length will likely be greater than zero.
>
>
> Having conflicting RecordBatch and top-level field nodes is something that
> I believe we have pushed ba
Is this still happening today?
On Tue, Jul 6, 2021 at 11:07 AM Ian Cook wrote:
> Hi all,
>
> Our biweekly sync call is tomorrow at
> https://meet.google.com/vtm-teks-phx. All are welcome to join. Notes
> will be shared with the mailing list afterward.
>
> Ian
>
--
s a separate file alongside the Arrow file (indexed by
> record
> > > > > batch index) where you can take advantage of whatever format is
> most
> > > > > suitable.
> > > > >
> > > > > -David
> > > > >
> > >
>
> > > makes it more difficult to bring schema evolution back into the
> > > IPC Stream format (i.e. it would live only in flight)
> >
> > Gosh's proposal extends the flatbuffer structures not the protobufs. Can
> > you help me understand how difficult it would be to bring the `schema_id`
> > appr
lexing" point
>>>> > while at the same time it gives enough flexibility to address both
>>>> Nate's
>>>> > and our use cases.
>>>> > 2. To David's point about other transports: in fact currently we are
>>>> using
> Basically, it reset/set the borrow bit in eflag register based on the if
> condition, and runs `outpos = outpos - (-1) - borrow_bit`.
That's clever, and I clearly didn't see that!
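The arithmetic in the quoted line can be checked in isolation. This is a minimal sketch that assumes `borrow_bit` is either 0 or 1; the function name `advance_outpos` is hypothetical, and it models only the arithmetic, not the instructions the compiler actually emitted:

```cpp
#include <cstdint>

// Hypothetical helper mirroring `outpos = outpos - (-1) - borrow_bit`.
// Subtracting -1 adds one; subtracting the borrow bit cancels that
// increment when the bit is set.
inline std::int64_t advance_outpos(std::int64_t outpos, bool borrow_bit) {
    return outpos - (-1) - static_cast<std::int64_t>(borrow_bit);
}
```

With the borrow bit clear the position advances by one; with it set the position is unchanged, so the conditional increment needs no branch.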
On Thu, Jun 24, 2021 at 8:57 PM Yibo Cai wrote:
>
>
> On 6/25/21 6:58 AM, Nate Bauernfei
FYI, the benchmark was slightly broken, but the results stand.
> benchmark::DoNotOptimize(output[rand()]);
Since rand() returns values from 0 to RAND_MAX, it indexes past the end of
the output array (of length 4k). It segfaults in GCC; I'm not sure why the
Clang benchmark is happy with that.
I modified it [1] to:
> ben
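One common way to keep such a read in bounds is to reduce the random value modulo the array length. The sketch below is an illustration only and is not necessarily the change made in [1]; the helper names `bounded_index` and `read_random_element` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: map the result of rand() into [0, size).
inline std::size_t bounded_index(std::size_t size) {
    assert(size > 0);  // the modulo below is undefined for an empty array
    return static_cast<std::size_t>(std::rand()) % size;
}

// Read one pseudo-randomly chosen element without leaving the array.
inline int read_random_element(const std::vector<int>& output) {
    return output[bounded_index(output.size())];
}
```

In a benchmark, the value returned by `read_random_element(output)` would be what gets passed to `benchmark::DoNotOptimize`.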
Option C.
On Thu, Jun 24, 2021 at 1:53 PM Joris Peeters
wrote:
> C
>
> On Thu, Jun 24, 2021 at 8:39 PM Antoine Pitrou wrote:
>
> >
> > Option C.
> >
> >
> > Le 24/06/2021 à 21:24, Weston Pace a écrit :
> > >
> > > This proposal states that Arrow should define how to encode an Instant
> > > into
Thanks for writing this up! I added a few general comments, but have a
question on the approach because it's not quite what I was expecting.
I am slightly concerned that the proposal looks more like support for
"multiplexing" IPC streams into a single RPC stream rather than support for
a changing
Congratulations! Well earned!
On Mon, Jun 21, 2021 at 4:20 PM Ian Cook wrote:
> Congratulations, David!
>
> Ian
>
>
> On Mon, Jun 21, 2021 at 6:19 PM Wes McKinney wrote:
> >
> > The Project Management Committee (PMC) for Apache Arrow has invited
> > David M Li to become a PMC member and we are
finding someone to do the work.
>
> Best,
> David
>
> On Thu, Jun 3, 2021, at 12:11, Nate Bauernfeind wrote:
> > In addition to Arrow Flight we have other gRPC APIs that work together
> as a
> > whole. For example, the API client establishes a session with the server
a
> useful addition. What sorts of things would it enable for you?
>
> -David
>
> On Wed, Jun 2, 2021, at 16:20, Nate Bauernfeind wrote:
> > It seems to me that the C++ Arrow Flight implementation uses only the
> > synchronous version of the gRPC API. gRPC supports asynchr
It seems to me that the C++ Arrow Flight implementation uses only the
synchronous version of the gRPC API. gRPC supports asynchronous message
delivery in C++ via a CompletionQueue that must be polled. Has there been
any desire to standardize on a solution for asynchronous use cases, perhaps
deliver
the
> > feature
> > > as
> > > > > a new field in the protobuf so that it can be used in contexts with
> > > other
> > > > > header metadata types? Do you have time to riff on the format that
> > will
> > > > > apply to the other c
Suggestion: faster -> more efficiently
"Apache Arrow is a cross-language development platform for in-memory
data. It enables systems to process and transport data more efficiently."
On Sun, May 16, 2021 at 11:35 AM Wes McKinney wrote:
> Here's what there now:
>
> "Apache Arrow is a cross-langua
ion to an apache project; please let me know if there is anything
else that I need to do to get this past the finish line.
https://github.com/apache/arrow/pull/10058
Thanks,
Nate
On Wed, Apr 14, 2021 at 11:45 PM Nate Bauernfeind <
natebauernfe...@deephaven.io> wrote:
> Hey Bob,
>
> So
ng.
> The methods of generation are all over the map, and some have no script or
> build file, just doc. Would there be any value in making this more uniform?
>
> On 2021/04/14 16:36:47, Nate Bauernfeind
> wrote:
> > It would also be nice to upgrade that java flatbuffer ver
It would also be nice to upgrade that java flatbuffer version from 1.9 to
1.12. Is anyone planning on doing this work (as listed in ARROW-12111)?
If I did this work today, might it be possible to get it included in the
4.0.0 release?
On Fri, Mar 26, 2021 at 3:25 PM bobtins wrote:
> OK, original
> possibly in coordination with the Deephaven/Barrage team, if they're also
> still interested
Good opportunity for me to chime in =). I think we still have interest in
this feature. On the other thread, it took a little cajoling, but I've come
around to agree with the conclusions of taking a Record
a lot of wiggle
room for alternatives.
On Fri, Mar 19, 2021 at 10:03 AM Nate Bauernfeind <
nate.bauernfe...@gmail.com> wrote:
> The dictionary is not allowed to change throughout the file; which is
> ultimately OP's request. This is because all of the dictionary definition
> is i
mpression that the file format is supposed to support
> deltas, but not replacements. Is this not implemented in C++?
>
> On Thu, Mar 18, 2021 at 9:57 PM Nate Bauernfeind <
> nate.bauernfe...@gmail.com>
> wrote:
>
> > If dictionary replacements were supported, then the
If dictionary replacements were supported, then the IPC file format
couldn't guarantee random access reads.
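To illustrate why replacements interfere with random access, here is a toy model. It is a sketch under stated assumptions, not Arrow's IPC layout: the `Message` type and `decode_at` function are hypothetical, and a stream is modelled as a flat list in which each message either replaces the dictionary or carries dictionary indices.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model only; these types are hypothetical, not Arrow's IPC structures.
struct Message {
    bool is_dictionary;                   // true: replaces the current dictionary
    std::vector<std::string> dictionary;  // used when is_dictionary is true
    std::vector<int> indices;             // used when is_dictionary is false
};

// Decode the record batch at `pos`. Because a replacement may appear anywhere
// before it, the reader must search backwards for the dictionary in effect,
// so the work depends on where the batch sits in the stream.
inline std::vector<std::string> decode_at(const std::vector<Message>& stream,
                                          std::size_t pos) {
    std::vector<std::string> out;
    std::size_t i = pos;
    while (i > 0 && !stream[i - 1].is_dictionary) {
        --i;
    }
    if (i == 0) {
        return out;  // no dictionary precedes this batch
    }
    const std::vector<std::string>& dict = stream[i - 1].dictionary;
    for (int idx : stream[pos].indices) {
        out.push_back(dict.at(static_cast<std::size_t>(idx)));
    }
    return out;
}
```

If the dictionary were fixed for the whole file, the backward search would disappear and any batch could be decoded directly from its own offset.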
Personally, I would like to support a stream-based file format that is a
series of the Flight protobufs. In my extension of arrow flight, by
stuffing our state-based data into the app_metada
> I also found out today that there is an official ASF slack with multiple
> Arrow channels, but this is only open to people who already have an
> apache.org email address (committers / PMC).
FYI, people who are not committers or PMC members can join the Slack using this link:
https://s.apache.org/slack-invite
On We
e first record
> batch, without having to modify anything about the record batch itself, and
> without having to define a new metadata header at the Arrow level -
> everything could be implemented on top of the existing definitions.
>
> David
>
> On Sat, Mar 6, 2021, at 01:07
Batch
flatbuffer for added rows.
- A set of FlightData record batches also using the normal RecordBatch
flatbuffer for modified rows.
On Fri, Mar 5, 2021 at 11:00 PM Nate Bauernfeind <
natebauernfe...@deephaven.io> wrote:
> > It seems that atomic application could also be something controlled i
g. switching dictionary encoding on/off).
>
> -Micah
>
>
> On Fri, Mar 5, 2021 at 11:42 AM David Li wrote:
>
> > (responses inline)
> >
> > On Thu, Mar 4, 2021, at 17:26, Nate Bauernfeind wrote:
> > > Regarding the BarrageRecordBatch:
> > >
>
t support - there's an existing ticket:
> https://issues.apache.org/jira/browse/ARROW-9860
>
> I was sure I had seen another organization talking about browser support
> recently, but now I can't find them. I'll update here if I do figure it out.
>
> Best,
> Davi
tly was
> planning to look at JavaScript support for Flight (using WebSockets as the
> transport, IIRC) and it might make sense to join forces if that's a path
> you were also going to pursue.
>
> Best,
> David
>
> On Wed, Mar 3, 2021, at 18:05, Nate Bauernfeind wrot
's existing metadata fields/API that would prevent you from using
> them, as that way you (and we!) don't have to fully duplicate one of
> Arrow's format definitions. Similarly, Flight already has a bidirectional
> streaming endpoint, DoExchange, that allows arbitrary payloads (with
forward to your feedback; thank you!
Nate Bauernfeind
Deephaven Data Labs - https://deephaven.io/
--