Re: [Java] How to read data from VectorSchemaRoot without casting each value?

Jacques Nadeau Fri, 10 Sep 2021 09:41:15 -0700

The most efficient way to work with vectors is to access the memory
directly, in large chunks at a time (especially for validity). However,
people should be cautious about premature optimization at the expense of
maintainability.
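
As a rough illustration of what "large chunks at a time" can mean for
validity, here is a minimal sketch that sums a nullable 64-bit column by
reading 64 validity bits per step instead of one. It is not Arrow code: the
class name and both method names are made up for this example, and a plain
long[] and byte[] stand in for the data and validity buffers (with Arrow you
would presumably start from getDataBuffer() and getValidityBuffer()). The
only Arrow fact it relies on is the bitmap layout: row i is valid when bit
(i % 8) of byte (i / 8) is set, least-significant bit first.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Arrow: plain arrays stand in for buffers.
class ChunkedValiditySum {

  /** Per-row baseline: test one validity bit for every value. */
  static long sumPerRow(long[] values, byte[] validity, int rowCount) {
    long sum = 0;
    for (int i = 0; i < rowCount; i++) {
      if ((validity[i >> 3] & (1 << (i & 7))) != 0) {
        sum += values[i];
      }
    }
    return sum;
  }

  /** Chunked variant: load 64 validity bits at once and branch on the word. */
  static long sumChunked(long[] values, byte[] validity, int rowCount) {
    // Little-endian keeps bit j of the word aligned with row (row + j).
    ByteBuffer bits = ByteBuffer.wrap(validity).order(ByteOrder.LITTLE_ENDIAN);
    long sum = 0;
    int row = 0;
    for (; row + 64 <= rowCount; row += 64) {
      long word = bits.getLong(row >> 3);
      if (word == -1L) {
        // All 64 rows are valid: no per-row checks needed.
        for (int j = 0; j < 64; j++) {
          sum += values[row + j];
        }
      } else {
        // Visit only the set bits; an all-null word falls straight through.
        while (word != 0) {
          sum += values[row + Long.numberOfTrailingZeros(word)];
          word &= word - 1; // clear the lowest set bit
        }
      }
    }
    // Tail of fewer than 64 rows: fall back to per-bit checks.
    for (; row < rowCount; row++) {
      if ((validity[row >> 3] & (1 << (row & 7))) != 0) {
        sum += values[row];
      }
    }
    return sum;
  }
}
```

Both methods return the same sum; the chunked one is simply harder to read,
which is the maintainability trade-off mentioned above.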


Your examples are all single-cell reads. In that case, I'd imagine any
option would be fine. If I were working over many records, I would do
something similar to the below. (It's probably slightly more performant
than pre-grabbing FieldReaders and then setting the position on each
iteration of the loop):

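// Cast once per column, outside the loop.
BigIntVector bigInt1 = (BigIntVector) vectorSchemaRoot.getVector(123);
BigIntVector bigInt2 = (BigIntVector) vectorSchemaRoot.getVector(124);
int recordCount = vectorSchemaRoot.getRowCount();

long sum1 = 0;
long sum2 = 0;

for (int i = 0; i < recordCount; i++) {
  // isNull(i) is true when the validity bit for row i is unset.
  if (!bigInt1.isNull(i)) {
    sum1 += bigInt1.get(i);
  }
  if (!bigInt2.isNull(i)) {
    sum2 += bigInt2.get(i);
  }
}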



On Thu, Sep 9, 2021 at 10:34 PM Daniel Hsu <[email protected]> wrote:

> Is using FieldReader recommended over using a cast with direct
> access? In
> https://arrow.apache.org/docs/java/vector.html#building-valuevector it
> says that
>
> "writer/reader is not as efficient as direct access"
>
> What's the recommended way to read a value between these two techniques?
>
> First method using reader:
>
> FieldReader reader = vectorSchemaRoot.getVector(123).getReader();
> reader.setPosition(456);
> reader.readLong();
>
> Second method using cast to BigIntVector with direct access:
>
> ((BigIntVector) vectorSchemaRoot.getVector(123)).getValueAsLong(456);
>
> On 2021/09/09 16:33:57, Micah Kornfield <[email protected]> wrote:
> > I'll add that getObject is going to be expensive in general, since it
> > boxes the integer and does a copy of the VarBinary data.
> >
> > On Thu, Sep 9, 2021 at 9:25 AM Jacques Nadeau <[email protected]> wrote:
> >
> > > FieldReader was defined to expose direct access in a type-centric way.
> > >
> > > On Thu, Sep 9, 2021, 1:47 AM Daniel Hsu <[email protected]> wrote:
> > >
> > >> Perhaps a better way to phrase this question is:
> > >>
> > >> If the VectorSchemaRoot already stores BigIntVectors and
> > >> VarBinaryVectors, how can I make the VectorSchemaRoot directly return
> > >> `long` and `byte[]` values when doing random reads, instead of
> > >> returning `Object`?
> > >>
> > >> On 2021/09/09 08:43:56, Daniel Hsu <[email protected]> wrote:
> > >> > I have a VectorSchemaRoot object containing many BigIntVectors and
> > >> > VarBinaryVectors, and I want to do many random value reads.
> > >> >
> > >> > Right now I am doing the random value reads like this:
> > >> >
> > >> > VectorSchemaRoot # getVector(<vector number>) # getObject(<row number>)
> > >> >
> > >> > This returns an `Object`, and then I look in VectorSchemaRoot # Schema #
> > >> > getField() to figure out whether to cast this object to a `long` or
> > >> > `byte[]`.
> > >> >
> > >> > Is it possible to avoid casting from `Object` to `long` or `byte[]` on
> > >> > every random read?
> > >> >
> > >>
> > >
> >
>
