Very sorry, I deleted an important part of the code. It should read:
// Number of bytes each element takes up (an int is 4, etc.)
int itemSize = new AsDtype(dtype).memory_itemsize();
int countBufferSize = (entryStop - entryStart + 1) * INT_SIZE;
ArrowBuf countsBuf = allocator.buffer(countBufferSize);
for (int x = 0; x < (entryStop - entryStart + 1); x++) {
    countsBuf.setInt(x * INT_SIZE, x * desc.getFixedLength());
}
// File format uses BE, so perform a byte swap to get to LE
ArrowBuf contentBuf = swapEndianness(contentTemp);
ArrowType outerType = new ArrowType.List();
// Convert from our internal dtype to the Arrow equivalent
ArrowType innerType = dtypeToArrow();
FieldType outerField = new FieldType(false, outerType, null);
FieldType innerField = new FieldType(false, innerType, null);
int outerLen = (entryStop - entryStart) * contentTemp.multiplicity();
int innerLen = contentTemp.numitems();
ArrowFieldNode outerNode = new ArrowFieldNode(outerLen, 0);
ArrowFieldNode innerNode = new ArrowFieldNode(innerLen, 0);
ListVector arrowVec = ListVector.empty("testcol", allocator);
arrowVec.loadFieldBuffers(outerNode, Arrays.asList(null, countsBuf));
AddOrGetResult<ValueVector> children =
    arrowVec.addOrGetVector(innerField);
FieldVector innerVec = (FieldVector) children.getVector();
innerVec.loadFieldBuffers(innerNode, Arrays.asList(null, contentBuf));
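
As a reference point, here is a minimal plain-Java sketch (no Arrow dependency) of the buffer layout the code above is aiming for. In the Arrow columnar format, a list array holding n lists carries n + 1 little-endian int32 offsets, and for fixed-width lists offset i is simply i times the list length. The class and method names below are hypothetical and are not part of Arrow or of the plugin; the byte swap is only an assumed stand-in for what swapEndianness does, working on a byte[] rather than an ArrowBuf.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only (not an Arrow API).
final class FixedWidthListLayout {
    private static final int INT_SIZE = Integer.BYTES;

    // Builds the offsets for numLists lists of listLength elements each:
    // numLists + 1 little-endian int32 values, where offsets[i] = i * listLength.
    static ByteBuffer fixedWidthOffsets(int numLists, int listLength) {
        ByteBuffer offsets = ByteBuffer.allocate((numLists + 1) * INT_SIZE)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i <= numLists; i++) {
            offsets.putInt(i * INT_SIZE, i * listLength);
        }
        return offsets;
    }

    // Reverses the bytes of every itemSize-wide element, turning
    // big-endian content into little-endian content (assumed behaviour).
    static byte[] swapEndianness(byte[] bigEndian, int itemSize) {
        if (itemSize <= 0 || bigEndian.length % itemSize != 0) {
            throw new IllegalArgumentException("length must be a multiple of itemSize");
        }
        byte[] out = new byte[bigEndian.length];
        for (int start = 0; start < bigEndian.length; start += itemSize) {
            for (int b = 0; b < itemSize; b++) {
                out[start + b] = bigEndian[start + itemSize - 1 - b];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three lists of two elements each need four offsets.
        ByteBuffer offsets = fixedWidthOffsets(3, 2);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i <= 3; i++) {
            sb.append(offsets.getInt(i * INT_SIZE)).append(' ');
        }
        System.out.println(sb.toString().trim()); // prints "0 2 4 6"
    }
}
```

With this layout the outer vector has numLists values, and the inner vector has as many values as the last offset, i.e. numLists * listLength.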
On Thu, Apr 27, 2023 at 2:20 PM Andrew Melo wrote:
>
> Hi all,
>
> I am working on a Spark datasource plugin that reads a (custom) file
> format and outputs arrow-backed columns. I'm having difficulty
> figuring out how to construct a ListVector if I have an ArrowBuf with
> the contents and know the width of each list. I've tried constructing
> the buffer with the code I pasted below, but it appears something
> becomes unaligned, and I get incorrect values back when reading the
> vector back.
>
> The documentation and other sources on the internet have examples where you
> construct the ListVector element-by-element (e.g. with
> UnionListWriter), but I'm having difficulty finding an example where
> you start from ArrowBufs and use that to directly construct the
> ListVector.
>
> Does anyone have an example they could point me to?
>
> Thanks!
> Andrew
>
> // Number of bytes each element takes up (an int is 4, etc.)
> int itemSize = new AsDtype(dtype).memory_itemsize();
> int countBufferSize = (entryStop - entryStart + 1) * INT_SIZE;
>
> ArrowBuf countsBuf = allocator.buffer(countBufferSize);
> // File format uses BE, so perform a byte swap to get to LE
> ArrowBuf contentBuf = swapEndianness(contentTemp);
>
> ArrowType outerType = new ArrowType.List();
> // Convert from our internal dtype to the Arrow equivalent
> ArrowType innerType = dtypeToArrow();
>
> FieldType outerField = new FieldType(false, outerType, null);
> FieldType innerField = new FieldType(false, innerType, null);
>
> int outerLen = (entryStop - entryStart) * contentTemp.multiplicity();
> int innerLen = contentTemp.numitems();
> ArrowFieldNode outerNode = new ArrowFieldNode(outerLen, 0);
> ArrowFieldNode innerNode = new ArrowFieldNode(innerLen, 0);
>
> ListVector arrowVec = ListVector.empty("testcol", allocator);
> arrowVec.loadFieldBuffers(outerNode, Arrays.asList(null, countsBuf));
>
> AddOrGetResult<ValueVector> children =
>     arrowVec.addOrGetVector(innerField);
>
> FieldVector innerVec = (FieldVector) children.getVector();
> innerVec.loadFieldBuffers(innerNode, Arrays.asList(null, contentBuf));