Thanks, Aldrin and Weston! Following your suggestions, I was able to encode
the schema in a form that Athena recognizes. In case it helps anyone,
here's some sample code:

import { Schema, Field, Utf8, Table, RecordBatchStreamWriter, Int32, Bool,
  DateMillisecond, DateDay } from 'apache-arrow';

// Describe the table's columns.
const s = new Schema([
  new Field('name', new Utf8()),
  new Field('address', new Utf8()),
  new Field('active', new Bool()),
  new Field('count', new Int32()),
  new Field('birthday', new DateDay()),
  new Field('created', new DateMillisecond())
]);

// Write an empty table (schema, no rows) to an IPC stream.
const w = new RecordBatchStreamWriter();
w.write(new Table(s));

// Read the written bytes synchronously and base64-encode them for Athena.
const encodedSchema = Buffer.from(w.toUint8Array(true)).toString('base64');
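
For completeness, here is a rough sketch of how the encoded schema might be placed into the response, mirroring the field names from the Python example quoted below. The helper name buildGetTableResponse, its parameter names, and all the values passed to it are hypothetical placeholders, and I haven't covered whether any leading bytes need trimming as the Python version does with .slice(4).

```javascript
// Hypothetical helper: wraps serialized schema bytes in a GetTableResponse.
// The response field names are copied from the Python example quoted below;
// the function name and parameter names are made up for illustration.
function buildGetTableResponse({
  catalogName, schemaName, tableName, schemaBytes, partitionColumns, requestType,
}) {
  return {
    '@type': 'GetTableResponse',
    catalogName,
    tableName: { schemaName, tableName },
    // Base64-encode the raw bytes, as in the sample code above.
    schema: { schema: Buffer.from(schemaBytes).toString('base64') },
    partitionColumns,
    requestType,
  };
}

// Placeholder bytes stand in for w.toUint8Array(true) from the code above;
// every other value here is a placeholder too.
const response = buildGetTableResponse({
  catalogName: 'my_catalog',
  schemaName: 'my_db',
  tableName: 'my_table',
  schemaBytes: new Uint8Array([1, 2, 3]),
  partitionColumns: [],
  requestType: 'GET_TABLE',
});
console.log(JSON.stringify(response, null, 2));
```

In a real connector, schemaBytes would be the output of the stream writer rather than the three placeholder bytes used here.
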
On Fri, May 6, 2022 at 3:53 PM Aldrin <[email protected]> wrote:
> I didn't think of this as a possible solution, for some reason, but I
> think it actually makes a lot of sense. Just as a reference, this is
> something I currently do when storing data in a key-value interface:
>
> - I write a buffer with no batches
> - Write batches in separate buffers
> - these are sized to fully utilize the space for each key-value
>
> It is possible to then read the key-value that only contains a schema.
>
> I believe my approach for doing this can be seen in [1], and I use the
> StreamWriter because I want it to use an in-memory format that is
> streamable.
>
> [1]:
> https://gitlab.com/skyhookdm/skytether-singlecell/-/blob/mainline/src/cpp/processing/dataformats.cpp#L16
>
> Aldrin Montana
> Computer Science PhD Student
> UC Santa Cruz
>
>
> On Fri, May 6, 2022 at 12:04 PM Weston Pace <[email protected]> wrote:
>
>> Can you serialize the schema by creating an IPC file with zero record
>> batches? I apologize, but I do not know the JS API as well. Maybe
>> you can create a table from just a schema (or a schema and a set of
>> empty arrays) and then turn that into an IPC file? This shouldn't add
>> too much overhead.
>>
>> On Thu, May 5, 2022 at 8:23 AM Howard Engelhart
>> <[email protected]> wrote:
>> >
>> > I'm looking to implement an Athena federated query custom connector
>> > using the arrow js lib. I'm getting stuck on figuring out how to encode a
>> > Schema properly for the Athena GetTableResponse. I have found an example
>> > using python that does something like this.. (paraphrasing...)
>> >
>> > import pyarrow as pa
>> > .....
>> > return {
>> >     "@type": "GetTableResponse",
>> >     "catalogName": self.catalogName,
>> >     "tableName": {'schemaName': self.databaseName, 'tableName': self.tableName},
>> >     "schema": {"schema": base64.b64encode(pa.schema(....args...).serialize().slice(4)).decode("utf-8")},
>> >     "partitionColumns": self.partitions,
>> >     "requestType": self.request_type
>> > }
>> >
>> > What I'm looking for is the js equivalent of
>> > pa.schema(....args...).serialize()
>> >
>> > Is there one? If not, could someone point me in the right direction of
>> > how to code up something similar?
>>
>