Does something like this work?
In [12]: import pyarrow.parquet as pq
In [13]: t = pq.read_table('../cpp/submodules/parquet-testing/data/alltypes_dictionary.parquet')
In [14]: t.schema
Out[14]:
id: int32
bool_col: bool
tinyint_col: int32
smallint_col: int32
int_col: int32
bigint_col: int64
float_col: float
double_col: double
date_string_col: binary
string_col: binary
timestamp_col: timestamp[ns]
In [15]: [{'name': t.schema[i].name, 'type': str(t.schema[i].type)}
for i in range(len(t.schema))]
Out[15]:
[{'name': 'id', 'type': 'int32'},
{'name': 'bool_col', 'type': 'bool'},
{'name': 'tinyint_col', 'type': 'int32'},
{'name': 'smallint_col', 'type': 'int32'},
{'name': 'int_col', 'type': 'int32'},
{'name': 'bigint_col', 'type': 'int64'},
{'name': 'float_col', 'type': 'float'},
{'name': 'double_col', 'type': 'double'},
{'name': 'date_string_col', 'type': 'binary'},
{'name': 'string_col', 'type': 'binary'},
{'name': 'timestamp_col', 'type': 'timestamp[ns]'}]
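
For the second question (getting the rows as plain Python lists), something along these lines might work. This is only a rough sketch: the helper names schema_to_dicts and table_to_rows are made up for illustration, and it assumes Table.to_pydict() is available in your pyarrow version and that column names are unique.

```python
def schema_to_dicts(schema):
    """Build [{'name': ..., 'type': ...}, ...] from a pyarrow Schema.

    Same idea as the list comprehension in the session above:
    index into the schema and stringify each field's type.
    """
    return [
        {"name": schema[i].name, "type": str(schema[i].type)}
        for i in range(len(schema))
    ]


def table_to_rows(table):
    """Return the table contents as a list of row lists.

    Assumption: table.to_pydict() returns {column_name: [python values]}
    with columns in schema order. zip(*...) transposes the columns into
    rows. Duplicate column names would collapse into a single dict key,
    so this is only meant for tables with unique column names.
    """
    columns = table.to_pydict()
    return [list(row) for row in zip(*columns.values())]
```

With the table from the session above, that would be schema_to_dicts(t.schema) and table_to_rows(t). Note that this materializes every value as a Python object, so it may be slow or memory-hungry for large tables.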
On Wed, Dec 19, 2018 at 2:16 AM Femi Anthony
<[email protected]> wrote:
>
> Hi, I'm using pyarrow to read parquet data from s3 and I'd like to be able to
> parse the schema and convert it to a format suitable for running an mLeap
> serialized model outside of Spark.
>
> This requires parsing the schema.
>
> If I had a Pyspark dataframe, I could do this:
>
> test_df = spark.read.parquet(test_data_path)
> schema = [{"name": field.simpleString().split(":")[0],
>            "type": field.simpleString().split(":")[1]}
>           for field in test_df.schema]
>
> How can I achieve the same if I read the data using pyarrow instead?
> Also, for the Spark dataframe I can obtain the rows in a suitable format for
> model evaluation by doing the following:
>
> rows = [[field for field in row] for row in test_df.collect()]
>
> How can I achieve something similar using pyarrow?
>
> Thanks in advance for your help.
>
> Femi Anthony
> --
> Card Machine Learning (ML) Team, Capital One