Hi Jorn,
I am using Apache Spark 2.3.1.
For creating the parquet file I have used Apache Parquet (parquet-mr) 1.10.
This does not match the Parquet version used by Apache Spark 2.3.1. If you
think this could be the problem, I could try Apache Parquet version 1.8.3
instead.
I created a parquet file using Apache Spark SQL types, but I cannot make the
resulting schema match the schema described in the paper.
I use the Spark SQL array type for repeated values. For example,
where the paper says
repeated int64 Backward;
I use an array type:
StructField("Backward", ArrayType(IntegerType(), containsNull=False),
nullable=False)
The resulting schema, as reported by parquet-tools, is:
optional group backward (LIST) {
  repeated group list {
    required int32 element;
  }
}
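To make the mismatch concrete, here is a minimal sketch in plain Python that prints the two forms side by side. The helpers dremel_repeated and parquet_list are hypothetical names made up for illustration; they are not part of Spark or parquet-mr, and they only render schema text. The three-level layout follows the output shown above, and the defaults are assumptions chosen to reproduce it.

```python
# Hypothetical helpers for illustration only; not a Spark or parquet-mr API.
# They render schema text so the two layouts can be compared directly.

def dremel_repeated(name, primitive):
    """Schema text for a bare repeated field, as written in the paper."""
    return f"repeated {primitive} {name};"


def parquet_list(name, primitive, nullable=True, contains_null=False):
    """Schema text for the three-level LIST layout reported by parquet-tools."""
    outer = "optional" if nullable else "required"      # repetition of the list itself
    inner = "optional" if contains_null else "required"  # repetition of each element
    return "\n".join([
        f"{outer} group {name} (LIST) {{",
        "  repeated group list {",
        f"    {inner} {primitive} element;",
        "  }",
        "}",
    ])


paper_form = dremel_repeated("Backward", "int64")
spark_form = parquet_list("backward", "int32")

print(paper_form)
print(spark_form)
```

The first line printed is the single-level repeated field from the paper; the rest is the LIST-annotated group that I get from the array type. As far as I understand, the int32 comes from IntegerType, and LongType would correspond to int64, but that would not change the nesting.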
Thanks,
Lubomir Chorbadjiev