For anyone running into this same issue, it looks like Avro deserialization
is just broken when used with SparkSQL and partitioned tables. I created
a bug report with details and a simplified example showing how to reproduce it:
https://issues.apache.org/jira/browse/SPARK-13709
--
Chris Miller
On Fri,
One more thing -- just to set aside any question about my specific schema
or data, I used the sample schema and data record from Oracle's
documentation on Avro support. It's a pretty simple schema:
https://docs.oracle.com/cd/E26161_02/html/GettingStartedGuide/jsonbinding-overview.html
When I creat
No, the name of the field is *enum1* -- the name of the field's type is
*enum1_values*. It should not be looking for enum1_values -- that's not how
the specification says the format works, and it's not how any other
implementation reads Avro data.
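To make the distinction concrete, here is a sketch of what the enum field declaration in the .avsc presumably looks like (the symbol list is an assumption; only "BLUE" appears in this thread). The outer "name" is the field name; the inner "name" belongs to the enum type:

```json
{
  "name": "enum1",
  "type": {
    "type": "enum",
    "name": "enum1_values",
    "symbols": ["RED", "GREEN", "BLUE"]
  }
}
```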
For what it's worth, if I change enum
Your field name is *enum1_values*, but you have the data
{ "foo1": "test123", *"enum1"*: "BLUE" }
i.e. since you defined enum and not union(null, enum),
it tries to find a value for enum1_values and doesn't find one...
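The point of disagreement can be checked mechanically. A minimal stdlib-only Python sketch (the schema is a hypothetical reconstruction of the one discussed here; the RED/GREEN symbols are assumed): per the Avro spec, a JSON-encoded record is keyed by *field* name, so a deserializer that looks the value up under the enum *type* name will come up empty.

```python
import json

# Hypothetical reconstruction of the schema from this thread:
# the field is named "enum1"; only its *type* is named "enum1_values".
schema = json.loads("""
{
  "namespace": "com.cmiller",
  "name": "test1",
  "type": "record",
  "fields": [
    { "name": "foo1", "type": "string" },
    { "name": "enum1",
      "type": { "type": "enum", "name": "enum1_values",
                "symbols": ["RED", "GREEN", "BLUE"] } }
  ]
}
""")

# The data record from the thread.
record = {"foo1": "test123", "enum1": "BLUE"}

# Records are keyed by field name, so lookups must use f["name"] ...
field_names = [f["name"] for f in schema["fields"]]
assert field_names == ["foo1", "enum1"]
assert all(name in record for name in field_names)

# ... and a lookup by the enum *type* name (what the broken
# deserializer appears to be doing) fails:
assert "enum1_values" not in record
```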
On 3 March 2016 at 11:30, Chris Miller wrote:
> I've been digging into this a little deeper. Here's what I've found:
I've been digging into this a little deeper. Here's what I've found:
test1.avsc:
{
    "namespace": "com.cmiller",
    "name": "test1",
    "type": "record",
    "fields": [
        { "name": "foo1", "type": "string" }
    ]
}
test2.avsc:
{
"namesp
Hi,
I have a strange issue occurring when I use manual partitions.
If I create a table as follows, I am able to query the data with no problem:
CREATE TABLE test1
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.Avr