I’m pretty sure Avro only supports a single schema per file. You could create 
columns of record type and put each type of record in the correct column, but at 
that point I’d probably look at using a MAP data type and writing a custom record 
reader instead. Normally you’d split the data into a separate file for each 
schema, but I can understand situations where that’s not ideal. I’ve got several 
flows that put XML keys into a MAP column and then split them out in Hive later.
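
For what it’s worth, the "one column per record type" idea might look roughly 
like the sketch below, reusing the record names from Eric’s schema further down. 
The field names header, order and trailer are made up for illustration, and the 
assumption is that each line would populate exactly one of the three columns and 
leave the other two null. I haven’t run this through ConvertRecord, so treat it 
as a starting point rather than a tested config.

```json
{
  "type": "record",
  "namespace": "com.acme",
  "name": "OrderFile",
  "fields": [
    {
      "name": "header",
      "type": ["null", {
        "type": "record",
        "name": "HeaderRecord",
        "fields": [
          {"name": "PNSTORE", "type": "string"},
          {"name": "STORENAME", "type": "string"},
          {"name": "EXTRACTIONDATE", "type": "string"}
        ]
      }],
      "default": null
    },
    {
      "name": "order",
      "type": ["null", {
        "type": "record",
        "name": "OrderRecord",
        "fields": [
          {"name": "SALESMAN", "type": "string"},
          {"name": "ORDER_NUMBER", "type": "string"},
          {"name": "DUE_DATE", "type": "string"},
          {"name": "ORDER_AMOUNT", "type": "long"}
        ]
      }],
      "default": null
    },
    {
      "name": "trailer",
      "type": ["null", {
        "type": "record",
        "name": "TrailerRecord",
        "fields": [
          {"name": "TOTAL_RECORDS", "type": "long"},
          {"name": "TOTAL_AMOUNT", "type": "long"}
        ]
      }],
      "default": null
    }
  ]
}
```

Unlike the schema in the original question, every entry under "fields" here has 
both a "name" and a "type", which Avro requires. You would still need a reader 
that knows which column a given line belongs in.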

Thanks
Shawn

From: Eric Chaves <[email protected]>
Sent: Sunday, March 17, 2019 11:14 AM
To: [email protected]
Subject: Is it possible to declare an Avro schema for multi-record files?

Hi folks,

Is it possible to declare an Avro schema for a ConvertRecord processor to handle 
a multi-record file, i.e. a file where each line may be a different Avro record?

Something like this:

{
  "type" : "record",
  "namespace" : "com.acme",
  "name" : "OrderFile",
  "fields" : [
      {
        "type" : "record",
        "namespace" : "com.acme",
        "name" : "HeaderRecord",
        "fields" : [
          {"name":"PNSTORE",    "type": "string"},
          {"name":"STORENAME",  "type": "string"},
          {"name":"EXTRACTIONDATE",   "type": "string"}
        ]
      },

      {
        "type" : "record",
        "namespace" : "com.acme",
        "name" : "OrderRecord",
        "fields" : [
            { "name": "SALESMAN", "type": "string" },
            { "name": "ORDER_NUMBER", "type": "string" },
            { "name": "DUE_DATE", "type": "string" },
            { "name": "ORDER_AMOUNT", "type": "long" }
        ]
      },

      {
        "type" : "record",
        "namespace" : "com.acme",
        "name" : "TrailerRecord",
        "fields" : [
          {"name":"TOTAL_RECORDS", "type": "long"},
          {"name":"TOTAL_AMOUNT", "type": "long"}
        ]
      }
  ]
}

Thanks in advance,

Eric

