Hi folks,

I've been trying to solve a schema evolution issue by adding a record
default, "default": { "fieldName": null }, to some existing fields in our
schemas. With that addition, some schemas stop parsing and fail with a
"Schema fields not set yet" error. As far as I can tell, the issue is
limited to self-referential schemas.

I'm using the Java libraries, primarily release 1.11.4, but I reproduced on
1.12.0 for this email.

Here's an example schema that parses successfully:
https://pastebin.com/wgtZSjCq (also Exhibit A below)

Parsing hits a "Schema fields not set yet" error if we add
"default": { "value": null } to barChildMaybe, the field whose type reuses
the name OptionalUnionHolder:
https://pastebin.com/C0hRmNfX (Exhibit B below; note line 29 in pastebin)

Example stack trace on 1.12.0: https://pastebin.com/v95bAF6R (Exhibit C
below)
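
My reading of that trace, which may well be wrong, is that the default is
validated while OptionalUnionHolder is still being parsed: its name is
already known, but its fields are not. The toy model below illustrates only
that ordering problem. It uses no Avro classes; ToyRecord, isValidDefault
and demo are names invented for this email, and the default check is far
simpler than whatever Avro actually does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldsNotSetToyModel {

    /** Toy stand-in for a record schema: the name is known before the fields are. */
    static final class ToyRecord {
        final String name;
        private List<String> fieldNames; // stays null until the record is fully parsed

        ToyRecord(String name) {
            this.name = name;
        }

        List<String> getFields() {
            if (fieldNames == null) {
                throw new IllegalStateException("Schema fields not set yet");
            }
            return fieldNames;
        }

        void setFields(List<String> names) {
            this.fieldNames = names;
        }
    }

    /** Toy check: every key in the default must name a field of the record. */
    static boolean isValidDefault(ToyRecord type, Map<String, Object> defaultValue) {
        return type.getFields().containsAll(defaultValue.keySet());
    }

    public static String demo() {
        // Register the name first so that nested types can refer back to it.
        Map<String, ToyRecord> names = new HashMap<>();
        ToyRecord holder = new ToyRecord("OptionalUnionHolder");
        names.put(holder.name, holder);

        Map<String, Object> defaultValue = new HashMap<>();
        defaultValue.put("value", null);

        // While still "inside" the record, a nested field refers to it by name
        // and asks for its default to be validated.
        String duringParse;
        try {
            duringParse = String.valueOf(
                isValidDefault(names.get("OptionalUnionHolder"), defaultValue));
        } catch (IllegalStateException e) {
            duringParse = e.getMessage();
        }

        // Once the record's own fields are set, the same check goes through.
        List<String> fields = new ArrayList<>();
        fields.add("value");
        holder.setFields(fields);
        boolean afterParse = isValidDefault(holder, defaultValue);

        return duringParse + " / " + afterParse;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Schema fields not set yet / true"
    }
}
```

If this reading is right, the default itself is fine and only the moment at
which it gets checked is the problem.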

The Avro specification says "...a name must be defined before it is used
(“before” in the depth-first, left-to-right traversal of the JSON parse
tree...)". I'm honestly not sure if the schema above follows that rule or
not.

After playing around some, I found that I can maybe work around the issue
by redefining OptionalUnionHolder with a new name:
https://pastebin.com/92y4WYdR (Exhibit D; note line 36 in pastebin)

I'd rather not need to do that, since it'd be a bigger change to our
existing default-less schemas, as well as the code we use to encode/decode
data alongside them. I haven't looked into this yet, but I assume that as
the number of self-references with a default increases, the number of times
I'd have to redefine the same schema will increase as well.

On 1.12.0 (but not 1.11.4), I was able to change where in the schema I
defined Bar, the type that has the field with the default, and this parsed
successfully: https://pastebin.com/mHbSvxd6 (Exhibit E; note lines 23 / 32
in pastebin)

Unfortunately, the test case I've been using so far is an
oversimplification of what I need. If we add "default": { "value": null
} to another field that's been in this test case, parsing fails with the
same error (both on 1.12.0 and 1.11.4): https://pastebin.com/4JKfAJCN (Exhibit
F; note lines 14-16 in pastebin)

We also looked into .setValidateDefaults(false) on Schema.Parser, which
does avoid the issue. However, when ResolvingDecoder needs to work with
schemas like this, it will hit the error, and the Decoder APIs don't expose
a way to run with default validation disabled. Intentionally, I assume.
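
For concreteness, this is roughly how I compare the two parser settings.
It's a sketch rather than our production code: the class and method names
are made up for this email, the schema string is Exhibit B compacted (single
quotes are swapped for double quotes at runtime), and it assumes Avro is on
the classpath.

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;

public class DefaultValidationRepro {

    // Exhibit B, compacted. Single quotes are replaced with double quotes below.
    static final String EXHIBIT_B = (
        "{'name':'UnionHolder','type':'record','fields':[{'name':'value','type':["
        + "{'name':'Foo','type':'record','fields':[{'name':'fooChildMaybe','type':"
        + "{'name':'OptionalUnionHolder','type':'record','fields':["
        + "{'name':'value','type':['null','Foo',"
        + "{'name':'Bar','type':'record','fields':[{'name':'barChildMaybe',"
        + "'default':{'value':null},'type':'OptionalUnionHolder'}]}"
        + "]}]}"
        + "}]}"
        + ",'Bar']}]}"
    ).replace('\'', '"');

    /** Parses the schema with a fresh parser and reports what happened. */
    static String tryParse(String json, boolean validateDefaults) {
        try {
            new Schema.Parser().setValidateDefaults(validateDefaults).parse(json);
            return "parsed";
        } catch (AvroRuntimeException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("validateDefaults=true:  " + tryParse(EXHIBIT_B, true));
        System.out.println("validateDefaults=false: " + tryParse(EXHIBIT_B, false));
    }
}
```

On my side the first call reports the "Schema fields not set yet" failure
and the second one parses, which matches what I described above.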

Do you consider this behavior a bug, or is this working as designed? Are
there any simpler workarounds I'm not seeing? I'd appreciate any help you
can provide.

Thanks,
Zoey

Exhibit A: Original, working schema

{
  "name": "UnionHolder",
  "type": "record",
  "fields": [
    {
      "name": "value",
      "type": [
        {
          "name": "Foo",
          "type": "record",
          "fields": [
            {
              "name": "fooChildMaybe",
              "type": {
                "name": "OptionalUnionHolder",
                "type": "record",
                "fields": [
                  {
                    "name": "value",
                    "type": [
                      "null",
                      "Foo",
                      {
                        "name": "Bar",
                        "type": "record",
                        "fields": [
                          {
                            "name": "barChildMaybe",
                            "type": "OptionalUnionHolder"
                          }
                        ]
                      }
                    ]
                  }
                ]
              }
            }
          ]
        },
        "Bar"
      ]
    }
  ]
}

Exhibit B: Add "default": { "value": null }, schema is now unparseable

{
  "name": "UnionHolder",
  "type": "record",
  "fields": [
    {
      "name": "value",
      "type": [
        {
          "name": "Foo",
          "type": "record",
          "fields": [
            {
              "name": "fooChildMaybe",
              "type": {
                "name": "OptionalUnionHolder",
                "type": "record",
                "fields": [
                  {
                    "name": "value",
                    "type": [
                      "null",
                      "Foo",
                      {
                        "name": "Bar",
                        "type": "record",
                        "fields": [
                          {
                            "name": "barChildMaybe",
                            "default": {
                              "value": null
                            },
                            "type": "OptionalUnionHolder"
                          }
                        ]
                      }
                    ]
                  }
                ]
              }
            }
          ]
        },
        "Bar"
      ]
    }
  ]
}

Exhibit C: Stack trace

org.apache.avro.AvroRuntimeException: Schema fields not set yet
at org.apache.avro.Schema$RecordSchema.getFields(Schema.java:955)
at org.apache.avro.Schema.isValidDefault(Schema.java:1789)
at org.apache.avro.Schema.isValidDefault(Schema.java:1747)
at org.apache.avro.Schema.validateDefault(Schema.java:1717)
at org.apache.avro.Schema$Field.<init>(Schema.java:578)
at org.apache.avro.Schema.parseField(Schema.java:1903)
at org.apache.avro.Schema.parseRecord(Schema.java:1872)
at org.apache.avro.Schema.parse(Schema.java:1835)
at org.apache.avro.Schema.parseUnion(Schema.java:1971)
at org.apache.avro.Schema.parse(Schema.java:1848)
at org.apache.avro.Schema.parseField(Schema.java:1891)
at org.apache.avro.Schema.parseRecord(Schema.java:1872)
at org.apache.avro.Schema.parse(Schema.java:1835)
at org.apache.avro.Schema.parseField(Schema.java:1891)
at org.apache.avro.Schema.parseRecord(Schema.java:1872)
at org.apache.avro.Schema.parse(Schema.java:1835)
at org.apache.avro.Schema.parseUnion(Schema.java:1971)
at org.apache.avro.Schema.parse(Schema.java:1848)
at org.apache.avro.Schema.parseField(Schema.java:1891)
at org.apache.avro.Schema.parseRecord(Schema.java:1872)
at org.apache.avro.Schema.parse(Schema.java:1835)
at org.apache.avro.Schema$Parser.parse(Schema.java:1538)
at org.apache.avro.Schema$Parser.parse(Schema.java:1515)
at <the place where we call new Schema.Parser().parse(theSchema)>

Exhibit D: Workaround?

{
  "name": "UnionHolder",
  "type": "record",
  "fields": [
    {
      "name": "value",
      "type": [
        {
          "name": "Foo",
          "type": "record",
          "fields": [
            {
              "name": "fooChildMaybe",
              "type": {
                "name": "OptionalUnionHolder",
                "type": "record",
                "fields": [
                  {
                    "name": "value",
                    "type": [
                      "null",
                      "Foo",
                      {
                        "name": "Bar",
                        "type": "record",
                        "fields": [
                          {
                            "name": "barChildMaybe",
                            "default": {
                              "value": null
                            },
                            "type": {
                              "name": "RedefinedOptionalUnionHolder",
                              "type": "record",
                              "fields": [
                                {
                                  "name": "value",
                                  "type": [
                                    "null",
                                    "Foo",
                                    "Bar"
                                  ]
                                }
                              ]
                            }
                          }
                        ]
                      }
                    ]
                  }
                ]
              }
            }
          ]
        },
        "Bar"
      ]
    }
  ]
}

Exhibit E: Define Bar at its first appearance breadth-wise rather than
depth-wise (this parses)

{
  "name": "UnionHolder",
  "type": "record",
  "fields": [
    {
      "name": "value",
      "type": [
        {
          "name": "Foo",
          "type": "record",
          "fields": [
            {
              "name": "fooChildMaybe",
              "type": {
                "name": "OptionalUnionHolder",
                "type": "record",
                "fields": [
                  {
                    "name": "value",
                    "type": [
                      "null",
                      "Foo",
                      "Bar"
                    ]
                  }
                ]
              }
            }
          ]
        },
        {
          "name": "Bar",
          "type": "record",
          "fields": [
            {
              "name": "barChildMaybe",
              "default": {
                "value": null
              },
              "type": "OptionalUnionHolder"
            }
          ]
        }
      ]
    }
  ]
}

Exhibit F: like E, but fooChildMaybe and barChildMaybe both have "default":
{ "value": null } (this fails)

{
  "name": "UnionHolder",
  "type": "record",
  "fields": [
    {
      "name": "value",
      "type": [
        {
          "name": "Foo",
          "type": "record",
          "fields": [
            {
              "name": "fooChildMaybe",
              "default": {
                "value": null
              },
              "type": {
                "name": "OptionalUnionHolder",
                "type": "record",
                "fields": [
                  {
                    "name": "value",
                    "type": [
                      "null",
                      "Foo",
                      "Bar"
                    ]
                  }
                ]
              }
            }
          ]
        },
        {
          "name": "Bar",
          "type": "record",
          "fields": [
            {
              "name": "barChildMaybe",
              "default": {
                "value": null
              },
              "type": "OptionalUnionHolder"
            }
          ]
        }
      ]
    }
  ]
}
