Hi Michael,

Thanks a lot for your suggestion. I found the class
https://github.com/mikepigott/xml-to-avro/blob/master/avro-to-xml/src/main/java/org/apache/avro/xml/AvroSchemaGenerator.java
particularly interesting; as I understand it, it generates an Avro schema by
visiting an XML document. I assume you used a fresh name for the record at
each node, otherwise you would probably have run into problems like the
following one. Starting from a Schema object 'personSchema' containing the
following schema:

{
  "type" : "record",
  "name" : "Person",
  "namespace" : "test",
  "doc" : "Schema for test.SchemasTest$Person",
  "fields" : [ {
    "name" : "age",
    "type" : "int"
  }, {
    "name" : "name",
    "type" : [ "null", "string" ]
  } ]
}
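
For reference, an equivalent personSchema object can be obtained just by
parsing that JSON; a minimal sketch:

    // Parse the JSON above into an Avro Schema object.
    Schema personSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Person\",\"namespace\":\"test\","
        + "\"doc\":\"Schema for test.SchemasTest$Person\",\"fields\":["
        + "{\"name\":\"age\",\"type\":\"int\"},"
        + "{\"name\":\"name\",\"type\":[\"null\",\"string\"]}]}");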

The following code works OK:

    Schema twoPersons = Schema.createRecord(
        Arrays.asList(
            new Schema.Field(personSchema.getName() + "_1", personSchema,
                personSchema.getDoc() + " _1", null),
            new Schema.Field(personSchema.getName() + "_2", personSchema,
                personSchema.getDoc() + " _2", null)));

but when I use the new Schema object twoPersons it is easy to run into an
exception. For example,

    System.out.println(new Schema.Parser().setValidate(true)
        .parse(twoPersons.toString()));

throws

org.apache.avro.SchemaParseException: No name in schema:
{"type":"record","fields":[{"name":"Person_1","type":{"type":"record","name":"Person","namespace":"test","doc":"Schema for test.SchemasTest$Person","fields":[{"name":"age","type":"int"},{"name":"name","type":["null","string"]}]},"doc":"Schema for test.SchemasTest$Person _1"},{"name":"Person_2","type":"test.Person","doc":"Schema for test.SchemasTest$Person _2"}]}
    at org.apache.avro.Schema.getRequiredText(Schema.java:1221)
    at org.apache.avro.Schema.parse(Schema.java:1092)
    at org.apache.avro.Schema$Parser.parse(Schema.java:953)
    at org.apache.avro.Schema$Parser.parse(Schema.java:943)
    at com.lambdoop.sdk.core.SchemasTest.createRecordFailTest(SchemasTest.java:232)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
    at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)


Adding the name afterwards with twoPersons.addProp("name", "twoPersons")
doesn't work, because "name" is a reserved property. SchemaBuilder doesn't
help either, because it doesn't allow adding existing Schema objects to a
field; it only supports creating schemas from scratch.
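
One workaround is to give the outer record an explicit name and namespace at
creation time and attach the fields with setFields. This is only a sketch;
the record name "TwoPersons", the namespace "test" and the doc string are
arbitrary choices of mine:

    // Named outer record, so the toString()/parse() round trip succeeds.
    Schema namedTwoPersons = Schema.createRecord("TwoPersons",
        "Wrapper record holding two persons", "test", false);
    namedTwoPersons.setFields(Arrays.asList(
        new Schema.Field(personSchema.getName() + "_1", personSchema,
            personSchema.getDoc() + " _1", null),
        new Schema.Field(personSchema.getName() + "_2", personSchema,
            personSchema.getDoc() + " _2", null)));
    // No SchemaParseException now: the outer record has a name, and the
    // second occurrence of Person is serialized as a reference ("test.Person").
    System.out.println(new Schema.Parser().setValidate(true)
        .parse(namedTwoPersons.toString()));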

Another problem I have appears when I convert the schemas to Jackson's
JsonNode. Starting from an empty schema like

{
  "type" : "record",
  "name" : "Person",
  "namespace" : "test",
  "fields" : [ ]
}

if I add a field with schema Person by manipulating the JsonNode, then when I
convert back to an Avro Schema object I get a "Can't redefine: test.Person"
error. My conclusions so far are:
- every record needs to have a name
- two records with the same name must have the same schema
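
To illustrate the second point, here is a small sketch (not my real JsonNode
manipulation, just two hand-written schema strings, with a made-up field name
"friend"): defining test.Person inline a second time fails with "Can't
redefine", while referencing it by full name is accepted:

    // A field whose type is a second inline definition of test.Person
    // fails to parse.
    String redefined =
        "{\"type\":\"record\",\"name\":\"Person\",\"namespace\":\"test\",\"fields\":["
        + "{\"name\":\"friend\",\"type\":"
        + "{\"type\":\"record\",\"name\":\"Person\",\"namespace\":\"test\",\"fields\":[]}}]}";
    try {
        new Schema.Parser().parse(redefined);
    } catch (org.apache.avro.SchemaParseException e) {
        System.out.println(e.getMessage()); // Can't redefine: test.Person
    }

    // Referencing the already defined record by its full name works,
    // and even allows recursive schemas.
    String referenced =
        "{\"type\":\"record\",\"name\":\"Person\",\"namespace\":\"test\",\"fields\":["
        + "{\"name\":\"friend\",\"type\":\"test.Person\"}]}";
    System.out.println(new Schema.Parser().parse(referenced));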

That is not very surprising, as it corresponds to what is specified in
http://avro.apache.org/docs/current/spec.html. I was wondering if anyone
knows of a library for transforming Avro schemas that can do things like
adding an existing schema as a new field of another schema, and that has
already dealt with these details.

Thanks a lot for your help,

Greetings,

Juan Rodríguez

2014-08-19 7:04 GMT-07:00 Michael Pigott <[email protected]>:

> Hi Juan,
>     That sounds really complex.  Would you instead be able to build or
> retrieve the original Avro Schema objects, and then build a new Schema from
> its definition?  For my work on transforming XML to Avro and back[1], I
> wrote a comparison tool to confirm that two Avro Schemas are equivalent by
> recursively descending through both schemas[2].  Perhaps you can use
> something similar to build a transformed Avro schema in memory, by applying
> your transformations on the fly?
>
> Good luck!
> Mike
>
> [1] https://issues.apache.org/jira/browse/AVRO-457
> [2]
> https://github.com/mikepigott/xml-to-avro/blob/master/avro-to-xml/src/test/java/org/apache/avro/xml/UtilsForTests.java
>
>
> On Tue, Aug 19, 2014 at 2:23 AM, Juan Rodríguez Hortalá <
> [email protected]> wrote:
>
>> Hi list,
>>
>> I'm working on a project in Java where we have a DSL working on
>> GenericRecord objects, over which we define record transformation
>> operations like projections, filters and so on. This implies that the Avro
>> schema of the records evolves by adding and deleting record fields. As a
>> result, the Avro schemas used are different in each program, depending on
>> the operations used. Hence I have to define Avro schema transformations and
>> generate new schemas as modifications of other schemas. For that, the Avro
>> schema builder classes are only useful for the starting schema, and the
>> same goes for a POJO-to-schema mapping like avro-jackson. The main problem
>> I face is that in Avro, by design, "schema objects are logically
>> immutable", as stated in the documentation. So far I've taken the approach
>> of converting the schema to a string, parsing it with Jackson, manipulating
>> its representation as a JsonNode, and then parsing it back to Avro. In that
>> latter step I sometimes have problems because Avro records are named, and
>> anonymous records are not always legal in complete schemas, or because the
>> same record name cannot be used twice in two child fields of a parent
>> record. I was then thinking of using generated schema names, with an
>> increasing ID or a random UUID. Anyway, my question is: is the approach I'm
>> describing correct? Are you aware of some library for creating new Avro
>> schemas by manipulating an input schema? Maybe these capabilities are
>> already present in Avro's Java API and I haven't noticed.
>>
>> Any help will be welcome. Thanks a lot in advance.
>>
>> Greetings,
>>
>> Juan Rodríguez Hortalá
>>
>
>
