Hi Michael,

Thanks a lot for your suggestions. Now I understand your idea of using your schema comparison method as a starting point for defining a method that modifies a schema by traversing it. I will definitely take a look at that approach, and I will also give the Avro Schema IDL a try.
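Just to check that I understood the idea, below is the kind of copy-while-traversing function I have in mind (a rough, untested sketch against the Avro 1.7.x Java API; the SchemaCopier name and the counter-based renaming are only placeholders). Every record is rebuilt with the four-argument createRecord plus setFields under a fresh name, which should also sidestep the record-naming problems I described before:

import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.Schema.Field;

/** Untested sketch: copy a schema while traversing it, renaming every record. */
public class SchemaCopier {

    private int counter = 0;

    /** Returns a copy of 'schema' in which every record gets a fresh name. */
    public Schema copy(Schema schema) {
        switch (schema.getType()) {
        case RECORD: {
            // Named copy of the record; this is also the place where fields
            // could be added, dropped or replaced (projections, filters, ...).
            Schema copy = Schema.createRecord(
                schema.getName() + "_" + (counter++),
                schema.getDoc(), schema.getNamespace(), schema.isError());
            List<Field> fields = new ArrayList<Field>();
            for (Field field : schema.getFields()) {
                fields.add(new Field(field.name(), copy(field.schema()),
                    field.doc(), field.defaultValue()));
            }
            copy.setFields(fields);
            return copy;
        }
        case ARRAY:
            return Schema.createArray(copy(schema.getElementType()));
        case MAP:
            return Schema.createMap(copy(schema.getValueType()));
        case UNION: {
            List<Schema> branches = new ArrayList<Schema>();
            for (Schema branch : schema.getTypes()) {
                branches.add(copy(branch));
            }
            return Schema.createUnion(branches);
        }
        default:
            // Primitives (and, in this sketch, enums and fixed) are reused as-is.
            return schema;
        }
    }
}

Does that match what you had in mind?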
Thanks again for your help!

Greetings,

Juan

2014-08-20 20:52 GMT+02:00 Michael Pigott <[email protected]>:

> Hi Juan!
>
> I originally considered showing you the AvroSchemaGenerator, but I thought
> it was a bit complex and very specific to XML Schema itself. I think you
> would have better luck understanding how either Protobuf or Thrift schemas
> are converted to Avro instead, as those are more generic and their feature
> sets map more closely to Avro.
>
> To answer your question, I was never able to find a use case where
> creating an Avro schema from only a list of fields worked for me. That was
> okay in my case, because I could just use the corresponding XML element
> name and namespace when creating the record. You might have better luck,
> depending on your use case?
>
> I unfortunately do not know of an existing tool that solves your problem,
> and I poked around the existing code and JIRA tickets for a bit and came
> up empty. I originally thought you could write a clone function yourself,
> and create a new schema as you recursively descend through the old one,
> adding in any changes you want to make along the way. (The comparison tool
> I showed you would make a good template.)
>
> That said, you might have better luck using the Avro Schema IDL [1],
> rather than rolling your own?
>
> Good luck!
> Mike
>
> [1] http://avro.apache.org/docs/1.7.7/idl.html
>
> On Wed, Aug 20, 2014 at 3:19 AM, Juan Rodríguez Hortalá
> <[email protected]> wrote:
>
>> Hi Michael,
>>
>> Thanks a lot for your suggestion. I found the class
>> https://github.com/mikepigott/xml-to-avro/blob/master/avro-to-xml/src/main/java/org/apache/avro/xml/AvroSchemaGenerator.java
>> particularly interesting; as I understand it, it generates an Avro schema
>> by visiting an XML document. I assume that you used a fresh name for the
>> record at each node, otherwise you might have encountered problems like
>> the following. Starting from a Schema object 'personSchema' containing
>> this schema:
>>
>> {
>>   "type" : "record",
>>   "name" : "Person",
>>   "namespace" : "test",
>>   "doc" : "Schema for test.SchemasTest$Person",
>>   "fields" : [ {
>>     "name" : "age",
>>     "type" : "int"
>>   }, {
>>     "name" : "name",
>>     "type" : [ "null", "string" ]
>>   } ]
>> }
>>
>> the following code works fine:
>>
>>   Schema twoPersons = Schema.createRecord(Arrays.asList(
>>       new Schema.Field(personSchema.getName() + "_1", personSchema,
>>           personSchema.getDoc() + " _1", null),
>>       new Schema.Field(personSchema.getName() + "_2", personSchema,
>>           personSchema.getDoc() + " _2", null)));
>>
>> but as soon as I use the new Schema object twoPersons it is easy to run
>> into an exception. For example,
>>
>>   System.out.println(new Schema.Parser().setValidate(true).parse(
>>       twoPersons.toString()));
>>
>> throws
>>
>> org.apache.avro.SchemaParseException: No name in schema:
>> {"type":"record","fields":[{"name":"Person_1","type":{"type":"record","name":"Person","namespace":"test","doc":"Schema for test.SchemasTest$Person","fields":[{"name":"age","type":"int"},{"name":"name","type":["null","string"]}]},"doc":"Schema for test.SchemasTest$Person _1"},{"name":"Person_2","type":"test.Person","doc":"Schema for test.SchemasTest$Person _2"}]}
>>     at org.apache.avro.Schema.getRequiredText(Schema.java:1221)
>>     at org.apache.avro.Schema.parse(Schema.java:1092)
>>     at org.apache.avro.Schema$Parser.parse(Schema.java:953)
>>     at org.apache.avro.Schema$Parser.parse(Schema.java:943)
>>     at com.lambdoop.sdk.core.SchemasTest.createRecordFailTest(SchemasTest.java:232)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>>     at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
>>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
>>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
>>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>>     at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>>     at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
>>     at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
>>
>> Adding the name with twoPersons.addProp("name", "twoPersons") doesn't
>> work because "name" is a reserved property, and SchemaBuilder cannot be
>> used either, because it doesn't allow attaching existing Schema objects to
>> a field, only building schemas from scratch.
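>>
>> (A possible way around the missing name, untested: give the wrapper
>> record an explicit name with the four-argument createRecord and attach
>> the fields with setFields afterwards; the name, doc and namespace below
>> are just placeholders:
>>
>>   Schema twoPersons = Schema.createRecord("TwoPersons", "two persons",
>>       "test", false);
>>   twoPersons.setFields(Arrays.asList(
>>       new Schema.Field(personSchema.getName() + "_1", personSchema,
>>           personSchema.getDoc() + " _1", null),
>>       new Schema.Field(personSchema.getName() + "_2", personSchema,
>>           personSchema.getDoc() + " _2", null)));
>>
>> With an outer name present, twoPersons.toString() should parse back
>> without the SchemaParseException.)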
>>
>> Another problem appears when I convert the schemas to Jackson's JsonNode.
>> Starting from an empty schema like
>>
>> {
>>   "type" : "record",
>>   "name" : "Person",
>>   "namespace" : "test",
>>   "fields" : [ ]
>> }
>>
>> if I add a field whose schema is Person by manipulating the JsonNode,
>> when I convert back to an Avro Schema object I get a "Can't redefine:
>> test.Person" error. My conclusions so far are:
>>
>> - every record needs to have a name
>> - two records with the same name must have the same schema
>>
>> That is not very surprising, as it corresponds to what is specified in
>> http://avro.apache.org/docs/current/spec.html. I was wondering if anyone
>> knows of a library for transforming Avro schemas that has already dealt
>> with these details and can do things like adding an existing schema as a
>> new field of another schema.
>>
>> Thanks a lot for your help,
>>
>> Greetings,
>>
>> Juan Rodríguez
>>
>> 2014-08-19 7:04 GMT-07:00 Michael Pigott <[email protected]>:
>>
>>> Hi Juan,
>>>
>>> That sounds really complex. Would you instead be able to build or
>>> retrieve the original Avro Schema objects, and then build a new Schema
>>> from their definitions? For my work on transforming XML to Avro and
>>> back [1], I wrote a comparison tool to confirm that two Avro Schemas are
>>> equivalent by recursively descending through both schemas [2]. Perhaps
>>> you can use something similar to build a transformed Avro schema in
>>> memory, applying your transformations on the fly?
>>>
>>> Good luck!
>>> Mike
>>>
>>> [1] https://issues.apache.org/jira/browse/AVRO-457
>>> [2] https://github.com/mikepigott/xml-to-avro/blob/master/avro-to-xml/src/test/java/org/apache/avro/xml/UtilsForTests.java
>>>
>>> On Tue, Aug 19, 2014 at 2:23 AM, Juan Rodríguez Hortalá
>>> <[email protected]> wrote:
>>>
>>>> Hi list,
>>>>
>>>> I'm working on a project in Java where we have a DSL that works on
>>>> GenericRecord objects and defines record transformation operations such
>>>> as projections, filters and so on. This implies that the Avro schema of
>>>> the records evolves by adding and deleting record fields, so the Avro
>>>> schemas used end up being different in each program, depending on the
>>>> operations applied. Hence I have to define Avro schema transformations
>>>> and generate new schemas as modifications of other schemas. For that,
>>>> the Avro schema builder classes are only useful for the starting schema,
>>>> and the same goes for a POJO-to-schema mapping like avro-jackson. The
>>>> main problem I face is that in Avro, by design, "schema objects are
>>>> logically immutable", as stated in the documentation. So far my approach
>>>> has been to convert the schema to a string, parse it with Jackson,
>>>> manipulate its representation as a JsonNode, and then parse it back to
>>>> Avro. In that last step I sometimes run into problems because Avro
>>>> records are named and anonymous records are not always legal in complete
>>>> schemas, or because the same record name cannot be used twice in two
>>>> child fields of a parent record. I was then thinking of using generated
>>>> schema names, with an increasing ID or a random UUID. Anyway, my
>>>> questions are: is the approach I'm describing correct? Are you aware of
>>>> a library for creating new Avro schemas by manipulating an input schema?
>>>> Maybe those capabilities are already present in Avro's Java API and I
>>>> just haven't noticed.
>>>>
>>>> Any help will be welcome.
>>>> Thanks a lot in advance.
>>>>
>>>> Greetings,
>>>>
>>>> Juan Rodríguez Hortalá
>>>
>>>
>>
>
