Hit send too fast! My real question is: is this a supported use case? Is there no way to make SpecificData at least aware of other @Stringable types? Maybe even just exposing some methods to "register" new @Stringables, or even unannotated types that have a single-String-argument constructor and a toString method?
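To make the "register" idea concrete, here is a rough standalone sketch of the contract I have in mind: a type qualifies if it has a public single-String constructor, it is written as its toString() form, and it is read back through that constructor. This is plain Java with no Avro dependency. StringableRegistry, its methods, and the Stamp class are names invented for illustration and are not part of any existing Avro API.

```java
import java.lang.reflect.Constructor;
import java.util.HashSet;
import java.util.Set;

// Hypothetical registry, for illustration only; nothing here exists in Avro.
public class StringableRegistry {
    private final Set<Class<?>> stringable = new HashSet<>();

    // Accept a type only if it has a public constructor taking a single String.
    public void register(Class<?> c) throws NoSuchMethodException {
        c.getConstructor(String.class); // throws if no such constructor exists
        stringable.add(c);
    }

    public boolean isStringable(Class<?> c) {
        return stringable.contains(c);
    }

    // "Write" side: a registered value is stored as its toString() form.
    public String write(Object datum) {
        if (!isStringable(datum.getClass())) {
            throw new IllegalArgumentException("not registered: " + datum.getClass());
        }
        return datum.toString();
    }

    // "Read" side: rebuild the value through the single-String constructor.
    public <T> T read(Class<T> c, String s) throws ReflectiveOperationException {
        if (!isStringable(c)) {
            throw new IllegalArgumentException("not registered: " + c);
        }
        Constructor<T> ctor = c.getConstructor(String.class);
        return ctor.newInstance(s);
    }

    // Stand-in for a user-defined type such as my Timestamp class.
    public static final class Stamp {
        private final String s;
        public Stamp(String s) { this.s = s; }
        @Override public String toString() { return s; }
    }

    public static void main(String[] args) throws Exception {
        StringableRegistry registry = new StringableRegistry();
        registry.register(Stamp.class);
        String wire = registry.write(new Stamp("2014-05-13 00:35:00"));
        Stamp back = registry.read(Stamp.class, wire);
        System.out.println(back);
    }
}
```

Something along these lines hanging off SpecificData would let me drop the anonymous-subclass workaround quoted below, but whether that fits the design is exactly what I'm asking.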
Cheers,

On Sat, Jul 5, 2014 at 12:07 PM, Ian Hummel <[email protected]> wrote:
> Hi Doug,
>
> Interestingly I was (sort of) able to make this work. Here's an example
> schema that correctly generates a class with a field of type
> com.mediamath.data.util.Timestamp (my own Timestamp implementation with a
> single String constructor).
>
> {
>   "namespace" : "com.mediamath.data.bidder",
>   "type" : "record",
>   "name" : "Impression",
>   "fields" : [
>     { "name" : "batchId", "type" : "long" },
>     { "name" : "auctionId", "type" : "long" },
>     { "name" : "timestamp", "type" : {
>         "type" : "string", "java-class" : "com.mediamath.data.util.Timestamp" }
>     },
>     ...
> }
>
> NOTE the subtle difference in the field declaration from the previous
> attempt. This actually produces the Java class I was hoping for
>
> public class Impression extends
>     org.apache.avro.specific.SpecificRecordBase implements
>     org.apache.avro.specific.SpecificRecord {
>   public static final org.apache.avro.Schema SCHEMA$ = ...
>   @Deprecated public long batchId;
>   @Deprecated public long auctionId;
>   @Deprecated public com.mediamath.data.util.Timestamp timestamp;
>   ...
>
> Here's my Timestamp class (Scala)
>
> case class Timestamp(s: String) {
>   val instant = Timestamp.fromString(s)
>   override def toString: String = Timestamp.toString(instant)
> }
>
> So the issue I'm running into now is trying to serialize those instances
> to a file.
> Working in Scala, here's the code I'm using:
>
> val schema = Impression.getClassSchema
> val datumWriter = new SpecificDatumWriter(classOf[Impression])
> val dataFileWriter = new DataFileWriter(datumWriter)
> dataFileWriter.create(schema, new File("target/avro-test.avro"))
> dataFileWriter.append(imp)
> dataFileWriter.close()
>
> I get an exception:
>
> java.lang.ClassCastException: com.mediamath.data.util.Timestamp cannot be
> cast to java.lang.CharSequence
> org.apache.avro.file.DataFileWriter$AppendWriteException:
> java.lang.ClassCastException: com.mediamath.data.util.Timestamp cannot be
> cast to java.lang.CharSequence
>   at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:296)
>   at com.mediamath.mdsw.ImpressionsSpec$$anonfun$1$$anonfun$apply$6.apply(ImpressionsSpec.scala:67)
>   at com.mediamath.mdsw.ImpressionsSpec$$anonfun$1$$anonfun$apply$6.apply(ImpressionsSpec.scala:50)
> Caused by: java.lang.ClassCastException: com.mediamath.data.util.Timestamp
> cannot be cast to java.lang.CharSequence
>   at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:213)
>   at org.apache.avro.specific.SpecificDatumWriter.writeString(SpecificDatumWriter.java:69)
>   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:76)
>   at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
>   at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
>   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
>   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
>   at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
>   ... 2 more
>
> Ok, what if I add @Stringable to Timestamp's constructor? It still
> doesn't work...
> The issue is in SpecificData
>
>   protected Set<Class> stringableClasses = new HashSet<Class>();
>   {
>     stringableClasses.add(java.math.BigDecimal.class);
>     stringableClasses.add(java.math.BigInteger.class);
>     stringableClasses.add(java.net.URI.class);
>     stringableClasses.add(java.net.URL.class);
>     stringableClasses.add(java.io.File.class);
>   }
>
> It seems that only a small number of classes are allowed, and there is no
> simple way to extend the list. My workaround is to do something like this
> (Scala again):
>
> val sd = new SpecificData {
>   override def isStringable(c: Class[_]): Boolean = {
>     if (c.isAssignableFrom(classOf[Timestamp])) true
>     else super.isStringable(c)
>   }
> }
> val schema = Impression.getClassSchema
> val datumWriter = new SpecificDatumWriter[Impression](sd) { }
> val dataFileWriter = new DataFileWriter[Impression](datumWriter)
> dataFileWriter.create(schema, new File("target/avro-test.avro"))
> dataFileWriter.append(imp)
> dataFileWriter.close()
>
> That works! And the serialized objects can even be read back from e.g.
> Python as a String:
>
> $ python test.py
> {... u'publisherTagId': None, u'strategyId': 405963, u'creativeId':
> 671347, u'timestamp': u'2014-05-13 00:35:00' ...}
>
>
> On Thu, Jul 3, 2014 at 2:14 PM, Doug Cutting <[email protected]> wrote:
>
>> The java-class attribute is supported by the reflect implementation,
>> not by the code-generating specific implementation. So you could
>> define Foo in Java with something like:
>>
>> public class Foo {
>>   private long batchId;
>>   @Stringable private Timestamp timestamp;
>>   public Foo() {}
>>   public Foo(long batchId, Timestamp timestamp) { ... }
>> }
>>
>> then use ReflectData to read/write instances. Note that
>> java.sql.Timestamp doesn't have a string constructor. Are you using a
>> different timestamp class? If you're defining your own then you could
>> instead add the @Stringable annotation to your Timestamp class rather
>> than to each field where it is used.
>>
>> Reflect-defined schemas can refer to specific-defined classes, but not
>> vice-versa, since the compiler doesn't use reflection to discover
>> schemas, but rather always generates from the schema alone.
>>
>> Doug
>>
>> On Wed, Jul 2, 2014 at 8:05 AM, Ian Hummel <[email protected]> wrote:
>> > Hi gang,
>> >
>> > I'm trying to build a JSON schema with a custom type as the field
>> > instead of just a String. Is "java-class" supposed to work in that use
>> > case? I can't seem to make any progress.
>> >
>> > Example schema (Foo.avsc):
>> >
>> > {
>> >   "namespace" : "com.example",
>> >   "type" : "record",
>> >   "name" : "Foo",
>> >   "fields" : [
>> >     { "name" : "batchId", "type" : "long" },
>> >     { "name" : "timestamp", "type" : "string", "java-class" : "com.example.Timestamp" }
>> >   ]
>> > }
>> >
>> > The Timestamp class has a public constructor which takes a single String
>> > argument. I even tried annotating it with @Stringable. However, the
>> > generated java class always uses String, not my custom type.
>> >
>> > $ java -jar ~/Downloads/avro-tools-1.7.6.jar compile -string schema src/main/avro/Foo.avsc /tmp/foo
>> >
>> > From the generated .java file
>> >
>> > ...
>> > /**
>> >  * All-args constructor.
>> >  */
>> > public Foo(java.lang.Long batchId, java.lang.String timestamp) {
>> >   this.batchId = batchId;
>> >   this.timestamp = timestamp;
>> > }
>> > ...
>> >
>> > Any help appreciated,
>> >
>> > - Ian.
