Hi Doug,
Interestingly I was (sort of) able to make this work. Here's an example
schema that correctly generates a class with a field of type
com.mediamath.data.util.Timestamp (my own Timestamp implementation with a
single String constructor).
{
  "namespace" : "com.mediamath.data.bidder",
  "type" : "record",
  "name" : "Impression",
  "fields" : [
    { "name" : "batchId", "type" : "long" },
    { "name" : "auctionId", "type" : "long" },
    { "name" : "timestamp", "type" : {
        "type" : "string", "java-class" : "com.mediamath.data.util.Timestamp" }
    },
    ...
}
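Note the subtle difference from my previous attempt: "java-class" now sits
inside a nested type object ({ "type" : "string", "java-class" : ... })
instead of next to "type" on the field itself. This actually produces the
Java class I was hoping for: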
public class Impression extends org.apache.avro.specific.SpecificRecordBase
    implements org.apache.avro.specific.SpecificRecord {
  public static final org.apache.avro.Schema SCHEMA$ = ...
  @Deprecated public long batchId;
  @Deprecated public long auctionId;
  @Deprecated public com.mediamath.data.util.Timestamp timestamp;
  ...
Here's my Timestamp class (Scala):
// fromString and toString(instant) live in the companion object (not shown)
case class Timestamp(s: String) {
  val instant = Timestamp.fromString(s)
  override def toString: String = Timestamp.toString(instant)
}
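For anyone following along in Java, the contract this class relies on is
small: a public constructor taking one String, and a toString whose output
that same constructor accepts. Below is a minimal standalone sketch of that
round trip. DemoTimestamp and StringableRoundTrip are hypothetical names, the
"yyyy-MM-dd HH:mm:ss" format is an assumption based on what my timestamps look
like, and the reflective lookup is only my illustration of how a string could
be turned back into an object, not a claim about Avro's internals.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical stand-in for com.mediamath.data.util.Timestamp.
final class DemoTimestamp {
    // Assumed text format; the real class may use a different one.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private final LocalDateTime instant;

    // The single-String constructor that a string-backed type needs.
    public DemoTimestamp(String s) {
        this.instant = LocalDateTime.parse(s, FORMAT);
    }

    // Must produce text that the constructor above accepts.
    @Override
    public String toString() {
        return FORMAT.format(instant);
    }
}

final class StringableRoundTrip {
    // Rebuilds a value from its string form through the public String constructor.
    static <T> T fromString(Class<T> type, String text) {
        try {
            return type.getConstructor(String.class).newInstance(text);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable String constructor on " + type, e);
        }
    }

    public static void main(String[] args) {
        DemoTimestamp original = new DemoTimestamp("2014-05-13 00:35:00");
        DemoTimestamp restored = fromString(DemoTimestamp.class, original.toString());
        System.out.println(restored);
    }
}
```

If toString and the constructor ever disagree about the format, the value
would be written fine but fail when read back, so it seems worth testing the
two together.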
So the issue I'm running into now is trying to serialize those instances to
a file. Working in Scala, here's the code I'm using:
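import java.io.File
import org.apache.avro.file.DataFileWriter
import org.apache.avro.specific.SpecificDatumWriter

// imp is an Impression instance built elsewhere
val schema = Impression.getClassSchema
val datumWriter = new SpecificDatumWriter(classOf[Impression])
val dataFileWriter = new DataFileWriter(datumWriter)
dataFileWriter.create(schema, new File("target/avro-test.avro"))
dataFileWriter.append(imp)
dataFileWriter.close()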
I get an exception:
java.lang.ClassCastException: com.mediamath.data.util.Timestamp cannot be cast to java.lang.CharSequence
org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.ClassCastException: com.mediamath.data.util.Timestamp cannot be cast to java.lang.CharSequence
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:296)
    at com.mediamath.mdsw.ImpressionsSpec$$anonfun$1$$anonfun$apply$6.apply(ImpressionsSpec.scala:67)
    at com.mediamath.mdsw.ImpressionsSpec$$anonfun$1$$anonfun$apply$6.apply(ImpressionsSpec.scala:50)
Caused by: java.lang.ClassCastException: com.mediamath.data.util.Timestamp cannot be cast to java.lang.CharSequence
    at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:213)
    at org.apache.avro.specific.SpecificDatumWriter.writeString(SpecificDatumWriter.java:69)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:76)
    at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
    ... 2 more
OK, what if I add @Stringable to Timestamp's constructor? It still doesn't
work. The issue seems to be in SpecificData, which keeps a hard-coded set of
stringable classes:
protected Set<Class> stringableClasses = new HashSet<Class>();
{
  stringableClasses.add(java.math.BigDecimal.class);
  stringableClasses.add(java.math.BigInteger.class);
  stringableClasses.add(java.net.URI.class);
  stringableClasses.add(java.net.URL.class);
  stringableClasses.add(java.io.File.class);
}
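To make the failure concrete, here is how I understand the write path, as a
standalone Java model rather than Avro's actual source. StringableModel,
CustomStamp and the shape of writeString are my own hypothetical names and
simplifications; only the five registered classes and the cast to
CharSequence come from the snippet and stack trace above.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.HashSet;
import java.util.Set;

// Simplified model of the lookup, NOT Avro's source. Names are hypothetical;
// only the registered classes mirror the SpecificData snippet.
class StringableModel {
    protected final Set<Class<?>> stringableClasses = new HashSet<>();

    StringableModel() {
        stringableClasses.add(BigDecimal.class);
        stringableClasses.add(BigInteger.class);
        stringableClasses.add(java.net.URI.class);
        stringableClasses.add(java.net.URL.class);
        stringableClasses.add(java.io.File.class);
    }

    // Assumed behavior: a plain membership test against the fixed set.
    protected boolean isStringable(Class<?> c) {
        return stringableClasses.contains(c);
    }

    // Models writing a "string" field: registered types are converted with
    // toString(); anything else is cast to CharSequence, which is the cast
    // that fails in the stack trace.
    String writeString(Object datum) {
        if (!(datum instanceof CharSequence) && isStringable(datum.getClass())) {
            datum = datum.toString();
        }
        return ((CharSequence) datum).toString();
    }
}

// Hypothetical custom type that is not in the registered set.
class CustomStamp {
    private final String text;
    CustomStamp(String text) { this.text = text; }
    @Override public String toString() { return text; }
}

class StringableModelDemo {
    public static void main(String[] args) {
        StringableModel model = new StringableModel();

        // A registered class is converted through toString().
        System.out.println(model.writeString(new BigDecimal("1.50")));

        // An unregistered class reaches the cast and fails.
        try {
            model.writeString(new CustomStamp("2014-05-13 00:35:00"));
        } catch (ClassCastException e) {
            System.out.println("unregistered type: ClassCastException");
        }
    }
}
```

In this model the annotation on the class is never consulted, which would
explain why adding @Stringable made no difference for me.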
It seems that only a small number of classes are allowed, and there is no
simple way to extend the list. My workaround is to do something like this
(Scala again):
val sd = new SpecificData {
  // Treat Timestamp (and any subclass of it) as stringable
  override def isStringable(c: Class[_]): Boolean =
    classOf[Timestamp].isAssignableFrom(c) || super.isStringable(c)
}
val schema = Impression.getClassSchema
// The empty { } creates an anonymous subclass; presumably needed because the
// SpecificDatumWriter(SpecificData) constructor is not public
val datumWriter = new SpecificDatumWriter[Impression](sd) { }
val dataFileWriter = new DataFileWriter[Impression](datumWriter)
dataFileWriter.create(schema, new File("target/avro-test.avro"))
dataFileWriter.append(imp)
dataFileWriter.close()
That works! And the serialized objects can even be read back from e.g.
Python as a String:
$ python test.py
{... u'publisherTagId': None, u'strategyId': 405963, u'creativeId': 671347,
u'timestamp': u'2014-05-13 00:35:00' ...}
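One detail in an isStringable override like mine is easy to get backwards:
the direction of isAssignableFrom. c.isAssignableFrom(X) accepts X and every
supertype of X (including Object), while X.isAssignableFrom(c) accepts X and
its subclasses. For the exact Timestamp class both return true, so either
form handles my data. A standalone Java sketch of the difference, using
hypothetical classes BaseStamp and ZonedStamp that are not part of Avro or my
project:

```java
// Hypothetical stand-ins used only to show the two directions of the check.
class BaseStamp {
    private final String text;
    BaseStamp(String text) { this.text = text; }
    @Override public String toString() { return text; }
}

class ZonedStamp extends BaseStamp {
    ZonedStamp(String text) { super(text); }
}

class AssignableDirection {
    // True when c is BaseStamp or one of its supertypes.
    static boolean acceptsSupertypes(Class<?> c) {
        return c.isAssignableFrom(BaseStamp.class);
    }

    // True when c is BaseStamp or one of its subclasses.
    static boolean acceptsSubtypes(Class<?> c) {
        return BaseStamp.class.isAssignableFrom(c);
    }

    public static void main(String[] args) {
        System.out.println(acceptsSupertypes(BaseStamp.class));  // true
        System.out.println(acceptsSupertypes(Object.class));     // true
        System.out.println(acceptsSupertypes(ZonedStamp.class)); // false

        System.out.println(acceptsSubtypes(BaseStamp.class));    // true
        System.out.println(acceptsSubtypes(Object.class));       // false
        System.out.println(acceptsSubtypes(ZonedStamp.class));   // true
    }
}
```

The subtype direction looks like the safer choice for a stringable check,
since it never marks unrelated supertypes as convertible.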
On Thu, Jul 3, 2014 at 2:14 PM, Doug Cutting wrote:
> The java-class attribute is supported by the reflect implementation,
> not by the code-generating specific implementation. So you could
> define Foo in Java with something like:
>
> public class Foo {
> private long batchId;
> @Stringable private Timestamp timestamp;
> public Foo() {}
> public Foo(long batchId, Timestamp timestamp) { ... }
> }
>
> then use ReflectData to read/write instances. Note that
> java.sql.Timestamp doesn't have a string constructor. Are you using a
> different timestamp class? If you're defining your own then you could
> instead add the @Stringable annotation to your Timestamp class rather
> than to each field where it is used.
>
> Reflect-defined schemas can refer to specific-defined classes, but not
> vice-versa, since the compiler doesn't use reflection to discover
> schemas, but rather always generates from the schema alone.
>
> Doug
>
> On Wed, Jul 2, 2014 at 8:05 AM, Ian Hummel wrote:
> > Hi gang,
> >
> > I'm trying to build a JSON schema with a custom type as the field instead of
> > just a String. Is "java-class" supposed to work in that use case? I can't
> > seem to make any progress.
> >
> > Example schema (Foo.avsc):
> >
> > {
> > "namespace" : "com.example",
> > "type" : "record",
> > "name" : "Foo",
> > "fields" : [
> > { "name" : "batchId", "type" : "long" },
> > { "name" : "timestamp", "type" : "string", "java-class" :
> > "com.example.Timestamp" }
> > ]
> > }
> >
> > The Timestamp class has a public constructor which takes a single String
> > argument. I even tried annotating it with @Stringable. However, the
> > generated java class always uses String, not my custom type.
> >
> > $ java -jar ~/Downloads/avro-tools-1.7.6.jar compile -string schema
> > src/main/avro/Foo.avsc /tmp/foo
> >
> > From the generated .java file:
> >
> > ...
> > /**
> >  * All-args constructor.
> >  */
> > public Foo(java.lang.Long batchId, java.lang.String timestamp) {
> >   this.batchId = batchId;
> >   this.timestamp = timestamp;
> > }
> > ...
> >
> > Any help appreciated,
> >
> > - Ian.
>