I did a store to figure out how the schema is written in JSON, and then used
that as a template to create a schema for load.
From my experiments, for data with three columns (int, chararray, float), I
figured this is the minimal schema:
{"fields":
[
{"name":"year","type":10},
{"name":"name","type":55},
{"name":"num","type":20}
]
}
Is there any literature on how to write proper JSON for schemas?
Thanks,
vkh
Sadly, there isn't. For a simple, flat schema, it isn't hard. For each column
you just add another field entry, with its name and the numeric value of the
corresponding DataType constant:
http://pig.apache.org/docs/r0.10.0/api/constant-values.html#org.apache.pig.data.DataType.GENERIC_WRITABLECOMPARABLE
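As a rough sketch of the flat case, the JSON can also be generated instead of
typed by hand. The snippet below is plain Python and does not use any Pig API.
The helper name flat_schema_json and the PIG_TYPES table are made up for
illustration, and the numeric codes are the DataType values from the
constant-values page above, so check them against your Pig version. It only
emits the "name" and "type" keys, which matches the minimal schema from the
question; anything else a stored schema contains is not covered.

```python
import json

# Numeric codes as listed for org.apache.pig.data.DataType (Pig 0.10.0).
# Only some simple types are included; this table is illustrative, not complete.
PIG_TYPES = {
    "boolean": 5,
    "int": 10,
    "long": 15,
    "float": 20,
    "double": 25,
    "bytearray": 50,
    "chararray": 55,
}


def flat_schema_json(columns):
    """Build minimal schema JSON from (column_name, pig_type_name) pairs.

    Hypothetical helper: raises KeyError for a type name not in PIG_TYPES.
    """
    fields = [
        {"name": name, "type": PIG_TYPES[type_name]}
        for name, type_name in columns
    ]
    return json.dumps({"fields": fields})


if __name__ == "__main__":
    # Same three columns as in the question: int, chararray, float.
    print(flat_schema_json([
        ("year", "int"),
        ("name", "chararray"),
        ("num", "float"),
    ]))
```

This only handles flat schemas with simple types; nested tuples, bags and maps
are better left to the ResourceSchema route described next.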
For a more complex schema, it's easier to actually construct a ResourceSchema
object and serialize it with Jackson:
http://pig.apache.org/docs/r0.10.0/api/index.html?org/apache/pig/ResourceSchema.html
Regards,
Marcos