Hi Stanislav,

Thanks for the reply. What I want to achieve is this: the data arriving at the Avro writer may not contain all the fields specified in the schema above. I would like to save the default value if possible, or else retrieve the default value when reading with DataFileReader. Is this possible? Or must the data always contain all the keys specified in the schema?

I tried using ["int", "null"] with "default" : 0. That let me save the data even when a field was missing, but when reading with DataFileReader I got None instead of the default value 0.

Any help will be much appreciated. Thanks.
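In case it helps, here is a minimal sketch of the writer-side workaround I have in mind: a small helper (with_defaults, my own name) that fills any missing keys from the schema's own defaults before appending. I am assuming the avro 1.8.0 Python 2 API here, where a parsed record schema exposes .fields and each field has .name, .has_default and .default properties; if those names differ in another version the helper would need adjusting.

import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

schema = avro.schema.parse(open("data.avsc", "rb").read())

def with_defaults(record_schema, datum):
    # Copy the datum and fill every missing field from its schema default,
    # so the writer always sees a complete record.
    filled = dict(datum)
    for field in record_schema.fields:
        if field.name not in filled and field.has_default:
            filled[field.name] = field.default
    return filled

writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
writer.append(with_defaults(schema, {"domain": "hello domain", "score": 20, "port": 8080}))
writer.append(with_defaults(schema, {"ip": "1.2.3.4", "port": 80}))
writer.close()

Also, if I read the spec correctly, a union field's default has to match the first branch of the union, so ["int", "null"] with "default" : 0 should at least be the right ordering.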
On Thu, Jul 7, 2016 at 10:39 PM, Stanislav Savulchik <[email protected]> wrote:

> Hi,
>
> I believe default values only work for readers, not writers.
>
> Spec says that (http://avro.apache.org/docs/current/spec.html):
>
> default: A default value for this field, used when reading instances
> that lack this field (optional).
>
> On 7 July 2016, at 21:16, Sarvagya Pant <[email protected]>
> wrote:
>
> I am trying to implement Avro to replace some code that writes data in
> CSV. This is because CSV cannot store the type of a field, so all data
> are treated as strings when consumed. I have copied the code for Avro
> from its website and would like to set a default value if a field is
> missing.
>
> My Avro schema file looks like this:
>
> {
>   "type" : "record",
>   "name" : "data",
>   "namespace" : "my.example",
>   "fields" : [
>     {"name" : "domain", "type" : "string", "default" : "EMPTY"},
>     {"name" : "ip", "type" : "string", "default" : "EMPTY"},
>     {"name" : "port", "type" : "int", "default" : 0},
>     {"name" : "score", "type" : "int", "default" : 0}
>   ]
> }
>
> I have written a simple Python script that I expected to work. It is
> given below:
>
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
>
> schema = avro.schema.parse(open("data.avsc", "rb").read())
>
> writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
> writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
> writer.append({"ip": "1.2.3.4", "port" : 80})
> writer.append({"domain": "another domain", "score" : 100})
> writer.close()
>
> reader = DataFileReader(open("users.avro", "rb"), DatumReader())
> for data in reader:
>     print data
> reader.close()
>
> However, when I run this program, I get an error saying the data does
> not match the schema:
>
> Traceback (most recent call last):
>   File "D:\arko.py", line 8, in <module>
>     writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
>   File "build\bdist.win32\egg\avro\datafile.py", line 196, in append
>   File "build\bdist.win32\egg\avro\io.py", line 769, in write
> avro.io.AvroTypeException: The datum {'domain': 'hello domain', 'score':
> 20, 'port': 8080} is not an example of the schema {
>   "namespace": "my.example",
>   "type": "record",
>   "name": "userInfo",
>   "fields": [
>     {
>       "default": "EMPTY",
>       "type": "string",
>       "name": "domain"
>     },
>     {
>       "default": "EMPTY",
>       "type": "string",
>       "name": "ip"
>     },
>     {
>       "default": 0,
>       "type": "int",
>       "name": "port"
>     },
>     {
>       "default": 0,
>       "type": "int",
>       "name": "score"
>     }
>   ]
> }
> [Finished in 0.1s with exit code 1]
>
> I am using avro v1.8.0 and Python 2.7. What am I doing wrong here? Thanks.
>
> --
>
> *Sarvagya Pant*
> *Kathmandu, Nepal*

--

*Sarvagya Pant*
*Kathmandu, Nepal*
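P.S. To check that I understand the reader-side behaviour Stanislav describes: if a file was written with a schema that lacks a field, and it is read with a schema that does have that field plus a default, the reader should fill the default in during schema resolution. A minimal sketch of what I mean, assuming the avro 1.8.0 Python 2 API where DatumReader takes a readers_schema keyword argument:

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

# Writer schema: "score" is absent entirely.
writer_schema = avro.schema.parse("""
{"type" : "record", "name" : "data", "namespace" : "my.example",
 "fields" : [ {"name" : "domain", "type" : "string"} ]}
""")

# Reader schema: adds "score" with a default, so schema resolution
# should supply 0 for records that lack it.
reader_schema = avro.schema.parse("""
{"type" : "record", "name" : "data", "namespace" : "my.example",
 "fields" : [ {"name" : "domain", "type" : "string"},
              {"name" : "score", "type" : "int", "default" : 0} ]}
""")

writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), writer_schema)
writer.append({"domain": "hello domain"})
writer.close()

reader = DataFileReader(open("users.avro", "rb"),
                        DatumReader(readers_schema=reader_schema))
for record in reader:
    print record  # expecting: {u'domain': u'hello domain', 'score': 0}
reader.close()

If that is right, then my earlier ["int", "null"] attempt returns None simply because the field was written (as null), so there is nothing for the reader to fill in.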
