I have tried both cases.
Case 1: using ["int", "null"], "default" : 0. In this case, writing the sample
data {"ip" : "1.1.1.1", "port" : 20} with DataFileWriter yielded {u'ip':
u'1.1.1.1', u'domain': None, u'score': None, u'port': 20}, where I expected
score to be 0 instead of None.
Case 2: using ["null", "int"], "default": null. The result was the same as in
Case 1.

Is this a limitation of the Avro Python library? If so, can someone recommend
another Python library similar to Avro? Thanks.

On Fri, Jul 8, 2016 at 7:02 PM, Arne Vogel <[email protected]> wrote:

> Dear Yibing Shi,
>
> a default value for a union must have the schema which is the first union
> member. Therefore, to set e.g. an int default value, use ["int", "null"]
> instead of ["null", "int"].
>
> For more details, see the spec:
> http://avro.apache.org/docs/1.8.1/spec.html#schema_complex
>
> Regards,
> Arne Vogel
>
>
> On 08.07.2016 14:51, Yibing Shi wrote:
>
> + Sean Busbey
>
> My understanding is this problem is a limitation of Python AVRO library.
> Currently it seems that the only valid default value is "null". Please try
> below schema to see whether it works for you.
>
> {
>   "type" : "record",
>   "name" : "data",
>   "namespace" : "my.example",
>   "fields" : [
>     {"name" : "domain", "type" : ["null", "string"], "default" : null},
>     {"name" : "ip", "type" : ["null", "string"], "default" : null},
>     {"name" : "port", "type" : ["null", "int"], "default" : null},
>     {"name" : "score", "type" : ["null", "int"], "default" : null}
>   ]
> }
>
> Below JIRAs seems to be related:
>
> https://issues.apache.org/jira/browse/AVRO-1265
> https://issues.apache.org/jira/browse/AVRO-1566
>
> I am pretty sure that the AVRO Java library supports using a non-null
> default value for record fields. You can try it in a Java program.
>
>
> *Yibing Shi*
> *Customer Operations Engineer*
> <http://www.cloudera.com>
>
> On Fri, Jul 8, 2016 at 3:00 PM, Stanislav Savulchik <[email protected]> wrote:
>
>> I'm not familiar with Avro good enough to propose an "Avro solution" for
>> your problem :(
>>
>> If you want to serialize default values into Avro for some fields you
>> should provide the default values in code explicitly when writing to Avro.
>> Another approach is to declare the fields as nullable using union types
>> (e.g. [null, int]) and use default values in code explicitly when reading
>> from Avro.
>>
>> I believe the "default" key you used in Avro schema is meant for schema
>> evolution http://avro.apache.org/docs/current/spec.html#Schema+Resolution
>>
>>
>>    - if the reader's record schema has a field that contains a default
>>    value, and writer's schema does not have a field with the same name, then
>>    the reader should use the default value from its field.
>>
>>
>> On Fri, Jul 8, 2016 at 9:52, Sarvagya Pant <[email protected]> wrote:
>>
>>> Hi Stanislav,
>>>
>>> Thanks for the reply. What I want to achieve is that data arriving in
>>> Avro writer may not contain all field as specified in the example above. I
>>> would like to save default value if possible or retrieve the default value
>>> when using DataFileReader. Is this possible? Should the data always contain
>>> all the keys specified in the schema. I tried using ["int", "null"],
>>> "default" : 0, but this was able to save the data if any field is not
>>> present, but using DataFileReader I got None instead of default value 0.
>>> Any help will be much appreciated. Thanks.
>>>
>>> On Thu, Jul 7, 2016 at 10:39 PM, Stanislav Savulchik <
>>> [email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I believe default values only work for readers, not writers.
>>>>
>>>> Spec says that (http://avro.apache.org/docs/current/spec.html):
>>>> > default: A default value for this field, used when reading instances
>>>> that lack this field (optional).
>>>>
>>>> On Jul 7, 2016, at 21:16, Sarvagya Pant <[email protected]> wrote:
>>>>
>>>> I am trying to implement Avro to replace some codes that tries to write
>>>> data in CSV. This is because CSV cannot store the type of the field and all
>>>> data are treated as string when trying to consume. I have copied the code
>>>> for Avro from its website and would like to set a default value if there is
>>>> no field.
>>>>
>>>> My avro file looks like this:
>>>>
>>>> {
>>>>   "type" : "record",
>>>>   "name" : "data",
>>>>   "namespace" : "my.example",
>>>>   "fields" : [
>>>>     {"name" : "domain", "type" : "string", "default" : "EMPTY"},
>>>>     {"name" : "ip", "type" : "string", "default" : "EMPTY"},
>>>>     {"name" : "port", "type" : "int", "default" : 0},
>>>>     {"name" : "score", "type" : "int", "default" : 0}
>>>>   ]
>>>> }
>>>>
>>>> I have written a simple python file that is expected to work. It is
>>>> given below:
>>>>
>>>> import avro.schema
>>>> from avro.datafile import DataFileReader, DataFileWriter
>>>> from avro.io import DatumReader, DatumWriter
>>>>
>>>> schema = avro.schema.parse(open("data.avsc", "rb").read())
>>>>
>>>> writer = DataFileWriter(open("users.avro", "w"), DatumWriter(), schema)
>>>> writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
>>>> writer.append({"ip": "1.2.3.4", "port" : 80})
>>>> writer.append({"domain": "another domain", "score" : 100})
>>>> writer.close()
>>>>
>>>> reader = DataFileReader(open("users.avro", "rb"), DatumReader())
>>>> for data in reader:
>>>>     print data
>>>> reader.close()
>>>>
>>>> However, if I try to run this program, I get error that data are not
>>>> mapped according to schema.
>>>>
>>>> Traceback (most recent call last):
>>>>   File "D:\arko.py", line 8, in <module>
>>>>     writer.append({"domain": "hello domain", "score" : 20, "port" :
>>>> 8080})
>>>>   File "build\bdist.win32\egg\avro\datafile.py", line 196, in append
>>>>   File "build\bdist.win32\egg\avro\io.py", line 769, in write
>>>>
>>>> avro.io.AvroTypeException: The datum {'domain': 'hello domain',
>>>> 'score': 20, 'port': 8080} is not an example of the schema {
>>>>   "namespace": "my.example",
>>>>   "type": "record",
>>>>   "name": "userInfo",
>>>>   "fields": [
>>>>     {
>>>>       "default": "EMPTY",
>>>>       "type": "string",
>>>>       "name": "domain"
>>>>     },
>>>>     {
>>>>       "default": "EMPTY",
>>>>       "type": "string",
>>>>       "name": "ip"
>>>>     },
>>>>     {
>>>>       "default": 0,
>>>>       "type": "int",
>>>>       "name": "port"
>>>>     },
>>>>     {
>>>>       "default": 0,
>>>>       "type": "int",
>>>>       "name": "score"
>>>>     }
>>>>   ]
>>>> }
>>>> [Finished in 0.1s with exit code 1]
>>>>
>>>> I am using avro v1.8.0 and python 2.7. What am I doing wrong here?
>>>> Thanks.
>>>>
>>>> --
>>>>
>>>> *Sarvagya Pant *
>>>> *Kathmandu, Nepal*
>>>>
>>>
>>> --
>>>
>>> *Sarvagya Pant *
>>> *Kathmandu, Nepal*
>>>
>>
>
> --
> BENOCS GMBH
> Arne Vogel
> Winterfeldtstr. 21
> 10781 Berlin
> Email: [email protected]
>
> Board of Management: Michael Wolz, Dr.-Ing. Oliver Holschke, Dr.-Ing. Ingmar
> Poese
> Commercial Register: Amtsgericht Bonn HRB 19378
>

--
*Sarvagya Pant*
*Kathmandu, Nepal*
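
For anyone following the thread: the write-side workaround suggested earlier (supplying the default values in code before writing) could look roughly like the sketch below. It is only an illustration under stated assumptions, not part of the avro library. fill_defaults is a hypothetical helper name, and the schema is the one from the first message, loaded as plain JSON instead of through avro.schema.parse so that each field's "default" can be read directly.

```python
import json

# Schema from the first message in the thread, embedded here as text so the
# sketch is self-contained (in practice it would be read from data.avsc).
SCHEMA_JSON = """
{
  "type": "record",
  "name": "data",
  "namespace": "my.example",
  "fields": [
    {"name": "domain", "type": "string", "default": "EMPTY"},
    {"name": "ip", "type": "string", "default": "EMPTY"},
    {"name": "port", "type": "int", "default": 0},
    {"name": "score", "type": "int", "default": 0}
  ]
}
"""


def fill_defaults(schema, record):
    """Hypothetical helper (not an avro API): return a copy of `record` in
    which every key missing from it is set to the schema field's "default",
    when the field declares one. The input record is left unchanged."""
    filled = dict(record)
    for field in schema["fields"]:
        name = field["name"]
        if name not in filled and "default" in field:
            filled[name] = field["default"]
    return filled


schema = json.loads(SCHEMA_JSON)
row = fill_defaults(schema, {"ip": "1.1.1.1", "port": 20})
print(row)
```

The filled dict would then be passed to writer.append(...) in place of the partial one, so that every field named in the schema carries an explicit value when the datum is written.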
