I have tried both cases.
Case 1: using ["int", "null"], "default" : 0. In this case, writing the sample
datum {"ip" : "1.1.1.1", "port" : 20} with DataFileWriter yielded {u'ip':
u'1.1.1.1', u'domain': None, u'score': None, u'port': 20}, where I expected
score to be 0 instead of None.
Case 2: using ["null", "int"], "default" : null. The result was the same as
in Case 1.
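
For reference, the rule Arne describes below (a union's default must match the first branch of the union) can be sketched as a small check. This is only an illustration, written for Python 3 and limited to primitive branches; `union_default_ok` and `_JSON_TYPES` are hypothetical names and are not part of the avro library.

```python
# Map Avro primitive type names to the Python types a JSON default would have.
_JSON_TYPES = {
    "null": type(None),
    "boolean": bool,
    "int": int,
    "long": int,
    "float": (int, float),
    "double": (int, float),
    "string": str,
}


def union_default_ok(field):
    """Return True if the field's default matches the first branch of its union."""
    ftype = field["type"]
    if not isinstance(ftype, list) or "default" not in field:
        return True  # not a union, or no default to check
    first = ftype[0]
    if not isinstance(first, str) or first not in _JSON_TYPES:
        return True  # complex or named first branch: out of scope for this sketch
    default = field["default"]
    if isinstance(default, bool) and first != "boolean":
        return False  # bool is a subclass of int in Python, so reject it explicitly
    return isinstance(default, _JSON_TYPES[first])


case1 = {"name": "score", "type": ["int", "null"], "default": 0}
case2 = {"name": "score", "type": ["null", "int"], "default": None}
mixed = {"name": "score", "type": ["null", "int"], "default": 0}
print(union_default_ok(case1), union_default_ok(case2), union_default_ok(mixed))
```

Under this check both of my cases are well-formed, so the schema itself does not seem to explain why None comes back.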

Is this a limitation of the Python Avro library? If so, can someone
recommend another Python library similar to Avro? Thanks.
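
One possible workaround, following Stanislav's suggestion below to supply default values in code when writing, might look like the following sketch. It uses only the standard library; `fill_defaults` is a hypothetical helper, not an avro API, and it assumes the schema from my first message with flat (non-nested) record fields.

```python
import json

# Schema from the original question, kept inline so the sketch is self-contained.
SCHEMA_JSON = """
{
    "type" : "record",
    "name" : "data",
    "namespace" : "my.example",
    "fields" : [
        {"name" : "domain", "type" : "string", "default" : "EMPTY"},
        {"name" : "ip", "type" : "string", "default" : "EMPTY"},
        {"name" : "port", "type" : "int", "default" : 0},
        {"name" : "score", "type" : "int", "default" : 0}
    ]
}
"""


def fill_defaults(schema, datum):
    """Return a copy of datum with every missing field set to its schema default."""
    filled = {}
    for field in schema["fields"]:
        name = field["name"]
        if name in datum:
            filled[name] = datum[name]
        elif "default" in field:
            filled[name] = field["default"]
        else:
            # Without a default there is nothing sensible to write.
            raise ValueError("no value and no default for field %r" % name)
    return filled


schema = json.loads(SCHEMA_JSON)
record = fill_defaults(schema, {"ip": "1.1.1.1", "port": 20})
print(record)
```

The completed dict could then be passed to writer.append() in place of the partial one, so every field is physically present in the file.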

On Fri, Jul 8, 2016 at 7:02 PM, Arne Vogel <[email protected]> wrote:

> Dear Yibing Shi,
>
> a default value for a union must have the schema which is the first union
> member. Therefore, to set e.g. an int default value, use ["int", "null"]
> instead of ["null", "int"].
>
> For more details, see the spec:
> http://avro.apache.org/docs/1.8.1/spec.html#schema_complex
>
> Regards,
> Arne Vogel
>
>
> On 08.07.2016 14:51, Yibing Shi wrote:
>
> + Sean Busbey
>
> My understanding is that this problem is a limitation of the Python Avro
> library. Currently it seems that the only valid default value is "null".
> Please try the schema below to see whether it works for you.
>
> {
>     "type" : "record",
>     "name" : "data",
>     "namespace" : "my.example",
>     "fields" : [
>         {"name" : "domain", "type" : ["null", "string"], "default" : null},
>         {"name" : "ip", "type" : ["null", "string"], "default" : null},
>         {"name" : "port", "type" : ["null", "int"], "default" : null},
>         {"name" : "score", "type" : ["null", "int"], "default" : null}
>     ]
> }
>
> The JIRAs below seem to be related:
>
> https://issues.apache.org/jira/browse/AVRO-1265
> https://issues.apache.org/jira/browse/AVRO-1566
>
> I am pretty sure that the AVRO Java library supports using a non-null
> default value for record fields. You can try it in a Java program.
>
>
> *Yibing Shi*
> *Customer Operations Engineer*
> <http://www.cloudera.com>
>
> On Fri, Jul 8, 2016 at 3:00 PM, Stanislav Savulchik <[email protected]> wrote:
>
>> I'm not familiar enough with Avro to propose an "Avro solution" for
>> your problem :(
>>
>> If you want to serialize default values into Avro for some fields you
>> should provide the default values in code explicitly when writing to Avro.
>> Another approach is to declare the fields as nullable using union types
>> (e.g. [null, int]) and use default values in code explicitly when reading
>> from Avro.
>>
>> I believe the "default" key you used in Avro schema is meant for schema
>> evolution http://avro.apache.org/docs/current/spec.html#Schema+Resolution
>>
>>
>>    - if the reader's record schema has a field that contains a default
>>    value, and writer's schema does not have a field with the same name, then
>>    the reader should use the default value from its field.
>>
>>
>> On Fri, 8 Jul 2016 at 9:52, Sarvagya Pant <[email protected]> wrote:
>>
>>> Hi Stanislav,
>>>
>>> Thanks for the reply. What I want to achieve is that data arriving at the
>>> Avro writer may not contain all the fields specified in the example above.
>>> I would like to save the default value if possible, or retrieve the default
>>> value when using DataFileReader. Is this possible? Should the data always
>>> contain all the keys specified in the schema? I tried using ["int", "null"],
>>> "default" : 0. With this, the data could be saved even when a field was not
>>> present, but using DataFileReader I got None instead of the default value 0.
>>> Any help will be much appreciated. Thanks.
>>>
>>> On Thu, Jul 7, 2016 at 10:39 PM, Stanislav Savulchik <
>>> [email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I believe default values only work for readers, not writers.
>>>>
>>>> The spec says (http://avro.apache.org/docs/current/spec.html):
>>>> > default: A default value for this field, used when reading instances
>>>> > that lack this field (optional).
>>>>
>>>> On 7 July 2016, at 21:16, Sarvagya Pant <[email protected]> wrote:
>>>>
>>>> I am trying to use Avro to replace some code that writes data as CSV.
>>>> This is because CSV cannot store the type of a field, and all data is
>>>> treated as strings when consumed. I have copied the code for Avro from
>>>> its website and would like to set a default value when a field is
>>>> missing.
>>>>
>>>> My Avro schema file (data.avsc) looks like this:
>>>>
>>>> {
>>>>     "type" : "record",
>>>>     "name" : "data",
>>>>     "namespace" : "my.example",
>>>>     "fields" : [
>>>>         {"name" : "domain", "type" : "string", "default" : "EMPTY"},
>>>>         {"name" : "ip", "type" : "string", "default" : "EMPTY"},
>>>>         {"name" : "port", "type" : "int", "default" : 0},
>>>>         {"name" : "score", "type" : "int", "default" : 0}
>>>>     ]
>>>> }
>>>>
>>>> I have written a simple python file that is expected to work. It is
>>>> given below:
>>>>
>>>> import avro.schema
>>>> from avro.datafile import DataFileReader, DataFileWriter
>>>> from avro.io import DatumReader, DatumWriter
>>>>
>>>> schema = avro.schema.parse(open("data.avsc", "rb").read())
>>>>
>>>> writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
>>>> writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
>>>> writer.append({"ip": "1.2.3.4", "port" : 80})
>>>> writer.append({"domain": "another domain", "score" : 100})
>>>> writer.close()
>>>>
>>>> reader = DataFileReader(open("users.avro", "rb"), DatumReader())
>>>> for data in reader:
>>>>     print data
>>>> reader.close()
>>>>
>>>> However, when I run this program, I get an error saying that the data
>>>> does not match the schema.
>>>>
>>>>     Traceback (most recent call last):
>>>>   File "D:\arko.py", line 8, in <module>
>>>>     writer.append({"domain": "hello domain", "score" : 20, "port" :
>>>> 8080})
>>>>   File "build\bdist.win32\egg\avro\datafile.py", line 196, in append
>>>>   File "build\bdist.win32\egg\avro\io.py", line 769, in write
>>>>
>>>> avro.io.AvroTypeException: The datum {'domain': 'hello domain',
>>>> 'score': 20, 'port': 8080} is not an example of the schema {
>>>>   "namespace": "my.example",
>>>>   "type": "record",
>>>>   "name": "userInfo",
>>>>   "fields": [
>>>>     {
>>>>       "default": "EMPTY",
>>>>       "type": "string",
>>>>       "name": "domain"
>>>>     },
>>>>     {
>>>>       "default": "EMPTY",
>>>>       "type": "string",
>>>>       "name": "ip"
>>>>     },
>>>>     {
>>>>       "default": 0,
>>>>       "type": "int",
>>>>       "name": "port"
>>>>     },
>>>>     {
>>>>       "default": 0,
>>>>       "type": "int",
>>>>       "name": "score"
>>>>     }
>>>>   ]
>>>> }
>>>> [Finished in 0.1s with exit code 1]
>>>>
>>>> I am using avro v1.8.0 and python 2.7. What am I doing wrong here?
>>>> Thanks.
>>>>
>>>> --
>>>>
>>>> *Sarvagya Pant *
>>>> *Kathmandu, Nepal*
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> *Sarvagya Pant *
>>> *Kathmandu, Nepal*
>>>
>>
>
> --
> BENOCS GMBH
> Arne Vogel
> Winterfeldtstr. 21
> 10781 Berlin
> Email: [email protected]
>
> Board of Management: Michael Wolz, Dr.-Ing. Oliver Holschke, Dr.-Ing. Ingmar 
> Poese
> Commercial Register: Amtsgericht Bonn HRB 19378
>
>


-- 

*Sarvagya Pant*
*Kathmandu, Nepal*
