Dear Yibing Shi,

The default value for a union field must match the schema of the first union member. So to set, e.g., an int default value, use ["int", "null"] instead of ["null", "int"].

For more details, see the spec:
http://avro.apache.org/docs/1.8.1/spec.html#schema_complex
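
As a rough illustration of the rule above, here is a minimal sketch that checks whether a field's default matches the first branch of its union. It assumes Python 3 and only the standard library; `union_default_is_valid` and `PRIMITIVE_CHECKS` are hypothetical names for this example, not part of the avro package, and only a few primitive types are covered.

```python
import json

# Simplified checks for a few Avro primitive types. Assumption: this sketch
# ignores bytes, records, enums, arrays, maps and nested unions.
def _is_integer(value):
    return isinstance(value, int) and not isinstance(value, bool)

def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)

PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": _is_integer,
    "long": _is_integer,
    "float": _is_number,
    "double": _is_number,
    "string": lambda v: isinstance(v, str),
}

def union_default_is_valid(field):
    """Return True if the field's default matches the FIRST union branch."""
    field_type = field["type"]
    if not isinstance(field_type, list) or "default" not in field:
        return True  # not a union, or no default: nothing to check here
    first_branch = field_type[0]
    if not isinstance(first_branch, str) or first_branch not in PRIMITIVE_CHECKS:
        raise NotImplementedError("only primitive first branches are handled")
    return PRIMITIVE_CHECKS[first_branch](field["default"])

fields = json.loads("""[
    {"name": "port", "type": ["int", "null"], "default": 0},
    {"name": "port", "type": ["null", "int"], "default": 0},
    {"name": "port", "type": ["null", "int"], "default": null}
]""")
for field in fields:
    print(field["type"], field["default"], union_default_is_valid(field))
```

Only the first and third field definitions pass the check; the second has an int default while its first branch is "null".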

Regards,
Arne Vogel

On 08.07.2016 14:51, Yibing Shi wrote:
+ Sean Busbey

My understanding is that this problem is a limitation of the Python Avro library. Currently it seems that the only valid default value is "null". Please try the schema below to see whether it works for you.

{
    "type" : "record",
    "name" : "data",
    "namespace" : "my.example",
    "fields" : [
        {"name" : "domain", "type" : ["null", "string"], "default" : null},
        {"name" : "ip", "type" : ["null", "string"], "default" : null},
        {"name" : "port", "type" : ["null", "int"], "default" : null},
        {"name" : "score", "type" : ["null", "int"], "default" : null}
    ]
}

The JIRAs below seem to be related:

https://issues.apache.org/jira/browse/AVRO-1265
https://issues.apache.org/jira/browse/AVRO-1566

I am pretty sure that the AVRO Java library supports using a non-null default value for record fields. You can try it in a Java program.


Yibing Shi
Customer Operations Engineer
<http://www.cloudera.com>

On Fri, Jul 8, 2016 at 3:00 PM, Stanislav Savulchik <[email protected] <mailto:[email protected]>> wrote:

    I'm not familiar enough with Avro to propose an "Avro
    solution" for your problem :(

    If you want to serialize default values into Avro for some fields,
    you should provide the default values explicitly in code when
    writing to Avro. Another approach is to declare the fields as
    nullable using union types (e.g. ["null", "int"]) and apply the
    default values explicitly in code when reading from Avro.
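
The first approach (filling in defaults in code before writing) might look like the following minimal sketch. `fill_defaults` is a hypothetical helper, not part of the avro library. It assumes a flat record schema like the one in this thread, loaded as a plain dict with the standard `json` module.

```python
import copy
import json

def fill_defaults(record, schema):
    """Return a copy of record with missing fields filled from schema defaults."""
    filled = dict(record)
    for field in schema["fields"]:
        name = field["name"]
        if name in filled:
            continue
        if "default" not in field:
            raise ValueError("field %r is missing and has no default" % name)
        # Copy the default so records never share a mutable default value.
        filled[name] = copy.deepcopy(field["default"])
    return filled

# The schema from the original question, parsed as plain JSON.
schema = json.loads("""
{
    "type": "record",
    "name": "data",
    "namespace": "my.example",
    "fields": [
        {"name": "domain", "type": "string", "default": "EMPTY"},
        {"name": "ip", "type": "string", "default": "EMPTY"},
        {"name": "port", "type": "int", "default": 0},
        {"name": "score", "type": "int", "default": 0}
    ]
}
""")

complete = fill_defaults({"ip": "1.2.3.4", "port": 80}, schema)
print(complete)
```

The completed dict could then be passed to `writer.append(...)` in place of the partial one, so that every record contains all the keys the schema requires.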

    I believe the "default" key you used in the Avro schema is meant
    for schema evolution:
    http://avro.apache.org/docs/current/spec.html#Schema+Resolution

      * if the reader's record schema has a field that contains a
        default value, and writer's schema does not have a field with
        the same name, then the reader should use the default value
        from its field.
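
The resolution rule quoted above can be sketched in plain Python. This is a simplified model, not the avro library's implementation: `resolve_record` is a hypothetical function, records are plain dicts, and the writer's schema is reduced to a set of field names.

```python
def resolve_record(written, writer_field_names, reader_fields):
    """Build the reader's view of a record written with the writer's schema."""
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_field_names:
            # The writer's schema has this field: use the written value,
            # even if that value is null.
            resolved[name] = written[name]
        elif "default" in field:
            # The writer's schema lacks the field: use the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError("no value and no default for field %r" % name)
    return resolved

reader_fields = [
    {"name": "domain", "type": "string"},
    {"name": "score", "type": ["int", "null"], "default": 0},
]

# Writer's schema had no "score" field: the reader's default is used.
old = resolve_record({"domain": "a"}, {"domain"}, reader_fields)
print(old)

# Writer's schema had "score" and a null was written: the default is not used.
same = resolve_record({"domain": "a", "score": None},
                      {"domain", "score"}, reader_fields)
print(same)
```

In this model the default only applies when the field is absent from the writer's schema, not when a record was written with a null value. That would be consistent with getting None back instead of 0 when the same schema is used for both writing and reading.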


    On Fri, Jul 8, 2016 at 9:52, Sarvagya Pant <[email protected]
    <mailto:[email protected]>> wrote:

        Hi Stanislav,

        Thanks for the reply. What I want to achieve is this: data
        arriving at the Avro writer may not contain all the fields
        specified in the schema above. I would like to save the default
        value if possible, or retrieve the default value when using
        DataFileReader. Is this possible? Should the data always
        contain all the keys specified in the schema? I tried using
        ["int", "null"], "default" : 0. With that, the writer was able
        to save the data when a field was not present, but using
        DataFileReader I got None instead of the default value 0. Any
        help will be much appreciated. Thanks.

        On Thu, Jul 7, 2016 at 10:39 PM, Stanislav Savulchik
        <[email protected] <mailto:[email protected]>> wrote:

            Hi,

            I believe default values only work for readers, not writers.

            The spec (http://avro.apache.org/docs/current/spec.html)
            says:
            > default: A default value for this field, used when
            > reading instances that lack this field (optional).

            On July 7, 2016, at 21:16, Sarvagya Pant
            <[email protected]
            <mailto:[email protected]>> wrote:

            I am trying to use Avro to replace some code that writes
            data as CSV. This is because CSV cannot store the type of
            a field, so all data are treated as strings when consumed.
            I have copied the Avro code from its website and would
            like to set a default value when a field is missing.

            My avro file looks like this:

            {
                "type" : "record",
                "name" : "data",
                "namespace" : "my.example",
                "fields" : [
                    {"name" : "domain", "type" : "string", "default" : "EMPTY"},
                    {"name" : "ip", "type" : "string", "default" : "EMPTY"},
                    {"name" : "port", "type" : "int", "default" : 0},
                    {"name" : "score", "type" : "int", "default" : 0}
                ]
            }

            I have written a simple Python script that I expected to
            work. It is given below:

            import avro.schema
            from avro.datafile import DataFileReader, DataFileWriter
            from avro.io import DatumReader, DatumWriter

            schema = avro.schema.parse(open("data.avsc", "rb").read())

            writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
            writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
            writer.append({"ip": "1.2.3.4", "port" : 80})
            writer.append({"domain": "another domain", "score" : 100})
            writer.close()

            reader = DataFileReader(open("users.avro", "rb"), DatumReader())
            for data in reader:
                print data
            reader.close()

            However, when I run this program, I get an error saying
            that the data do not match the schema.

            Traceback (most recent call last):
              File "D:\arko.py", line 8, in <module>
                writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
              File "build\bdist.win32\egg\avro\datafile.py", line 196, in append
              File "build\bdist.win32\egg\avro\io.py", line 769, in write

            avro.io.AvroTypeException: The datum {'domain': 'hello
            domain', 'score': 20, 'port': 8080} is not an example of
            the schema {
              "namespace": "my.example",
              "type": "record",
              "name": "userInfo",
              "fields": [
                {
                  "default": "EMPTY",
                  "type": "string",
                  "name": "domain"
                },
                {
                  "default": "EMPTY",
                  "type": "string",
                  "name": "ip"
                },
                {
                  "default": 0,
                  "type": "int",
                  "name": "port"
                },
                {
                  "default": 0,
                  "type": "int",
                  "name": "score"
                }
              ]
            }
            [Finished in 0.1s with exit code 1]

            I am using avro v1.8.0 and python 2.7. What am I doing
            wrong here? Thanks.

            --
            Sarvagya Pant
            Kathmandu, Nepal




        --
        Sarvagya Pant
        Kathmandu, Nepal



--
BENOCS GMBH
Arne Vogel
Winterfeldtstr. 21
10781 Berlin
Email: [email protected]
www.benocs.com

Board of Management: Michael Wolz, Dr.-Ing. Oliver Holschke, Dr.-Ing. Ingmar Poese
Commercial Register: Amtsgericht Bonn HRB 19378
