+ Sean Busbey
My understanding is that this problem is a limitation of the Python Avro
library. Currently it seems that the only default value it accepts is
"null". Please try the schema below to see whether it works for you.
{
  "type" : "record",
  "name" : "data",
  "namespace" : "my.example",
  "fields" : [
    {"name" : "domain", "type" : ["null", "string"], "default" : null},
    {"name" : "ip", "type" : ["null", "string"], "default" : null},
    {"name" : "port", "type" : ["null", "int"], "default" : null},
    {"name" : "score", "type" : ["null", "int"], "default" : null}
  ]
}
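A minimal sketch of how that schema could be exercised with the Python
library (assuming avro 1.8.0 on Python 2.7 as used later in the thread;
file names are purely illustrative):

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

# Parse the union-typed ("nullable") schema shown above.
schema = avro.schema.parse(open("data.avsc", "rb").read())

# Fields that are omitted validate against the "null" branch of the
# union and are written as null.
writer = DataFileWriter(open("data.avro", "wb"), DatumWriter(), schema)
writer.append({"domain": "hello domain", "score": 20, "port": 8080})
writer.append({"ip": "1.2.3.4", "port": 80})
writer.close()

# Omitted fields come back as None rather than a non-null default.
reader = DataFileReader(open("data.avro", "rb"), DatumReader())
for record in reader:
    print record
reader.close()

Note that omitted fields are written as null and come back as None, so
any non-null defaults still have to be applied in application code.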
The JIRAs below seem to be related:
https://issues.apache.org/jira/browse/AVRO-1265
https://issues.apache.org/jira/browse/AVRO-1566
I am pretty sure that the Avro Java library supports a non-null
default value for record fields. You can try it in a Java program.
Yibing Shi
Customer Operations Engineer
<http://www.cloudera.com>
On Fri, Jul 8, 2016 at 3:00 PM, Stanislav Savulchik
<[email protected]> wrote:
I'm not familiar enough with Avro to propose an "Avro
solution" for your problem :(
If you want to serialize default values into Avro for some fields,
you should provide the default values explicitly in code when
writing to Avro. Another approach is to declare the fields as
nullable using union types (e.g. ["null", "int"]) and apply the
default values explicitly in code when reading from Avro.
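A minimal sketch of both approaches (the DEFAULTS mapping, the
fill_defaults helper and the file names are purely illustrative,
assuming avro 1.8.0 on Python 2.7 as elsewhere in the thread):

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

# Application-level defaults; Avro itself will not fill these in.
DEFAULTS = {"domain": "EMPTY", "ip": "EMPTY", "port": 0, "score": 0}

# Approach 1: fill in defaults explicitly before writing
# (works with the original non-nullable schema).
def fill_defaults(datum):
    record = dict(DEFAULTS)
    record.update(datum)
    return record

schema = avro.schema.parse(open("data.avsc", "rb").read())
writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
writer.append(fill_defaults({"domain": "hello domain", "score": 20, "port": 8080}))
writer.append(fill_defaults({"ip": "1.2.3.4", "port": 80}))
writer.close()

# Approach 2: with a nullable (union) schema, omitted fields come back
# as None, so substitute the defaults after reading.
reader = DataFileReader(open("users.avro", "rb"), DatumReader())
for record in reader:
    filled = dict(record)
    for name, value in DEFAULTS.items():
        if filled.get(name) is None:
            filled[name] = value
    print filled
reader.close()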
I believe the "default" key you used in Avro schema is meant for
schema evolution
http://avro.apache.org/docs/current/spec.html#Schema+Resolution
* if the reader's record schema has a field that contains a
default value, and writer's schema does not have a field with
the same name, then the reader should use the default value
from its field.
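A sketch of that resolution rule, assuming avro 1.8.0 on Python 2.7
(where DatumReader appears to take a readers_schema keyword; the schema
contents and file names are illustrative):

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

# Writer's schema: no "score" field at all.
writer_schema = avro.schema.parse(
    '{"type": "record", "name": "data", "namespace": "my.example",'
    ' "fields": [{"name": "domain", "type": "string"}]}')

# Reader's schema: adds "score" with a default value.
reader_schema = avro.schema.parse(
    '{"type": "record", "name": "data", "namespace": "my.example",'
    ' "fields": [{"name": "domain", "type": "string"},'
    '            {"name": "score", "type": "int", "default": 0}]}')

writer = DataFileWriter(open("evolved.avro", "wb"), DatumWriter(), writer_schema)
writer.append({"domain": "hello domain"})
writer.close()

# The writer's schema is read from the file header; the reader's schema
# supplies the default for the missing field.
reader = DataFileReader(open("evolved.avro", "rb"),
                        DatumReader(readers_schema=reader_schema))
for record in reader:
    print record  # should show "score" filled from the reader's default
reader.close()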
On Fri, Jul 8, 2016 at 9:52, Sarvagya Pant <[email protected]> wrote:
Hi Stanislav,
Thanks for the reply. What I want to achieve is that the data
arriving at the Avro writer may not contain all the fields specified
in the example above. I would like to save the default value where
possible, or retrieve the default value when using DataFileReader.
Is this possible? Must the data always contain all the keys specified
in the schema? I tried ["int", "null"] with "default" : 0; that let me
save the data even when a field was missing, but when reading with
DataFileReader I got None instead of the default value 0. Any help
will be much appreciated. Thanks.
On Thu, Jul 7, 2016 at 10:39 PM, Stanislav Savulchik
<[email protected]> wrote:
Hi,
I believe default values only work for readers, not writers.
The spec says (http://avro.apache.org/docs/current/spec.html):
> default: A default value for this field, used when reading
instances that lack this field (optional).
On Jul 7, 2016, at 21:16, Sarvagya Pant <[email protected]> wrote:
I am trying to use Avro to replace some code that writes data as
CSV, because CSV cannot store the type of each field and everything
is treated as a string when the data is consumed. I have copied the
example code from the Avro website and would like to set a default
value when a field is missing.
My Avro schema looks like this:
{
  "type" : "record",
  "name" : "data",
  "namespace" : "my.example",
  "fields" : [
    {"name" : "domain", "type" : "string", "default" : "EMPTY"},
    {"name" : "ip", "type" : "string", "default" : "EMPTY"},
    {"name" : "port", "type" : "int", "default" : 0},
    {"name" : "score", "type" : "int", "default" : 0}
  ]
}
I have written a simple Python script that I expected to work. It is
given below:
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

schema = avro.schema.parse(open("data.avsc", "rb").read())

writer = DataFileWriter(open("users.avro", "w"), DatumWriter(), schema)
writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
writer.append({"ip": "1.2.3.4", "port" : 80})
writer.append({"domain": "another domain", "score" : 100})
writer.close()

reader = DataFileReader(open("users.avro", "rb"), DatumReader())
for data in reader:
    print data
reader.close()
However, when I run this program, I get an error saying the data does
not match the schema.
Traceback (most recent call last):
  File "D:\arko.py", line 8, in <module>
    writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
  File "build\bdist.win32\egg\avro\datafile.py", line 196, in append
  File "build\bdist.win32\egg\avro\io.py", line 769, in write
avro.io.AvroTypeException: The datum {'domain': 'hello domain', 'score': 20, 'port': 8080} is not an example of the schema {
  "namespace": "my.example",
  "type": "record",
  "name": "userInfo",
  "fields": [
    {
      "default": "EMPTY",
      "type": "string",
      "name": "domain"
    },
    {
      "default": "EMPTY",
      "type": "string",
      "name": "ip"
    },
    {
      "default": 0,
      "type": "int",
      "name": "port"
    },
    {
      "default": 0,
      "type": "int",
      "name": "score"
    }
  ]
}
[Finished in 0.1s with exit code 1]
I am using avro v1.8.0 and python 2.7. What am I doing
wrong here? Thanks.
--
Sarvagya Pant
Kathmandu, Nepal