> @Andy Adding these constraints at the schema level prevents bad data from
> making it onto Kafka topics in the first place, preventing data pollution.
> I don't know what you mean by "making it harder to write data using that
> schema"--imposing and enforcing constraints is kind of the point.

As far as I can tell, there is a fundamental trade-off between the
evolvability of a schema and the constraints you put on the writer. You
cannot evolve from a less constrained schema to a more constrained schema
without breaking backwards compatibility. Therefore if you value backwards
compatibility, you might find it preferable to just deal with the "bad"
data in the consumer by ignoring it (perhaps while also counting it in
your monitoring system).
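
To make the consumer-side option concrete, here is a minimal sketch in plain Python (no Avro or Kafka client involved). The `clients` field name comes from earlier in the thread; `consume`, `handle`, and the `skipped_empty_clients` counter are hypothetical names, and a real setup would report the count to whatever monitoring system you use.

```python
from collections import Counter

# Hypothetical in-process metrics; a real consumer would report these
# to its monitoring system instead.
metrics = Counter()


def consume(records, handle):
    """Pass records to `handle`, skipping and counting the "bad" ones.

    A record is treated as bad when `clients` is missing, null, or empty.
    """
    for record in records:
        if not record.get("clients"):
            # Ignore the record, but keep the problem visible in monitoring.
            metrics["skipped_empty_clients"] += 1
            continue
        handle(record)


processed = []
consume(
    [{"clients": ["a"]}, {"clients": []}, {"clients": ["b", "c"]}],
    processed.append,
)
```

In this sketch nothing is written to an error topic, so re-consuming the same messages only advances the counter again rather than producing duplicate error messages.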

> Actually, regarding confluent schema registry, I'm not sure I get the
> point: props are a valid part of a schema and are stored in the schema
> registry just fine.

The props themselves are stored in the registry but not the corresponding
code required to interpret them. So perhaps your v1 implementation of
"notEmpty" is incompatible with the v2 implementation of "notEmpty". Maybe
this particular prop is not likely to change but if your props are not
immutable, I'd imagine you'd run into this problem fairly quickly.
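
As a hedged illustration of that risk, the sketch below uses plain Python dictionaries rather than the Avro API. The schema fragment carries a custom `notEmpty` prop, which is all a registry would store. The record name `Example`, the field `label`, and the two checks `not_empty_v1` and `not_empty_v2` are hypothetical; the checks stand in for application code that lives outside the registry and can drift between versions.

```python
import json

# What the registry stores: the schema text, including the custom prop.
# "Example" and "label" are made-up names for this sketch.
SCHEMA = json.loads("""
{"type": "record", "name": "Example", "fields": [
  {"name": "label", "type": "string", "notEmpty": true}
]}
""")


def not_empty_v1(value):
    # v1 (hypothetical): any string with at least one character passes.
    return len(value) > 0


def not_empty_v2(value):
    # v2 (hypothetical): whitespace-only strings no longer pass.
    return len(value.strip()) > 0


def violations(schema, datum, check):
    """Return the names of fields that set `notEmpty` but fail `check`."""
    return [
        field["name"]
        for field in schema["fields"]
        if field.get("notEmpty", False) and not check(datum[field["name"]])
    ]


datum = {"label": "   "}
v1_result = violations(SCHEMA, datum, not_empty_v1)
v2_result = violations(SCHEMA, datum, not_empty_v2)
```

Both writers use the same registered schema, yet the v1 check accepts this datum and the v2 check rejects it. A schema-level compatibility check has no way to notice the difference, because the difference is in the code and not in the schema.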



On Fri, May 12, 2017 at 9:00 AM, Joseph Pachod <
[email protected]> wrote:

> Actually, regarding confluent schema registry, I'm not sure I get the
> point: props are a valid part of a schema and are stored in the schema
> registry just fine.
>
> So the question is more whether you can be sure you are always in the path
> of the Avro binary generation. Personally I'm always cautious regarding
> third parties, so Avro is behind a wrapper and thus we are sure of being
> in that path.
>
> 2017-05-12 8:24 GMT+02:00 Tianxiang Xiong <tianxiang.xiong@fundingcircle.com>:
>
>> @Andy Adding these constraints at the schema level prevents bad data from
>> making it onto Kafka topics in the first place, preventing data pollution.
>> I don't know what you mean by "making it harder to write data using that
>> schema"--imposing and enforcing constraints is kind of the point.
>>
>> > Why not just handle the empty case where you consume the data?
>>
>> That's what we currently do, but we wouldn't have to have this extra test
>> case if we could impose the aforementioned constraint at the schema level.
>>
>> Right now, we treat messages with an empty array as erroneous, and output
>> a corresponding message onto an error topic. If we reset our application
>> and consumed messages again, we'd be putting new messages onto the error
>> topic, *doubling* the unwanted data.
>>
>> @Joseph That's an interesting approach. I know that Avro is extensible,
>> but we're relying on some third-party serde classes, and as @Andy mentions,
>> once you start getting into the weeds all bets are off.
>>
>> On 11 May 2017 at 09:48, Andy Chambers <[email protected]> wrote:
>>
>>> I think the question you need to ask/answer is what is there to gain by
>>> adding this constraint. (This goes for any writer constraint)
>>>
>>> Each constraint you add makes it harder to write data using that schema.
>>>
>>> Why not just handle the empty case where you consume the data?
>>>
>>> Once you start adding custom datum writers, all bets are off with
>>> respect to schema compatibility so if you're using/trusting something like
>>> the confluent schema registry you're in trouble.
>>>
>>> On 11 May 2017 4:35 pm, "Joseph P." <[email protected]> wrote:
>>>
>>> Hi
>>>
>>> You can add props to your Avro schema.
>>>
>>> So here we have added our custom props, plus extra processing before
>>> generating the Avro binary to make sure these props are respected.
>>>
>>> Pro: very flexible (we have added max_length on string, temporal_format
>>> and so forth...).
>>> Con: you must be sure your extra processing runs before the Avro
>>> binaries are generated.
>>>
>>> For example, in your case you could add a prop "nonEmpty" with a default
>>> value of false.
>>>
>>> Then, before converting the Avro JSON/POJO to Avro binary, you use your
>>> own datum writer (extending SpecificDatumWriter), and in writeField you
>>> check for the presence of the prop and its value; if it is true, you
>>> check that the field is non-empty.
>>>
>>> Cheers
>>>
>>>
>>> On Wed, May 10, 2017 at 10:41 AM, Tianxiang Xiong <
>>> [email protected]> wrote:
>>>
>>>> Thanks Suraj, but that's not what I mean.
>>>>
>>>> For your second schema, it is possible to pass in an empty array `[]`
>>>> containing no elements. I would like to prevent that.
>>>>
>>>> On 8 May 2017 at 19:32, Suraj Acharya <[email protected]> wrote:
>>>>
>>>>> This is what I have done in my application :
>>>>>
>>>>> {"name": "clients", "type": [ {"type": "array", "items": "Client"}, 
>>>>> "null" ]}
>>>>>
>>>>> This allows me to pass null. What you can try is something like this :
>>>>>
>>>>> {"name": "info", "type": { "type": "array", "items": "Information" }}
>>>>>
>>>>> In this example, info is something that needs to be passed for every
>>>>> client.
>>>>>
>>>>> Hope that helps.
>>>>>
>>>>>
>>>>> On Fri, May 5, 2017 at 9:51 PM, Tianxiang Xiong <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> In Avro 1.7.7, is there a way to specify a *non-empty* array, map,
>>>>>> etc.? There doesn't seem to be according to the spec
>>>>>> <https://avro.apache.org/docs/1.7.7/spec.html#Maps>.
>>>>>>
>>>>>> There are applications in which we mandate that a data format has a
>>>>>> non-empty array. It'd be nice if that could be expressed in the schema so
>>>>>> data with empty arrays fail to serialize (and are thus never put on a
>>>>>> Kafka topic). Fail earlier > fail later.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> TX
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Tianxiang Xiong*
>>>>
>>>> *[email protected] <[email protected]>*
>>>>
>>>> 747 Front Street, Floor 4 | San Francisco, CA 94111
>>>>
>>>
>>>
>>>
>>
>>
>> --
>>
>> *Tianxiang Xiong*
>>
>> *[email protected] <[email protected]>*
>>
>> 747 Front Street, Floor 4 | San Francisco, CA 94111
>>
>
>
>
> --
>
> Joseph PACHOD
> Software architect
>
> *[email protected] <[email protected]>*
>
> 0811 696 386
>
> *www.berger-levrault.com* <http://www.berger-levrault.com/>
>
