Eevans added a comment.

In https://phabricator.wikimedia.org/T114443#1701296, @Joe wrote:

> Apart from the concerns on a practical use case which I agree with, I have a 
> big doubt about the implementation idea:
>
> I am in general a fan of the paradigm that it's better to beg for forgiveness 
> than to ask for permission, and of Postel's robustness principle, so I don't 
> really see what use a service in front of kafka would serve us, apart from 
> introducing another software that could fail and some latency.
>
> Messages we send onto kafka will be anyways verified on the receiving end 
> (considering them "trusted" would be foolish), so we will need to write 
> validation libraries in basically all the languages we will consume our data 
> from; this is the standard way to build communications protocols and I don't 
> see a good reason for introducing a level of indirection here.


Why is this? Why would they //need// to be verified on the receiving end?

I see this as being somewhat analogous to a database.  In any database you 
//could// store your data opaquely, allow each client to marshal it according 
to some shared notion of schema, and then have every client validate (the 
untrustworthy input) on read, but how is that better?  If the data is 
structured according to a well defined schema, why not let the system 
persisting it apply those constraints on write?  Assuming the goal is to 
disseminate these events to an arbitrary number of independently implemented 
systems, it seems the latter approach would provide better guarantees about
the integrity of the data, and eliminate a lot of redundancy among 
implementations.
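To make the database analogy concrete, here is a minimal Python sketch of enforcing a schema on write. Everything in it is hypothetical: the `EDIT_EVENT_SCHEMA` fields, `ValidatingLog`, and the simple field-to-type check (standing in for a real schema language such as JSON Schema) are assumptions for illustration, not part of any existing Wikimedia or Kafka API.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical schema: field name -> required Python type.
# A real deployment would more likely use a schema language such as JSON Schema.
EDIT_EVENT_SCHEMA: dict[str, type] = {
    "page_id": int,
    "title": str,
    "rev_id": int,
}


class ValidationError(ValueError):
    """Raised when an event does not conform to its schema."""


def validate(event: dict[str, Any], schema: dict[str, type]) -> None:
    """Raise ValidationError if the event is missing fields or has wrong types."""
    missing = [name for name in schema if name not in event]
    if missing:
        raise ValidationError(f"missing fields: {missing}")
    for name, expected in schema.items():
        value = event[name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ValidationError(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )


@dataclass
class ValidatingLog:
    """Append-only log that applies its schema on write, like a database constraint."""

    schema: dict[str, type]
    _events: list[dict[str, Any]] = field(default_factory=list)

    def append(self, event: dict[str, Any]) -> None:
        validate(event, self.schema)  # bad input is rejected before it is persisted
        self._events.append(dict(event))

    def read_all(self) -> list[dict[str, Any]]:
        # Every stored event has already passed validate(), so readers need not re-check
        return [dict(e) for e in self._events]


log = ValidatingLog(EDIT_EVENT_SCHEMA)
log.append({"page_id": 1, "title": "Main_Page", "rev_id": 42})
try:
    log.append({"page_id": "1", "title": "Main_Page"})  # wrong type and missing rev_id
except ValidationError as err:
    print("rejected:", err)
```

Because `append()` refuses the malformed event before it is stored, a consumer calling `read_all()` does not have to repeat the check, which is the redundancy among implementations described above.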

> So, I have two questions I'd like an answer to:
>
> - What is the advantage of having a service validate messages before they get
> into the queue (Kafka or other doesn't really matter)?


It assures a single consistent set of constraints on events, independent of the 
various producer/consumer implementations.

> - Why is building libraries that do the validations based on shared schemas
> not enough?


A service provides a single high level abstraction that hides the details of 
the underlying implementation (allowing said implementation to be transparently 
changed), eliminates redundancy among implementations, and prevents a single 
buggy producer from propagating corrupt events to all consumers.
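The same argument can be sketched as a thin service facade. This is an illustrative assumption, not a proposed design: `EventService`, `Backend`, and `InMemoryBackend` are hypothetical names, and the in-memory backend stands in for Kafka rather than calling any Kafka client.

```python
from typing import Any, Callable, Protocol


class Backend(Protocol):
    """Hypothetical transport interface; a Kafka-backed class would be one implementation."""

    def send(self, topic: str, event: dict[str, Any]) -> None: ...


class InMemoryBackend:
    """Stand-in backend for illustration only; it is not a Kafka client."""

    def __init__(self) -> None:
        self.topics: dict[str, list[dict[str, Any]]] = {}

    def send(self, topic: str, event: dict[str, Any]) -> None:
        self.topics.setdefault(topic, []).append(event)


# A validator returns True if the event satisfies the topic's constraints.
Validator = Callable[[dict[str, Any]], bool]


class EventService:
    """Single entry point: producers call publish() and never touch the backend directly."""

    def __init__(self, backend: Backend, validators: dict[str, Validator]) -> None:
        self._backend = backend
        self._validators = validators  # one shared set of constraints per topic

    def publish(self, topic: str, event: dict[str, Any]) -> bool:
        check = self._validators.get(topic)
        if check is None or not check(event):
            return False  # unknown topic or invalid event: it never reaches any consumer
        self._backend.send(topic, event)
        return True


backend = InMemoryBackend()
service = EventService(backend, {"edit": lambda e: isinstance(e.get("rev_id"), int)})
service.publish("edit", {"rev_id": 7})        # accepted
service.publish("edit", {"rev_id": "seven"})  # rejected by the shared constraint
```

Since producers only ever see `publish()`, swapping `InMemoryBackend` for a Kafka-backed class would leave producer code unchanged, and a malformed event from one buggy producer is stopped once at the service rather than by every consumer. Returning `False` instead of raising is an arbitrary choice here; an HTTP front end might map it to a 4xx response instead.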


TASK DETAIL
  https://phabricator.wikimedia.org/T114443


To: Eevans
Cc: EBernhardson, bd808, Joe, dr0ptp4kt, madhuvishy, Nuria, ori, faidon, aaron, 
GWicke, mobrovac, Halfak, Eevans, Ottomata, Matanya, Aklapper, JAllemandou, 
jkroll, Smalyshev, Hardikj, Wikidata-bugs, Jdouglas, RobH, aude, Deskana, 
Manybubbles, mark, JanZerebecki, RobLa-WMF, fgiunchedi, Dzahn, jeremyb, 
chasemp, Krenair



_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs