Joe added a comment.

In https://phabricator.wikimedia.org/T114443#1703097, @Eevans wrote:

> In https://phabricator.wikimedia.org/T114443#1701296, @Joe wrote:
>
> > Apart from the concerns on a practical use case which I agree with, I have 
> > a big doubt about the implementation idea:
> >
> > I am in general a fan of the paradigm that it's better to beg for 
> > forgiveness than to ask for permission, and of Postel's robustness 
> > principle, so I don't really see what use a service in front of kafka would 
> > serve us, apart from introducing another software that could fail and some 
> > latency.
> >
> > Messages we send onto kafka will be anyways verified on the receiving end 
> > (considering them "trusted" would be foolish), so we will need to write 
> > validation libraries in basically all the languages we will consume our 
> > data from; this is the standard way to build communications protocols and I 
> > don't see a good reason for introducing a level of indirection here.
>
>
> Why is this, why would they //need// to be verified on the receiving end?


They would, unless we prevent Kafka from speaking to anything but our REST 
service. We can do that, of course, but from what I read a few comments ago, I 
gather we already have a counterexample.

> I see this as being somewhat analogous to a database.  In any database you 
> //could// store your data opaquely, allow each client to marshal it according 
> to some shared notion of schema, and then have every client validate (the 
> untrustworthy input) on read, but how is that better?  If the data is 
> structured according to a well defined schema, why not let the system 
> persisting it apply those constraints on write?  Assuming the goal is to 
> disseminate these events to an arbitrary number of independently implemented 
> systems, it seems the latter approach would provide better guarantees about 
> the integrity of the data, and eliminate a lot of redundancy among 
> implementations.


A message queue is not a database, it's a router. What you want to validate is 
the content of the messages Kafka is routing. I stand by the idea that doing 
that is important, but it must be done at the app level anyway.

> > So, I have two questions I'd like an answer to:
> >
> > - What is the advantage of having a service validate messages before they 
> > get into the queue (Kafka or other doesn't really matter)?
>
> It assures a single consistent set of constraints on events, independent of 
> the various producer/consumer implementations.


At the cost of reducing a complex and rich queue system to a REST paradigm, and 
introducing yet another layer of chained calls that can fail independently. 
Have you already evaluated what would be lost in translation, if anything?

> > - Why is building libraries that do the validations based on shared 
> > schemas not enough?
>
> A service provides a single high level abstraction that hides the details of 
> the underlying implementation (allowing said implementation to be 
> transparently changed), eliminates redundancy among implementations, and 
> prevents a single buggy consumer from propagating corrupt events to all 
> consumers.


Well, as I said above, the implementations need to verify the messages they 
receive anyway, which is sensible. We're talking about 2, max 3 validation 
libraries, which you would still have to write against your REST service, 
versus the 1 you'd put in the service itself. So with libraries you don't 
duplicate effort, you just simplify the architecture.
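
To make the library approach concrete, here is a minimal sketch of what one such 
validation library could look like. It is an illustration only, not code from 
the task: the schema format, the field names (topic, timestamp, payload) and 
the function names are all hypothetical, and it deliberately contains no Kafka 
client calls. The point is that the same schema and the same check are used by 
the producer before sending and by the consumer after receiving.

```python
import json

# Hypothetical shared schema: field name -> required Python type.
# Neither the field names nor this schema format come from the task.
EVENT_SCHEMA = {"topic": str, "timestamp": int, "payload": dict}


def validate(event):
    """Return a list of problems; an empty list means the event is valid."""
    if not isinstance(event, dict):
        return ["event is not a JSON object"]
    errors = []
    for field, expected in EVENT_SCHEMA.items():
        if field not in event:
            errors.append("missing field: " + field)
        elif not isinstance(event[field], expected):
            errors.append("wrong type for field: " + field)
    return errors


def encode(event):
    """Producer side: refuse to serialize an event that fails the schema."""
    errors = validate(event)
    if errors:
        raise ValueError("; ".join(errors))
    return json.dumps(event).encode("utf-8")


def decode(raw):
    """Consumer side: treat the bytes as untrusted and validate them again."""
    try:
        event = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return None
    return event if not validate(event) else None
```

A producer would pass the result of encode() to whatever queue client it uses, 
and a consumer would call decode() on each message and skip the ones that come 
back as None. A port to another language would only need to reimplement 
validate() against the same shared schema.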

I have worked before with systems that purposely added a "sane indirection" in 
front of backend technologies, and it always turned out to be a worse idea than 
using libraries. But well, this might just be the one case in which that won't 
happen. I just don't see it.


TASK DETAIL
  https://phabricator.wikimedia.org/T114443
