Joe added a comment.

In https://phabricator.wikimedia.org/T114443#1705509, @Ottomata wrote:

> @Joe,  there are two parts to this MVP:
>
> - Centralized (and CI controlled) schema sharing
> - An easy way to get valid data into Kafka.
>
>   With eventlogging right now, we are spending a lot of resources just 
> processing incoming data and making sure it is valid.  When data is produced 
> by N clients, it is difficult to guarantee that all of them are producing 
> valid data.  Invalid data in a stream can make processing and sanity checking 
> difficult, especially in a distributed environment like Hadoop.


So are you saying that the resources spent processing incoming data and making 
sure it is valid would magically vanish? Or would they just be moved somewhere 
else?

I don't really get this point.

> You are right though, we could achieve a similar thing if we built Kafka 
> wrappers that knew how to use our centralized schema system in all languages.  
> Then each wrapper could validate the message a client is trying to send 
> before it actually sends.  I think this could potentially introduce more 
> bugs, as there is more actual code to maintain, as well as more places that 
> the code is deployed.  E.g. we'd have to make sure that all clients updated 
> their wrapper library if we fix a bug.


As I stressed before, assuming you won't need JSON schema validation in your 
apps is planning for horrible coding. Seriously, any app that blindly trusts 
what comes from a service without validating external input is... well, not 
acceptable (and yes, you are basically saying you plan to do exactly that...).

So you will still need all the validation routines in each language; what you 
would not need is a Kafka driver to move those messages around. On the other 
hand, you will need a client library for our new shiny REST service, whose only 
advantage over a protocol designed for routing messages would be that it was 
invented here. Also, Kafka libraries already exist in most languages, so I 
don't really see what we are talking about.
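
As a rough illustration of the point above, here is a minimal sketch of the 
kind of validation routine an app would still need on the receiving side, 
whether the message arrived via Kafka or via a REST service. Everything in it 
is hypothetical: `EDIT_SCHEMA`, `validation_errors` and `decode_and_validate` 
are made-up names, and the checker only understands a tiny JSON-Schema-like 
subset (`required` and per-property `type`), not the real specification.

```python
import json

# Hypothetical schema for illustration only. Only "required" and the
# per-property "type" keyword are checked; this is not full JSON Schema.
EDIT_SCHEMA = {
    "required": ["page_id", "user"],
    "properties": {
        "page_id": {"type": "integer"},
        "user": {"type": "string"},
    },
}

# Mapping from schema type names to Python types (assumed subset).
_TYPES = {
    "integer": int,
    "number": (int, float),
    "string": str,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validation_errors(event, schema):
    """Return a list of problems found in event; an empty list means valid."""
    if not isinstance(event, dict):
        return ["event is not a JSON object"]
    errors = []
    for name in schema.get("required", []):
        if name not in event:
            errors.append("missing required field: " + name)
    for name, rule in schema.get("properties", {}).items():
        if name not in event:
            continue
        value = event[name]
        type_name = rule["type"]
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(value, bool) and type_name in ("integer", "number"):
            ok = False
        else:
            ok = isinstance(value, _TYPES[type_name])
        if not ok:
            errors.append(name + " should be " + type_name)
    return errors


def decode_and_validate(raw, schema):
    """Parse a raw message (str or bytes) and refuse it unless it is valid."""
    try:
        event = json.loads(raw)
    except ValueError as exc:
        raise ValueError("not valid JSON: %s" % exc)
    errors = validation_errors(event, schema)
    if errors:
        raise ValueError("; ".join(errors))
    return event
```

The same `validation_errors` call could just as well run in a producer right 
before handing the message to an existing Kafka client library, which is why 
putting a service in front of Kafka does not remove the need for it.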

I am probably missing something fundamental here, since you're all so convinced 
you could just slap a REST service in front of Kafka and then skip validating 
the schema and content of messages in any software that is going to use it. 
And that sounds plainly wrong to me.


TASK DETAIL
  https://phabricator.wikimedia.org/T114443

To: Joe
Cc: EBernhardson, bd808, Joe, dr0ptp4kt, madhuvishy, Nuria, ori, faidon, aaron, 
GWicke, mobrovac, Eevans, Ottomata, Matanya, Aklapper, JAllemandou, jkroll, 
Smalyshev, Hardikj, Wikidata-bugs, Jdouglas, RobH, aude, Deskana, Manybubbles, 
mark, JanZerebecki, RobLa-WMF, fgiunchedi, Dzahn, jeremyb, chasemp, Krenair


