Eevans added a comment.

In https://phabricator.wikimedia.org/T114443#1731284, @GWicke wrote:

> In https://phabricator.wikimedia.org/T114443#1730753, @Eevans wrote:
>
> > 1. Already leverages a (really slick) JSON schema registry 
> > <https://meta.wikimedia.org/wiki/Category:Schemas_%28active%29?status=active>
>
>
> Optionally fetching schemas from a URL isn't that hard really. Example code:
>
>   // `schema` is either a URL or a file path; dispatch on the scheme.
>   if (/^https?:\/\//.test(schema)) {
>     return preq.get(schema);      // HTTP(S) fetch, e.g. from meta
>   } else {
>     return readFromFile(schema);  // local file for in-tree schemas
>   }
>
>
> This lets us support files for core events, and fetch schemas from meta 
> for EL. Schema validation is a call to a library.
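
Agreed that validation is just a library call; for concreteness, a quick 
illustration (using ajv here, which is an assumption on my part; any JSON 
Schema validator would do, and the schema/event are trivial placeholders):

  var Ajv = require('ajv');
  var ajv = new Ajv();

  // A trivial schema and event, purely for illustration.
  var schema = { type: 'object', required: ['topic'] };
  var event = { topic: 'test.event' };

  // Compile the fetched schema once, then validate incoming events.
  var validate = ajv.compile(schema);
  if (!validate(event)) {
      throw new Error(ajv.errorsText(validate.errors));
  }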


The main reason I listed this as a benefit is that I don't understand why we 
need to distinguish between classes of events in this way (at the 
architectural level).  Since EL already has an answer for schema registry, 
that seemed like an advantage.

However, if we assume that we need an additional class of in-tree schemas, 
then the inverse also holds: it would be just as trivial to implement reading 
from the filesystem.
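
A minimal sketch of that direction, assuming the hypothetical readFromFile 
from the snippet above and an illustrative schemas/ directory for in-tree 
schemas:

  var fs = require('fs');
  var path = require('path');

  // Counterpart to the URL branch above: read an in-tree schema from
  // disk and parse it as JSON.
  function readFromFile(schema) {
      var file = path.resolve(__dirname, 'schemas', schema + '.json');
      return new Promise(function (resolve, reject) {
          fs.readFile(file, 'utf8', function (err, data) {
              if (err) { return reject(err); }
              try { resolve(JSON.parse(data)); }
              catch (e) { reject(e); }
          });
      });
  }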

> > 1. Provides a pluggable, composable architecture with support for a wide 
> > range of readers/writers
>
> How would this be an advantage for the EventBus portion? Many third-party 
> users will actually only want a minimal event bus, and EL doesn't seem to 
> help with this from what I have seen.


For starters, it means we have alternatives for environments where Kafka is 
overkill (small third-party installations, dev environments, mw-vagrant, 
etc).  Using, for example, SQLite instead of Kafka is already supported.

There is also a tremendous amount of flexibility here, and even if we assume 
we need none of it now, we can't assume we never will.  Having the ability to 
compose arbitrary event stream topologies from/to a wide variety of 
sources/sinks, to multiplex, and to add in-line processing sounds like a 
great set of capabilities to base such a project on.
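
To make "pluggable" concrete: EL itself is Python, but the pattern amounts to 
a registry of readers/writers keyed by URI scheme, so a topology becomes 
configuration rather than code. A JS sketch of the idea (the factory names 
are hypothetical stand-ins, not EL's actual API):

  // Hypothetical stand-in factory; real implementations would wrap a
  // Kafka producer, an sqlite handle, and so on.
  function makeStdoutWriter(uri) {
      return function (event) { console.log(JSON.stringify(event)); };
  }

  // Writer factories keyed by URI scheme; swapping Kafka for sqlite
  // (or stdout) becomes a configuration change, not a code change.
  var writers = {
      'stdout': makeStdoutWriter
      // 'kafka':  makeKafkaWriter,   // further factories registered
      // 'sqlite': makeSqliteWriter,  // the same way
  };

  function getWriter(uri) {
      var scheme = uri.split('://')[0];
      if (!writers[scheme]) {
          throw new Error('No writer for scheme: ' + scheme);
      }
      return writers[scheme](uri);
  }

  getWriter('stdout://')({ topic: 'test.event' });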

> > - schema registry availability
>
> There are more concerns here than just availability (although that's 
> important, too).
>
> Third-party users won't necessarily want to give their service access to the 
> internet in order to fetch schemas. We need to provide a way to retrieve a 
> full set of core schemas, and a git repository is an easy way to achieve this.


Third parties could use our schema registry, or use the same extension we do 
to host one of their own.  Or, as mentioned elsewhere, we could export 
snapshots of the relevant schemas via CI to ship alongside the code (this 
seems safe, since a revision is immutable).
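
The CI export itself would be small; a sketch that pins a schema to a 
specific (immutable) revision on meta and writes the snapshot alongside the 
code, using preq as in the snippet above (the title, revision id, and output 
path are all placeholders):

  var fs = require('fs');
  var preq = require('preq');

  var title = 'Schema:SomeSchema';  // placeholder
  var oldid = 12345678;             // placeholder revision id

  // action=raw returns the raw page content; pinning oldid makes the
  // snapshot immutable.
  var url = 'https://meta.wikimedia.org/w/index.php?title=' +
      encodeURIComponent(title) + '&oldid=' + oldid + '&action=raw';

  preq.get(url).then(function (res) {
      var body = (typeof res.body === 'string')
          ? res.body
          : JSON.stringify(res.body, null, 2);
      fs.writeFileSync('schemas/SomeSchema.json', body);
  });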

> We also need proper code review and versioning for core schemas, and wikis 
> don't really support code review. We could consider storing pointers to 
> schemas (URLs) instead of the actual schemas in git, but this adds complexity 
> without much apparent benefit:


I would say that both versioning and review are well covered here.  I get 
your point that it's not as specialized as code review tooling might be, but 
wikis are an established means of collaboration.

> Workflow with schemas in git:
>
> 1. create a patch with a schema change
> 2. code review
>
> Workflow with pointers to schemas (URLs) in git:
>
> 1. save a new schema on meta; note revision id
> 2. create a patch with a schema URL change
> 3. code review

That doesn't seem too onerous to me.

> > For performance, it needs to be Good Enough(tm), where Good Enough should 
> > be something we can quantify based on factors like latency, throughput, and 
> > capacity costs that aren't prohibitively expensive when weighed against 
> > other factors (e.g. engineering effort).
>
> See https://phabricator.wikimedia.org/T88459#1604768. tl;dr: It's not 
> necessarily clear that saving very little code (see above) for EL schema 
> fetching outweighs the cost of additional hardware.


I always find these things difficult to quantify; there are so many 
variables.  If, hypothetically speaking, it only saved us a week, what is 
that worth?  What could we do with another week (opportunity cost)?

Also, how do you quantify the value of using a piece of software that other 
teams are already using?  Where you have a wider set of active developers and 
more eyes on it?  Where ops is already familiar with it?

I don't pretend to know the answers to these.

