Eevans added a comment.

In https://phabricator.wikimedia.org/T114443#1731284, @GWicke wrote:
> In https://phabricator.wikimedia.org/T114443#1730753, @Eevans wrote:
>
> > 1. Already leverages a (really slick) JSON schema registry
> >    <https://meta.wikimedia.org/wiki/Category:Schemas_%28active%29?status=active>
>
> Optionally fetching schemas from a URL isn't that hard, really. Example code:
>
>     if (/^https?:\/\//.test(schema)) {
>         return preq.get(schema);
>     } else {
>         return readFromFile(schema);
>     }
>
> This lets us support files for core events, and fetching schemas from meta for EL. Schema validation is a call to a library.

The main reason I listed this as a benefit is that I don't understand why we need to distinguish between classes of events in this way (at the architectural level). Since EL already has an answer for a schema registry, it seemed like an advantage. However, if we assume that we need an additional class of in-tree schemas, then the inverse is also true: it would be just as trivial to implement reading from the filesystem.

> > 1. Provides a pluggable, composable architecture with support for a wide range of readers/writers
>
> How would this be an advantage for the EventBus portion? Many third-party users will actually only want a minimal event bus, and EL doesn't seem to help with this from what I have seen.

For starters, it means that we have alternatives for environments where Kafka is overkill (small third-party installations, dev environments, mw-vagrant, etc.). Using sqlite instead of Kafka, for example, is already supported. There is also a tremendous amount of flexibility here, and even if we assume that we need none of it now, we can't assume we never will. Having the ability to compose arbitrary event stream topologies, from/to a wide variety of sources/sinks, to multiplex, and to add in-line processing sounds like a great set of capabilities to base such a project on.
> > - schema registry availability
>
> There are more concerns here than just availability (although that's important, too).
>
> Third-party users won't necessarily want to give their service access to the internet in order to fetch schemas. We need to provide a way to retrieve a full set of core schemas, and a git repository is an easy way to achieve this.

Third parties could use our schema registry, or use the same extension we do to host one of their own. Or, as mentioned elsewhere, we could export snapshots of the relevant schemas via CI to ship alongside the code (this seems safe, since a saved revision is immutable).

> We also need proper code review and versioning for core schemas, and wikis don't really support code review. We could consider storing pointers to schemas (URLs) instead of the actual schemas in git, but this adds complexity without much apparent benefit:

I would say that both versioning and review are well covered here. I take your point that it's not as specialized as code review tooling might be, but wikis are an established means of collaboration.

> Workflow with schemas in git:
>
> 1. create a patch with a schema change
> 2. code review
>
> Workflow with pointers to schemas (URLs) in git:
>
> 1. save a new schema on meta; note the revision id
> 2. create a patch with a schema URL change
> 3. code review

That doesn't seem too onerous to me.

> > For performance, it needs to be Good Enough(tm), where Good Enough should be something we can quantify based on factors like latency, throughput, and capacity costs that aren't prohibitively expensive when weighed against other factors (e.g. engineering effort).
>
> See https://phabricator.wikimedia.org/T88459#1604768. tl;dr: It's not necessarily clear that saving very little code (see above) for EL schema fetching outweighs the cost of additional hardware.

I always find these things difficult to quantify. There are so many variables.
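On the "pointers to schemas (URLs) in git" workflow above: since a saved wiki revision is immutable, a pinned schema URL can be built from just a page title and a revision id (MediaWiki's index.php accepts `action=raw` together with `oldid`). A hedged sketch, where the function name and example values are illustrative assumptions:

```javascript
'use strict';

// Build a URL pinned to one immutable revision of a schema page.
// Hypothetical helper for illustration; the title and revision id
// passed in by callers are examples, not real schema revisions.
function schemaSnapshotUrl(baseUrl, title, revId) {
  const params = new URLSearchParams({
    title: title,         // e.g. 'Schema:SomeEvent'
    oldid: String(revId), // immutable revision id
    action: 'raw',        // raw page content, no skin
  });
  return `${baseUrl}/w/index.php?${params}`;
}
```

Committing such a URL (or just the title/revision-id pair) to git would give roughly the same pin-by-content property as committing the schema file itself, which is what makes the CI-exported snapshot idea safe.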
If, hypothetically speaking, it only saved us a week, what is that worth? What could we do with another week (lost opportunity cost)? And how do you quantify the value of using a piece of software that other teams are already using? Where you have a wider set of active developers, and more eyes on it? Where ops is already familiar with it?

I don't pretend to know the answers to these.

TASK DETAIL
https://phabricator.wikimedia.org/T114443
