Yurik added a comment.

I'm a little lost here. Is the idea that only data that can be structured as rows and columns ("fit" into a table) will be supported? Will nested key/value pairs be supported as the contents of an individual table cell?

@MZMcBride, while a generic JSON content handler could support nested data structures, the whole idea behind the tabular content handler is to provide a simple tabular format in which each cell holds a single value. The one exception is multilingual string values, which are stored as objects mapping a key (language code) to a string. This should cover the vast majority of use cases: in-article tables, lists, and data for graphs.
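
To make the cell rule concrete, here is a minimal Python sketch of such a check. The layout (a plain list of rows), the function names, and the decision to allow empty (None) cells are all assumptions made for illustration; this is not the actual schema or code of the tabular content handler.

```python
def is_valid_cell(cell):
    """A cell is a single scalar, or a {language code: string} mapping."""
    # Assumption: empty cells (None) are allowed alongside scalars.
    if cell is None or isinstance(cell, (bool, int, float, str)):
        return True
    if isinstance(cell, dict):
        # Multilingual string: non-empty, string keys and string values only.
        return len(cell) > 0 and all(
            isinstance(k, str) and isinstance(v, str) for k, v in cell.items()
        )
    return False  # lists and other nested structures are rejected


def is_valid_table(rows):
    """Every cell in every row must pass the single-value rule."""
    return all(is_valid_cell(cell) for row in rows for cell in row)


# Hypothetical sample data, not taken from any real Data: page.
rows = [
    ["item-1", 42, {"en": "Apple", "fr": "Pomme"}],
    ["item-2", 7.5, {"en": "Pear"}],
]
nested = [["item-3", {"en": {"short": "x"}}]]  # nested object inside a cell
```

Under these assumptions, `is_valid_table(rows)` accepts the first table, while `nested` is rejected because a multilingual value may only map language codes to plain strings.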

I'm not sure a .tab file extension is needed. I actually thought T120452: Allow structured datasets on a central repository (CSV, TSV, JSON, GeoJSON, XML, ...) was about storing XML, CSV, TSV, JSON, etc. in wiki pages, but looking at http://data.wmflabs.org/w/index.php?title=Data:Sample.tab&action= I'm a lot less sure now.

While we don't have to use .tab, it would help: we would not need to create a new namespace for each new data type and could reuse the Data namespace instead. Namespace proliferation has been a constant complaint from many users. Also, I do not want to support multiple storage formats, especially the notoriously bad CSV/TSV; it is better to provide easy import/export functionality for them. Another type that has already been implemented and is under discussion is .geojson, for storing map overlays. We could eventually introduce .json, but we would have to be very clear about which use cases it solves.

The discussion about data types and constraints in this task makes me worry that we're slowly inventing yet another database engine when we already have options such as SQLite.

While it would be awesome to support large custom databases, the current proposal is limited to small tables: it replaces the lists and tables we already have in many articles with a structured system that is shareable across wikis and accessible from Lua and graphs. This means it will not have any SQL-like functionality such as sorting or filtering via the API; instead, Lua modules or the Graph extension will read the table as a whole and process it as needed.
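
As a rough illustration of "read the table as a whole and process it as needed", the sketch below does the filtering and sorting in the consumer rather than in an API query. It is written in Python only for illustration (a real consumer would be a Lua module or a graph definition), and the function name and sample rows are made up.

```python
def top_rows(rows, column, minimum, limit):
    """Filter and sort in the consumer, after the whole table is loaded."""
    kept = [row for row in rows if row[column] >= minimum]
    kept.sort(key=lambda row: row[column], reverse=True)
    return kept[:limit]


# Made-up rows standing in for a small table that was fetched in full.
table = [["a", 3.6], ["b", 2.1], ["c", 2.8], ["d", 0.7]]
result = top_rows(table, column=1, minimum=1.0, limit=2)
```

Because the table is small, loading all of it and discarding most rows is cheap, which is why pagination, offsets, and limits are not part of the design.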

What are the storage considerations/implications for Wikimedia wikis here? Every time an edit is made to a table cell, we'd then be saving a full copy of the page? Will users download and manipulate up to 2 MB of text in a textarea, or even heavier, an enhanced textarea featuring syntax highlighting?

Currently, when users update a table in an article (even a single cell), a full save is made. For now, this feature will follow the same model: multiple edits followed by one save action. Much further in the future we may introduce a more powerful backend (SQLite or similar) that would handle per-cell edits, but clearly that would be far more involved. I do not expect a massive increase in data pages compared to regular wiki articles, especially since the data will be usable by multiple wikis.
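
The full-save model can be sketched as follows, assuming a page is stored as serialized JSON text and each save appends a complete copy as a new revision. The names `save_with_edit` and `MAX_BYTES` are hypothetical, and reading "2 MB" as 2 * 1024 * 1024 bytes is an assumption; none of this is the MediaWiki storage code.

```python
import json

MAX_BYTES = 2 * 1024 * 1024  # assumed interpretation of the 2 MB page limit


def save_with_edit(revisions, row, col, value):
    """Apply one cell edit and store a full serialized copy as a new revision."""
    rows = json.loads(revisions[-1])  # start from the latest full copy
    rows[row][col] = value
    text = json.dumps(rows, ensure_ascii=False)
    if len(text.encode("utf-8")) > MAX_BYTES:
        raise ValueError("page would exceed the size limit")
    revisions.append(text)  # the whole page is saved, not just the changed cell
    return revisions


# Made-up two-row table with a single-revision history.
history = [json.dumps([["a", 1], ["b", 2]])]
save_with_edit(history, row=1, col=1, value=3)
```

After the call, `history` holds two complete copies of the table that differ in one cell, which is the storage cost the question is asking about.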

There are very valid and important reasons that we have pagination, offsets, and limits with data sets. Will these three features be supported with wiki pages?

No. Only small datasets (under 2 MB) will be supported.

There's also a real concern that we'll be immediately setting ourselves up for medium-term future problems (e.g., storing more than 2 MB) as we scale up and expand this type of wiki page-based data storage implementation.

Sure, but larger data sets are a very different problem to have. So far, all lists and tables have been stored as wiki pages, for which 2 MB has been enough. Some day I hope we can support arbitrary external data, where users would set up community-curated external URLs and we would automatically create a data mirror and expose it to the world.



To: Yurik
Cc: Strainu, pwalsh, rufuspollock, RobLa-WMF, Danny_B, DannyH, StudiesWorld, Steinsplitter, Aklapper, Lydia_Pintscher, ekkis, Matanya, MarkTraceur, JEumerus, Thryduulf, Milimetric, MZMcBride, Bawolff, -jem-, gerritbot, Pokefan95, TerraCodes, intracer, ThurnerRupert, brion, Jdforrester-WMF, Eloy, TheDJ, Yurik, Zppix, V4switch, D3r1ck01, Izno, Luke081515, JAllemandou, Wikidata-bugs, matthiasmullie, aude, El_Grafo, jayvdb, Ricordisamoa, Shizhao, fbstj, Fabrice_Florin, Mbch331, Jay8g, Krenair, jeremyb
Wikidata-bugs mailing list