[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
gerritbot added a comment. Change 201238 merged by ArielGlenn: Adopt dumpwikidatajson.sh to the new naming pattern https://gerrit.wikimedia.org/r/201238 TASK DETAIL https://phabricator.wikimedia.org/T72385 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign . EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn, gerritbot Cc: Liuxinyu970226, gerritbot, Manybubbles, JanZerebecki, Smalyshev, aude, daniel, Wikidata-bugs, Nemo_bis, mkroetzsch, Svick, ArielGlenn, Lydia_Pintscher, hoo, jeremyb ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
gerritbot added a comment. Change 201238 had a related patch set uploaded (by Hoo man): Adopt dumpwikidatajson.sh to the new naming pattern https://gerrit.wikimedia.org/r/201238
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
gerritbot added a comment. Change 201208 merged by ArielGlenn: Add new wikidata folders, define dataset folders in puppet https://gerrit.wikimedia.org/r/201208
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
gerritbot added a subscriber: gerritbot. gerritbot added a comment. Change 201208 had a related patch set uploaded (by Hoo man): Add new wikidata folders, define dataset folders in puppet https://gerrit.wikimedia.org/r/201208
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
daniel added a comment. @Smalyshev: You are right that it doesn't have to be based on JSON, but since that is our primary data representation, it seems sensible to use it as a basis. I agree that it doesn't matter much to have the RDF dumps consistent with the JSON dumps. But if we make multiple RDF dumps, it's important that they are consistent with each other. The easiest way to achieve this is to base them on the same JSON dump. Whether that should block this task is debatable of course. Perhaps it shouldn't. The idea was that putting a timestamp in the directory name only makes sense if we have consistent dumps. But we can live with inconsistencies for a while - it's not like the regular XML dumps were consistent either.
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
Smalyshev added a comment. We could generate multiple dump files from the same database; it doesn't have to be from JSON. I'm also not sure why JSON and RDF should always have the same snapshot - it's a random point (or, given that a dump takes many hours during which data changes, a random collection of points) in time, no better than any other one.
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
mkroetzsch added a comment. @Smalyshev Yes, this is what I was saying. @hoo was proposing to create a special directory for "truthy" based on offline discussion in the office.
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
Smalyshev added a comment. I don't think splitting full and truthy would be too useful, as most query engines, except for the absolutely most basic ones, will want both anyway. And for JSON we don't even have that distinction I think?
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
mkroetzsch added a comment. @Smalyshev Re "what does consistent mean": to be based on the same input data. All dumps are based on Wikidata content. If they are based on the same content, they are consistent, otherwise they are not. Re "discussing RDF dump partitioning in https://phabricator.wikimedia.org/T93488": Agreed. We are not discussing which RDF dumps to have here, only whether they are likely to be well organised by distinguishing "full" and "truthy" as a primary categorisation that sits above format (RDF vs. JSON and other matters).
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
mkroetzsch added a comment. @JanZerebecki: Re using the same code: That's not essential here. All we want is that the dumps are the same. It's also not necessary to develop the code twice, since it is already there twice anyway. The question is just whether we want to use a slow method that keeps people waiting for the dumps for days (as they already do now with many other dumps), or a fast one that you can run anywhere (even without DB access; on a laptop if you like). The fact that we must have the code in PHP too makes it possible to go back to the slow system if it should ever be needed, so there is no lock-in. Dump file generation is also not operation-critical for Wikidata (the internal SPARQL query will likely be based on a live feed, not on dumps). What's not to like?
Re consistency: I meant that the dumps would contain the same information, not that they reflect a consistent state of the site. If it is important for you to have a defined state, then the dump-based file generation is also your friend: one can do the same with the full history dump, where one could specify exactly which revision to dump. Probably still as fast as the DB method, but guaranteed to provide a globally consistent snapshot (yes, I know, modulo deletions). Not sure if this type of consistency is relevant though. Having a guarantee that the dump files in various formats are based on the same data, however, would be quite useful (e.g., in SPARQL, where you often mix data from truthy and full dumps in one query).
Recall that we are discussing this here since Lydia said that the slowness of the DB-based exports would be a reason why we cannot have an (otherwise convenient) date-based directory structure. I agree with Lydia that this would be a blocker, but in this case it's really one that we can easily remove. The code I am talking about is at https://github.com/Wikidata/Wikidata-Toolkit, well tested, extensively documented, and partially WMF-funded. Why not make this into a community engagement success story? :-)
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
Smalyshev added a comment. "Consistency" of dumps in different formats is a questionable thing. What would it mean to have JSON and RDF "consistent"? Of course they'd contain the same entities, that's a given, and the data would be kind of alike. But even values may differ - e.g. RDF has no standard for representing coordinates, so we have to choose something. That something will not be the same as JSON. Also, if we want to represent dates in a standard way - e.g. xsd:dateTime - we'd have to modify them, slightly or substantially. Same goes for many other things which look slightly different - ranks, units, truthy statements, etc. Ultimately, we're basing them on the same data set, so excepting bugs we'd have consistency on that level, but beyond that I'm not sure what it is.
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
JanZerebecki added a comment.
> Why would one want to do this?
To be able to use the same code as is used for the linked data endpoint of Wikibase. Example: https://www.wikidata.org/wiki/Special:EntityData/Q42.rdf?flavor=full (this format is not final and not yet to be relied on).
> would guarantee consistent state of all files
It would guarantee that all dump files are inconsistent in the same way. It would not achieve the consistency of the JSON dump. Not sure if anyone has a use for the former but not the latter. Anyway, making the JSON dumps consistent allows both, independent of how the other dumps are generated.
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
mkroetzsch added a comment.
> All of these dumps will be generated by exporting from the DB.
Why would one want to do this? The JSON dump contains all information we need for building the other dumps, and it seems that the generation from the JSON dump is much faster, avoids any load on the DB, and would guarantee consistent state of all files (same revision status). Moreover, we already have code for doing it now (which will be updated to agree with any changes in RDF export structures we want).
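To illustrate the point about deriving other files from the JSON dump without touching the DB, here is a minimal Python sketch. It assumes the dump is a single JSON array with one entity per line, and the `SAMPLE_DUMP` content and the helper names `iter_entities` and `split_by_type` are hypothetical. It is not the Wikidata Toolkit code referred to in this thread.

```python
import io
import json
from typing import Dict, Iterator, List, TextIO

# Hypothetical two-entity dump: a JSON array with one entity per line.
SAMPLE_DUMP = (
    "[\n"
    '{"id": "Q1", "type": "item"},\n'
    '{"id": "P31", "type": "property"}\n'
    "]\n"
)


def iter_entities(dump: TextIO) -> Iterator[dict]:
    """Yield one entity per line without loading the whole array."""
    for line in dump:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue  # skip the array brackets and blank lines
        yield json.loads(line)


def split_by_type(dump: TextIO) -> Dict[str, List[str]]:
    """Group entity ids by type in a single pass over the dump."""
    groups: Dict[str, List[str]] = {}
    for entity in iter_entities(dump):
        groups.setdefault(entity["type"], []).append(entity["id"])
    return groups


print(split_by_type(io.StringIO(SAMPLE_DUMP)))
```

Because every derived output comes from the same pass over the same file, all outputs reflect the same revisions, which is the kind of consistency argued for above.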
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
JanZerebecki added a comment. All of these dumps will be generated by exporting from the DB. AFAIK currently the dumps can contain edits that were made after the dump is started. We should at some point change this, but we should not block adding RDF for that. The result is that currently each dump format might represent slightly different data.
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
mkroetzsch added a comment. @Lydia_Pintscher I understand this problem, but if you put different dumps for different times all in one directory, won't this become quite big over time and hard to use? Maybe one should group dumps by how often they are created (and have date-directories only below that). For some cases, there does not seem to be any problem. For example, creating all RDF dumps from the JSON dump takes about 3-6h in total (on labs). So this is easily doable on the same day as the JSON dump generation. I am sure that we could also generate alternative JSON dumps in comparable time (maybe add an hour to the RDF if you do it in one batch). The slow part seems to be the DB export that leads to the first JSON dump -- once you have this the other formats should be relatively quick to do.
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
Lydia_Pintscher added a comment. About 2: We didn't add timestamped subdirectories because they would likely be confusing. Dumps of different formats or flavors would not be done on the same date. And dump creation usually takes more than a day. So having to hunt for the subfolder that has the format and flavor you are looking for seems bad.
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
mkroetzsch added a comment. @hoo Thanks for the heads up! I do have comments.
(1) I would remove the "full" and "truthy" distinction from the path and rather make this part of the dump type (for example "statements" and "truthy-statements"). The reason is that we have many full dumps (terms, sitelinks, statements, properties), which can be readily exported in RDF and JSON, but we have only one "truthy" dump and it really is mainly for RDF (at least we did not discuss a JSON format for "single-triple statements"). Therefore, it does not seem worthwhile to make a top-level distinction in the directory structure for this. For consumers, it is easier if a dump file is addressed with four components (projectname, dumptype, date, file format). The truthy/full distinction would be another parameter that does not seem to add any functionality.
(2) My comment right at the beginning of this bug report was to have timestamped subdirectories, just like we have for the main dumps. Maybe you have reasons for not having these, but could you explain them here?
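As a rough illustration of the four-component addressing in (1), the Python sketch below builds a relative path from (projectname, dumptype, date, file format). The path layout itself and the function name `dump_path` are hypothetical; the comment does not fix an order or a file naming rule. The point is only that "truthy" becomes a value of `dumptype` and not an extra directory level.

```python
def dump_path(project: str, dumptype: str, date: str, fmt: str) -> str:
    """Join the four components into one relative path (hypothetical layout)."""
    return f"{project}/{dumptype}/{date}/{dumptype}-{date}.{fmt}"


# "truthy-statements" is just another dump type, not a separate subtree.
print(dump_path("wikidatawiki", "statements", "20150324", "json"))
print(dump_path("wikidatawiki", "truthy-statements", "20150324", "ttl"))
```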
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
hoo added a comment. @mkroetzsch: What's your opinion on the above naming scheme? Is it ok for you? If so, I will implement it soon.
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
hoo added a comment. Ok, we talked about this in the office and came up with the following: `https://dumps.wikimedia.org/wikidatawiki/entities/` is the (user visible) base path (the actual files would be in `/other/…`), //could// also have a fancy html overview page with additional explanations. In there we have the subdirectories `full` and `truthy` (and possibly more later on). Those contain all dumps of those flavors, no matter the format. In those we have files like `(all|items|properties)-20150324(-BETA).(json|ttl|…)`.
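For consumers, the file name pattern above could be parsed roughly as in this Python sketch. It is an illustration only: the names `DUMP_NAME`, `DumpName` and `parse_dump_name` are hypothetical, and the regular expression accepts just the two formats named explicitly (json, ttl), since the rest of the format list is left open in the proposal.

```python
import re
from typing import NamedTuple, Optional

# Only json and ttl are accepted; the proposal leaves further formats open.
DUMP_NAME = re.compile(
    r"^(?P<scope>all|items|properties)"
    r"-(?P<date>\d{8})"
    r"(?P<beta>-BETA)?"
    r"\.(?P<fmt>json|ttl)$"
)


class DumpName(NamedTuple):
    scope: str  # all, items or properties
    date: str   # eight digits, e.g. 20150324
    beta: bool  # True if the -BETA marker is present
    fmt: str    # file format taken from the extension


def parse_dump_name(name: str) -> Optional[DumpName]:
    """Return the parsed components, or None if the name does not fit."""
    m = DUMP_NAME.match(name)
    if m is None:
        return None
    return DumpName(m["scope"], m["date"], m["beta"] is not None, m["fmt"])


print(parse_dump_name("items-20150324-BETA.ttl"))
```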
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
hoo added a comment. In https://phabricator.wikimedia.org/T72385#1142484, @Lydia_Pintscher wrote:
> Why should Wikibase be in the name?
Because just having "json" dumps could mean anything IMO... also I think having wikibase in there is more future proof after we hit Commons. But that's just my opinion and it's not a particularly strong one.
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
Lydia_Pintscher added a comment. Why should Wikibase be in the name?
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
hoo added a comment. Ok, I talked about this with @ArielGlenn and we decided that the following would be doable and nice: Store the dumps (on the file system) in `https://dumps.wikimedia.org/other/wikibase-dumps/wikidatawiki/json`. Then we can have a symlink to that from `https://dumps.wikimedia.org/wikidatawiki/wikibase-json` and for backwards compatibility reasons from the current location (`https://dumps.wikimedia.org/other/wikidata/`). On top of that we can, if we want, have (via symlinks) `https://dumps.wikimedia.org/wikibase-dumps/wikidatawiki/json` (although I think this is overkill for now).
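The layout described above (one real directory plus symlinks from the new and the old location) can be tried out locally with a Python sketch like the following. It recreates the relative paths under a temporary directory; the function name `build_layout` is hypothetical, and this says nothing about how the dumps host is actually configured.

```python
import os
import tempfile


def build_layout(root: str) -> str:
    """Create the real directory and two symlinks under root; return the real path."""
    real = os.path.join(root, "other", "wikibase-dumps", "wikidatawiki", "json")
    os.makedirs(real)
    links = [
        os.path.join(root, "wikidatawiki", "wikibase-json"),  # new public path
        os.path.join(root, "other", "wikidata"),              # old location
    ]
    for link in links:
        parent = os.path.dirname(link)
        os.makedirs(parent, exist_ok=True)
        # Relative targets keep the tree valid if root is moved.
        os.symlink(os.path.relpath(real, parent), link)
    return real


root = tempfile.mkdtemp()
real = build_layout(root)
open(os.path.join(real, "sample.json"), "w").close()
# The same file is now reachable through both symlinked locations.
print(os.listdir(os.path.join(root, "other", "wikidata")))
```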
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
hoo added a comment. Just for the record, I personally prefer `https://dumps.wikimedia.org/wikidatawiki/wikibase-json` over `https://dumps.wikimedia.org/wikidatawiki/json` as I think that makes it clear that those are Wikibase dumps.
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
Lydia_Pintscher added a comment. We will publish more dumps than the current JSON dumps, yes. For example, Daniel wants expanded JSON dumps that include the full URI for external identifiers.
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
mkroetzsch added a comment. I think json should be in the path somewhere. It does not have to be at the top-level, but it would be good if dump files of one type end up in their own directory. The only way for tools to detect and download dumps automatically is to look at the HTML directory listings, and this listing should not change its appearance (again). Note that different types of dumps will be created at different intervals, so a combined directory that contains several types of dumps would look quite messy in the end. We could have wikibase-dumps/wikidatawiki/json if you prefer this over something like other/wikibase-json/wikidatawiki. However, the latter seems to be more consistent with /other/incr/wikidatawiki. I don't care much about the details, but it would be good to have something systematic in the end: either other/projectname/dumptype or other/dumptype/projectname seems most logical. Also, I think that "dumptype" could already mention wikibase if desired, so that there is no need for an extra directory "wikibase-dumps" on the path. The thing to avoid is to introduce a new directory structure for every new kind of dump (and "wikibase-dumps" smells a lot like this, even if there is a faint possibility that there will be more dumps of this kind in the future -- do you actually have any plans to move our RDF dumps from http://tools.wmflabs.org/wikidata-exports/rdf/ to the dumps site? Could be done, but not sure if it is needed.).
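To show why the appearance of the HTML directory listing matters to tools, here is a minimal Python sketch of how a client might discover dump files from such a listing. The listing markup in `SAMPLE_LISTING`, the file names, and the `.json.gz` suffix are assumptions for illustration; a real client would fetch the page over HTTP first.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical listing, roughly what a web server index page looks like.
SAMPLE_LISTING = (
    '<html><body><a href="../">../</a>'
    '<a href="20150316.json.gz">20150316.json.gz</a>'
    '<a href="20150323.json.gz">20150323.json.gz</a>'
    "</body></html>"
)


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_dump_files(listing_html: str, suffix: str = ".json.gz") -> List[str]:
    """Return the linked file names ending in suffix, sorted by name."""
    parser = LinkCollector()
    parser.feed(listing_html)
    return sorted(h for h in parser.hrefs if h.endswith(suffix))


print(list_dump_files(SAMPLE_LISTING))
```

A directory holding a single dump type keeps this kind of filter trivial; a combined directory would force every client to encode knowledge of all the other file types.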
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
ArielGlenn added a comment. I can certainly make the old dir a symlink. wikibase-dumps/wikidatawiki is fine too. Markus?
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
hoo added a comment. I'm not too fond of having "json" in the path, as we'll provide non-JSON Wikibase-specific dumps at some point (rdf, maybe more) and those should IMO be at the same place. If we can't integrate this with the usual dump process now, can we have something like /other/wikibase-dumps/wikidatawiki, which makes it clear that those are the dumps we'll provide for Wikimedia's Wikibase repo installations (that would later on exist for Commons as well... and maybe also for testwikidata)? Also, we should probably make the old folder a redirect if we decide to change this; just fixing all links won't work.
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
ArielGlenn added a comment. I'm fine with json/wikidatadumps. Wikidata folks, please sign off or suggest something you like better. This will entail: a fix to the cron job, a move of the existing dumps, and correcting any links that already exist (where are those?)