Interesting to see that there is now a plan for a phased import and the first 250 pages have been created.

https://meta.wikimedia.org/wiki/China_Judgments_Online_Preservation_Program#Stage_1:_Micro-pilot_(50-200_pages;_fully_reviewed)

I've only checked a couple of pages and they look neat. Some of the layout templates work better than I would have expected. I haven't tested the search functionality.

I've left a comment on the privacy complaint process on Meta.

Federico

On 18/12/25 18:34, Federico Leva (Nemo) wrote:
On 06/12/25 13:11, [email protected] wrote:
I would like to begin with some background, because many non-Chinese Wikimedia contributors may not be aware of how significant China Judgments Online (CJO) has been for judicial transparency in China and how sharply access to it has been reduced in recent years.

Thanks for this context, it's super interesting!
For our purposes, the important point is this: CJO has removed or restricted access to large portions of its historical archive, including documents that were originally public, legally non-copyrightable under Chinese law, and crucial for understanding the functioning of China’s legal system. Many judgments that were once easily verifiable on the official site can no longer be checked against their original source. These documents are at risk of disappearing entirely from public access.

How strong is the presumption of copyright-ineligibility? What's the legal source for it and could it change in the future? (I'm clueless about the hierarchy of sources of law in China, sorry.)

Have other Wikisources hosted similarly massive, uniform corpora of government or legal documents? How did you determine whether they fit the mission of Wikisource? Were there concerns about overwhelming the project or changing its character?

Nothing as massive, but Italian Wikisource hosts court rulings, usually when they are especially newsworthy. In those cases (think powerful politicians) there was always someone interested in getting them removed, but I don't recall whether there were official requests for redactions. However, we very intentionally do not copy all court rulings from official court databases, because they are known to be riddled with personal data. JurisWiki, a project by an experienced Italian lawyer and free knowledge advocate (Simone Aliprandi), had to shut down over such issues after importing "just" 400k court rulings.

In our case, the source is an independent mirror of a government website that is now selectively removing documents. While Wikimedia projects have long preserved public domain government documents after originals were taken down or censored, I am unsure how Wikisource communities have handled this scenario in practice. Are mirrored datasets acceptable when the original public source has been altered or removed? How should we document provenance and authenticity for future readers?

I would say that relying on a mirror is *better* than using an official source, because you can have an additional layer of vetting, just like we do with PGDP (Project Gutenberg Distributed Proofreaders).

Are you in contact with the people behind that database? Will they be responsive when you find personal data that was not properly redacted? (This is a "when", not an "if". It's certain to happen.)

What's the added benefit that a Wikisource copy would bring to that project? Find out, and focus on that. (Does it really need a comprehensive copy?)

If we proceed, how should we structure this corpus so the project remains usable? Are there recommended practices for:
– titling, metadata, and Wikidata integration for legal documents,

Wikidata should be ruled out immediately, as it cannot handle this volume of documents.

As for titles, categories etc., you should probably talk with Chinese practitioners who can tell you how people usually search these documents.

Say the rulings are organised in tidy partitions, e.g. 100 different provinces (I'm inventing this), and people usually search within each of them: then you can use those partitions as title prefixes, and disambiguation will be easy.

– organizing millions of pages so they do not overwhelm categories and search,
– mitigating strain on job queues, dumps, and indexing,

I would say don't worry too much about this part, as the WMF will let you know if it becomes a problem. Just avoid exceedingly esoteric templates, and don't rely on DynamicPageList or other extensions known to be slow.


4. Political and archival importance
Wikisource has historically preserved documents at risk of censorship or disappearance, whether due to authoritarian restrictions or institutional neglect. Do other communities have experience with politically sensitive archival projects where the preservation value itself was a central motivation?

Yes, see above, but not at this scale.

Best,
     Federico

_______________________________________________
Wikisource-l mailing list -- [email protected]
To unsubscribe send an email to [email protected]
