On 06/12/25 13:11, [email protected] wrote:
I would like to begin with some background, because many non-Chinese Wikimedia 
contributors may not be aware of how significant CJO has been for judicial 
transparency in China and how sharply access to it has been reduced in recent 
years.

Thanks for this context, it's super interesting!
For our purposes, the important point is this: CJO has removed or restricted 
access to large portions of its historical archive, including documents that 
were originally public, legally non-copyrightable under Chinese law, and 
crucial for understanding the functioning of China’s legal system. Many 
judgments that were once easily verifiable on the official site can no longer 
be checked against their original source. These documents are at risk of 
disappearing entirely from public access.

How strong is the presumption of copyright-ineligibility? What's the legal source for it and could it change in the future? (I'm clueless about the hierarchy of sources of law in China, sorry.)

Have other Wikisources hosted similarly massive, uniform corpora of government 
or legal documents? How did you determine whether they fit the mission of 
Wikisource? Were there concerns about overwhelming the project or changing its 
character?

Nothing as massive, but Italian Wikisource hosts court rulings, usually when they are especially newsworthy. In those cases (think powerful politicians) there was always someone interested in getting them removed, but I don't recall whether there were official requests for redactions. However, we very intentionally do not copy all court rulings from official court databases, because they are known to be riddled with personal data. JurisWiki, a project by an experienced Italian lawyer and free knowledge advocate (Simone Aliprandi), had to shut down over such issues after importing "just" 400k court rulings.

In our case, the source is an independent mirror of a government website that 
is now selectively removing documents. While Wikimedia projects have long 
preserved public domain government documents after originals were taken down or 
censored, I am unsure how Wikisource communities have handled this scenario in 
practice. Are mirrored datasets acceptable when the original public source has 
been altered or removed? How should we document provenance and authenticity for 
future readers?

I would say that relying on a mirror is *better* than using an official source, because you can have an additional layer of vetting, just like we do with PGDP.

Are you in contact with the people behind that database? Are they going to be responsive when you find personal data that was not redacted? (This is a "when", not an "if". It's certain to happen.)

What's the added benefit that a Wikisource copy would bring to that project? Find out, and focus on that. (Does it really need a comprehensive copy?)

If we proceed, how should we structure this corpus so the project remains 
usable? Are there recommended practices for:
– titling, metadata, and Wikidata integration for legal documents,

Wikidata should be immediately ruled out, as it cannot handle this volume of documents.

As for titles, categories etc., you should probably talk with Chinese practitioners who can tell you how people usually search these documents.

Say the rulings are organised into tidy partitions, for example 100 different provinces (I'm inventing), and people usually search within each of them: then you can use those partitions as title prefixes and it will be easy to disambiguate.

– organizing millions of pages so they do not overwhelm categories and search,
– mitigating strain on job queues, dumps, and indexing,

I would say don't worry too much about this part, as WMF will let you know if it becomes a problem. Maybe don't come up with exceedingly esoteric templates and don't rely on DynamicPageList or other extensions known to be slow.


4. Political and archival importance
Wikisource has historically preserved documents at risk of censorship or 
disappearance, whether due to authoritarian restrictions or institutional 
neglect. Do other communities have experience with politically sensitive 
archival projects where the preservation value itself was a central motivation?

Yes, see above, but not at this scale.

Best,
        Federico
_______________________________________________
Wikisource-l mailing list -- [email protected]