On 06/12/25 13:11, [email protected] wrote:
I would like to begin with some background, because many non-Chinese Wikimedia
contributors may not be aware of how significant CJO has been for judicial
transparency in China and how sharply access to it has been reduced in recent
years.
Thanks for this context, it's super interesting!
For our purposes, the important point is this: CJO has removed or restricted
access to large portions of its historical archive, including documents that
were originally public, legally non-copyrightable under Chinese law, and
crucial for understanding the functioning of China’s legal system. Many
judgments that were once easily verifiable on the official site can no longer
be checked against their original source. These documents are at risk of
disappearing entirely from public access.
How strong is the presumption of copyright-ineligibility? What's the
legal source for it and could it change in the future? (I'm clueless
about the hierarchy of sources of law in China, sorry.)
Have other Wikisources hosted similarly massive, uniform corpora of government
or legal documents? How did you determine whether they fit the mission of
Wikisource? Were there concerns about overwhelming the project or changing its
character?
Nothing as massive, but Italian Wikisource hosts court rulings, usually
when they are especially newsworthy. In those cases (think powerful
politicians) there was always someone interested in getting them
removed, but I don't recall whether there were official requests for
redactions. However, we very intentionally do not copy all court rulings
from official court databases, because they are known to be riddled with
personal data. JurisWiki, a project by Simone Aliprandi, an experienced
Italian lawyer and free knowledge advocate, had to shut down over such
issues after importing "just" 400k court rulings.
In our case, the source is an independent mirror of a government website that
is now selectively removing documents. While Wikimedia projects have long
preserved public domain government documents after originals were taken down or
censored, I am unsure how Wikisource communities have handled this scenario in
practice. Are mirrored datasets acceptable when the original public source has
been altered or removed? How should we document provenance and authenticity for
future readers?
I would say that relying on a mirror is *better* than using an official
source, because you can have an additional layer of vetting, just like
we do with PGDP.
Are you in contact with the people behind that database? Are they going
to be responsive when you find personal data that was not properly
redacted? (This is a "when", not an "if". It's certain to happen.)
What's the added benefit that a Wikisource copy would bring to that
project? Find out, and focus on that. (Does it really need a
comprehensive copy?)
If we proceed, how should we structure this corpus so the project remains
usable? Are there recommended practices for:
– titling, metadata, and Wikidata integration for legal documents,
Wikidata should be ruled out immediately, as it cannot handle this volume
of documents.
As for titles, categories etc., you should probably talk with Chinese
practitioners who can tell you how people usually search these documents.
Say the rulings are organised into tidy partitions by 100 different
provinces (I'm making this up) and people usually search within each of
them: then you can use those as title prefixes, and it will be easy to
disambiguate.
– organizing millions of pages so they do not overwhelm categories and search,
– mitigating strain on job queues, dumps, and indexing,
I would say you shouldn't worry too much about this part, as the WMF will
let you know if it becomes a problem. Just avoid exceedingly esoteric
templates, and don't rely on DynamicPageList or other extensions known
to be slow.
4. Political and archival importance
Wikisource has historically preserved documents at risk of censorship or
disappearance, whether due to authoritarian restrictions or institutional
neglect. Do other communities have experience with politically sensitive
archival projects where the preservation value itself was a central motivation?
Yes, see above, but not at this scale.
Best,
Federico
_______________________________________________
Wikisource-l mailing list -- [email protected]