Hi everybody,

Here is my attempt at giving my point of view while trying to summarize
the discussion:

1. I think the role of Index: pages should be to present the *source* of
a work. This is true whether the source is a scanned edition (as is most
often the case at the moment), or a digital PDF (that is, containing
text and not images) as is the case for most "digital-born" documents. I
think it is good to have a neat separation between the original source
and how Wikisource presents the work in the main namespace. Indeed, even
if Wikisource tries to be as true as possible to the original content,
there are very often some changes in the way it is presented in the main
namespace.

2. Ideally, the metadata about the source of a work (author, date of
printing, etc.) should be located in Wikidata. But metadata related to
proofreading (e.g. the proofreading level of each individual page),
being specific to the mission of Wikisource, should be located in
Wikisource. How to do this while keeping the interface simple (i.e. hide
it from the user so that she doesn't have to go from Wikisource to
Wikidata to Wikisource) is a valid and very important concern, but is
also beyond my current understanding of Wikidata and its integration
into Wikimedia projects.

3. The current system with 4 quality levels to represent the
proofreading state of a page is not sufficient to represent the
diversity of proofreading scenarios. Indeed, there is a distinction to
make between the *correctness* of the text and its *formatting*. In the
case of a scanned edition which has been OCRed, we do need several
passes before reaching a satisfactory level of confidence about the
correctness of the text as well as a suitable formatting (proper use of
the wikicode, etc.). For digital-born documents however, as billinghurst
said, we can automatically assume that the extracted text is correct,
but that still doesn't mean that the text is correctly formatted and
ready to be transcluded in the main namespace. Maybe we should add
another level meaning "text is correct, still needs formatting"?
Ideally, we should have two scales of quality levels: one dealing with
the correctness of the text, and one dealing with its formatting. This
would probably be too heavy and confusing though...
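
To make the idea of two separate scales a bit more concrete, here is a
minimal sketch in Python. All the names and levels below are made up for
illustration; they are assumptions of mine and do not correspond to
anything in the existing Wikisource software.

```python
from dataclasses import dataclass
from enum import IntEnum


class Correctness(IntEnum):
    """Hypothetical scale: how much we trust the text itself."""
    UNCHECKED = 0     # raw OCR output, nobody has read it yet
    PROOFREAD = 1     # one contributor compared it to the source
    VALIDATED = 2     # a second contributor confirmed it
    BORN_DIGITAL = 3  # extracted from a text PDF, assumed correct


class Formatting(IntEnum):
    """Hypothetical scale: how ready the wikicode is."""
    RAW = 0        # plain extracted text, no wikicode
    FORMATTED = 1  # wikicode has been applied
    REVIEWED = 2   # formatting checked by someone else


@dataclass(frozen=True)
class PageStatus:
    correctness: Correctness
    formatting: Formatting

    def ready_for_transclusion(self) -> bool:
        # Assumption: a page needs trusted text AND applied formatting.
        return (self.correctness >= Correctness.VALIDATED
                and self.formatting >= Formatting.FORMATTED)


# A proofread OCR page with wikicode, and an untouched digital-born page.
ocr_page = PageStatus(Correctness.PROOFREAD, Formatting.FORMATTED)
pdf_page = PageStatus(Correctness.BORN_DIGITAL, Formatting.RAW)
```

With such a model, the digital-born page would count as correct but not
yet ready for the main namespace, which is the case the current single
scale cannot express.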

Thibaut (user:Zaran on Wikisource)

On 06/12/2013 01:35 PM, Andrea Zanni wrote:
>
> On Wed, Jun 12, 2013 at 1:32 PM, billinghurst <[email protected]
> <mailto:[email protected]>> wrote:
>
>     If you are talking about how we represent digitally prepared text
>     with the
>     validation process. I would have no issue with the text being
>     ripped and
>     having a bot run through and taking it straight to level 4
>     (green), and
>     then redefining green to say validated, or digitally prepared text not
>     requiring validation.
>
>     At the same time, if someone proposed and generates a fifth colour to
>     represent digitally prepared text not requiring proofreading, then
>     I will
>     be happy with that. It may make someone happier in being a truer
>     representation, but in the end to me it is a moot point. In the
>     end, each
>     of those is a local community decision, though one that should be
>     made in
>     consideration of how the other wikis interpret their processes.
>
>
> Thanks for clarifying this.
> I agree with you, and would welcome both solutions.
>
> But a lot of wikisourcerors don't think this way, 
> so better discuss :-)
>
> Aubrey
>
> _______________________________________________
> Wikisource-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
