Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-13 Thread Andrea Zanni
On Wed, Jun 12, 2013 at 4:47 PM, Aarti K. Dwivedi ellydwivedi2...@gmail.com
 wrote:

 If I am not wrong, as of today, most books that were born digital are
 still under copyright. Of course, they are freely available on the
 internet, but we can't use pirated copies. How would we go about
 procuring these books?
 If we do procure these copyrighted books, then the only thing we would
 have to do is check for proper formatting, right?


You are thinking of *books*, which are not the only documents Wikisource
can host.
For example, I am thinking of Open Access literature, which amounts to
hundreds of thousands of CC-BY licensed articles.
Just look at DOAJ: http://www.doaj.org/

One of the Wikimedians most involved in Open Access - Wiki collaboration is
Daniel Mietchen (cc'ed).
He's working on a bot that could grab the XML/HTML of an online article,
format it in wikicode, and post it wherever he wants (maybe on Wikisource).
The bot aims to automatically download all images within the articles
and post them on Commons.

I personally think that this project is beyond awesomeness,
IF we manage to solve some particular and specific issues (such as converting
hyperlinks to other articles into wikilinks to those articles hosted on
Wikisource...)
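The hyperlink-to-wikilink step could look roughly like the Python sketch below. It is not Daniel's bot; it assumes the bot already has a lookup table (`known_pages`, hypothetical) mapping an article's external URL to the title of its copy on Wikisource. Links with a known copy become wikilinks, the others stay external links, and all other HTML tags are simply dropped.

```python
from html.parser import HTMLParser


class LinkToWikilink(HTMLParser):
    """Rewrites <a href> elements as wikilinks when the target is known.

    `known_pages` is a hypothetical dict: external URL -> Wikisource title.
    Unknown targets become external links: [url text].
    Every other tag is dropped; only its text content is kept.
    """

    def __init__(self, known_pages):
        super().__init__(convert_charrefs=True)
        self.known_pages = known_pages
        self.out = []
        self.href = None      # href of the <a> we are currently inside
        self.link_text = []   # text collected inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.link_text = []

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            text = "".join(self.link_text)
            title = self.known_pages.get(self.href)
            if title:
                self.out.append(f"[[{title}|{text}]]")
            else:
                self.out.append(f"[{self.href} {text}]")
            self.href = None

    def handle_data(self, data):
        # Text inside a link is buffered; everything else goes straight out.
        if self.href is not None:
            self.link_text.append(data)
        else:
            self.out.append(data)


def convert(html, known_pages):
    parser = LinkToWikilink(known_pages)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `convert('<p>See <a href="https://example.org/a1">Smith 2012</a>.</p>', {"https://example.org/a1": "Smith 2012 article"})` gives `See [[Smith 2012 article|Smith 2012]].`; the hard part in practice is building `known_pages`, not the rewriting itself.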

As I said before, I see Wikisource as a broad, international, connected,
hypertextual digital library,
which has a thing no other digital library in the world has: a dedicated
community[*].

It is my personal opinion, I know some people don't see it that way (like
Alex :-D)


Aubrey

[*] there is Project Gutenberg, but I would argue they are not a digital
library...
___
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l


Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-12 Thread Andrea Zanni
On Wed, Jun 12, 2013 at 1:32 PM, billinghurst billinghu...@gmail.com wrote:

 If you are talking about how we represent digitally prepared text in the
 validation process, I would have no issue with the text being ripped and
 a bot running through it and taking it straight to level 4 (green), and
 then redefining green to mean validated, or digitally prepared text not
 requiring validation.

 At the same time, if someone proposes and generates a fifth colour to
 represent digitally prepared text not requiring proofreading, then I will
 be happy with that. It may make someone happier as a truer
 representation, but in the end to me it is a moot point. In the end, each
 of those is a local community decision, though one that should be made in
 consideration of how the other wikis interpret their processes.


Thanks for clarifying this.
I agree with you, and would welcome both solutions.

But a lot of wikisourcerors don't think this way,
so we'd better discuss it :-)

Aubrey


Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-12 Thread Andrea Zanni
On Wed, Jun 12, 2013 at 2:32 PM, Thibaut Horel thibaut.ho...@gmail.com wrote:

 3. The current system with 4 quality levels to represent the proofreading
 state of a page is not sufficient to represent the diversity of
 proofreading scenarios. Indeed, there is a distinction to make between the
 *correctness* of the text and its *formatting*. In the case of a scanned
 edition which has been OCRed, we do need several passes before reaching a
 satisfying level of confidence about the correctness of the text as well as
 a suitable formatting (proper use of the wikicode, etc.). For digital-born
 documents however, as billinghurst said, we can automatically assume that
 the extracted text is correct, but that still doesn't mean that the text is
 correctly formatted and ready to be transcluded in the main namespace.
 Maybe we should add another level meaning text is correct, still needs
 formatting? Ideally, we should have two scales of quality levels: one
 dealing with the correctness of the text, and one dealing with its
 formatting. This would probably be too heavy and confusing though...


I couldn't agree more.
I think this could also be an opportunity to make tasks *smaller* and
*clearer*
(in the direction of microtasks, i.e. contributions to crowdsourcing
projects that are small, well-defined and simple, e.g. Galaxy Zoo, reCAPTCHA).

We could define some tasks as
* corrected the page
* proofread the text
* formatted the page
* validated the formatting
* OPTIONAL added optional templates/links/annotations
*...

We could even have qualifiers (all/part of the page, ...)
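One possible way to model these microtasks, together with Thibaut's idea of separate scales for text correctness and formatting, is sketched below in Python. The task names come from the list above; the split into two ordered axes, the "all"/"part" qualifier encoding, and the rule that optional annotations never block transclusion are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Task(Enum):
    # Names from the list above; grouping into axes is an assumption.
    CORRECTED = "corrected the page"
    PROOFREAD_TEXT = "proofread the text"
    FORMATTED = "formatted the page"
    VALIDATED_FORMAT = "validated the formatting"
    ANNOTATED = "added optional templates/links/annotations"


# Two independent, ordered scales: text correctness and formatting.
TEXT_AXIS = [Task.CORRECTED, Task.PROOFREAD_TEXT]
FORMAT_AXIS = [Task.FORMATTED, Task.VALIDATED_FORMAT]


@dataclass
class PageState:
    # task -> qualifier, "all" or "part" of the page (hypothetical encoding)
    done: dict = field(default_factory=dict)

    def record(self, task, scope="all"):
        if scope not in ("all", "part"):
            raise ValueError("scope must be 'all' or 'part'")
        self.done[task] = scope

    def level(self, axis):
        """Count consecutive steps on one axis completed for the whole page."""
        n = 0
        for task in axis:
            if self.done.get(task) != "all":
                break
            n += 1
        return n

    def ready_for_transclusion(self):
        # Optional tasks (annotations) never block transclusion.
        return (self.level(TEXT_AXIS) == len(TEXT_AXIS)
                and self.level(FORMAT_AXIS) == len(FORMAT_AXIS))
```

A born-digital page could start with both text steps recorded and only the formatting axis left open, which is the "text is correct, still needs formatting" state Thibaut describes.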

Is this idea crazy, or somewhat doable?

Aubrey


Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-12 Thread David Cuenca
I think everything is doable, the problem is how to do it without
cluttering the interface and keeping things simple.

Some levels might be redundant, and we could take the chance to consider
whether they are really necessary.

Some proposed changes:
- Proofread page levels: Unused, Proofread, Proofread with format,
Validated (the unused level would cover pages with no text, OCR text,
or irrelevant content).
- All pages would be created at the start with the extracted OCR text at
the unused level, so that search engines could finally find our texts even
if work on them has not started yet.
- A checkbox list to tag pages: damaged scan, missing scan, contains
media (image, score, etc.)
- Color codes: as now, plus orange for Proofread with format. Page tags
would affect the color too: damaged would make the color half purple
and half the corresponding proofread level color; contains media could
add a (black?) square around the page number.
- Proofread book levels should automatically be set to the lowest page
level, plus two options: one to mark the book as ready to export and
another to mark it as a digital source, which would bring all pages to
proofread level.
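The book-level rule above (lowest page level, plus a "digital source" switch) can be sketched in Python like this. The numeric ordering of the levels is an assumption, as is the choice to leave pages already above Proofread untouched when the digital-source flag is set, and to treat an empty book as Unused.

```python
from enum import IntEnum


class PageLevel(IntEnum):
    # Ordering of the four levels proposed above (numbers are assumptions).
    UNUSED = 0                 # no text, OCR text, or irrelevant content
    PROOFREAD = 1
    PROOFREAD_WITH_FORMAT = 2
    VALIDATED = 3


def book_level(page_levels, digital_source=False):
    """Book level = lowest page level.

    With digital_source=True every page is first raised to at least
    PROOFREAD; pages already above that keep their own level.
    """
    if not page_levels:
        return PageLevel.UNUSED
    if digital_source:
        page_levels = [max(level, PageLevel.PROOFREAD) for level in page_levels]
    return min(page_levels)
```

Because the book level is derived rather than stored, it can never drift out of sync with the pages, which is the main attraction of making it automatic.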

As for the metadata interface, I'm still thinking about it; my impression is
that we should start from Template:Book [1] and work towards a version
that can be used across Commons, Index pages, and books without supporting
scans (in the last case it could be the same header template with an
option to expand it to show the whole Template:Book).
That template might also need some coloring/reorganizing to reflect the
Work/Edition distinction that Wikidata is bringing [2].
And if Lua can read/write Wikidata, then a migration towards a
Wikidata-powered Wikisource shouldn't be that far away.

Cheers,
Micru

[1] http://commons.wikimedia.org/wiki/Template:Book
[2] http://www.wikidata.org/wiki/Wikidata:Books_task_force



-- 
Etiamsi omnes, ego non


Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-12 Thread Aarti K. Dwivedi
If I am not wrong, as of today, most books that were born digital are
still under copyright. Of course, they are freely available on the
internet, but we can't use pirated copies. How would we go about
procuring these books?
If we do procure these copyrighted books, then the only thing we would
have to do is check for proper formatting, right?


On Wed, Jun 12, 2013 at 7:58 PM, Lars Aronsson l...@aronsson.se wrote:

 On 06/12/2013 02:48 PM, Andrea Zanni wrote:

 We could define some tasks as
 * corrected the page
 * OPTIONAL added optional templates/links/annotations
 *...


 Geotagged all the photos, ...

 The list doesn't end. You need a generic mechanism
 for any new feature you can invent. But aren't our
 existing templates and categories the best way to
 do this? You could just add to each page:
 {{done|proofread=user1|validated=user2|geotagged=user4|...}}


 --
   Lars Aronsson (l...@aronsson.se)
   Project Runeberg - free Nordic literature - http://runeberg.org/
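To illustrate how Lars's template-based approach could be queried, here is a minimal Python sketch that reads a `{{done|...}}` call back into a dictionary and lists pages still missing a given step. `{{done}}` is Lars's hypothetical template; the sketch assumes parameter values are plain user names with no nested templates or links, and the page titles in any usage are made up.

```python
def parse_done(call):
    """Parse {{done|proofread=user1|validated=user2}} into a dict.

    Sketch only: values are assumed to contain no '|' (no nested
    templates or links), which holds for plain user names.
    """
    body = call.strip()
    if not (body.startswith("{{") and body.endswith("}}")):
        raise ValueError("not a template call")
    name, *params = body[2:-2].split("|")
    if name.strip().lower() != "done":
        raise ValueError("not a {{done}} call")
    result = {}
    for param in params:
        key, sep, value = param.partition("=")
        if sep:  # skip unnamed parameters such as a trailing "..."
            result[key.strip()] = value.strip()
    return result


def pages_missing(pages, task):
    """Return titles whose {{done}} call has no entry for `task`."""
    return [title for title, call in pages.items()
            if task not in parse_done(call)]
```

New steps such as `geotagged` need no code change, which is the point of Lars's "generic mechanism".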








-- 
Aarti K. Dwivedi


Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-12 Thread Alex Brollo
When we tried to convert a PDF file, which is an opaque, closed format,
into wiki code (a necessary step to add links and to turn files into wiki
hypertext), the work turned into a nightmare. If we simply load free PDF
books as they are, I don't see any advantage other than feeding Wikisource
numbers/statistics, and this is presently far from my personal interest.

As you can guess, I'm one of the users who don't share Aubrey's enthusiasm
about born-digital texts, even if free. :-)

Alex


2013/6/12 David Cuenca dacu...@gmail.com

 Nobody is saying anything about using copyrighted works; there are many
 books with an open license that would allow including them in
 Wikisource.

 For instance, on ca-ws we have this translation from 2009:

 http://ca.wikisource.org/wiki/Llibre:El_secret_de_l%E2%80%99or_que_creix_%282009%29.djvu

 The original is in the PD, and the translator gave away his rights. It
 would have been much easier to work directly with the PDF instead of
 converting it to DjVu.

 Micru


 --
 Etiamsi omnes, ego non


Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-11 Thread Thomas PT
Sorry if my answer is off-topic, but if metadata are stored in Wikidata, is it
really necessary to create Index pages to store the same data as Wikidata?
As I see it, we'll have bibliographical metadata on Wikidata (title,
author, date of publication...) and data related to proofreading (proofreading
level, table of contents...) on the Index: pages. Moreover, as the Proofread Page
extension assumes that an Index page is about a scan (i.e. one or more files),
I'm not sure that Index pages about books without a scan will be handled well by
the extension.

{{header|index name}} is already handled, for books with scans, by the Proofread
Page extension with the header=1 feature. On fr.wikisource, we already use a
Lua module to manage the Mediawiki:Proofreadpage_header_template template used
by the header=1 feature: https://fr.wikisource.org/wiki/Module:Header_template
This template automatically outputs metadata and navigation from the Index page
TOC (but it also allows overriding the data).
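As a rough illustration of what deriving navigation from the Index page TOC involves (this is not the code of fr.wikisource's Module:Header_template, which is written in Lua), the Python sketch below computes previous/next links from an ordered list of chapter titles and lets explicit values override the derived ones. The list-of-titles input and the field names `previous`/`next` are assumptions.

```python
def navigation(toc, current):
    """Return (previous, next) titles around `current` in an ordered TOC.

    `toc` is a hypothetical ordered list of chapter page titles standing
    in for the table of contents of an Index: page. Missing neighbours,
    or a title not in the TOC, give None.
    """
    try:
        i = toc.index(current)
    except ValueError:
        return None, None
    prev_title = toc[i - 1] if i > 0 else None
    next_title = toc[i + 1] if i + 1 < len(toc) else None
    return prev_title, next_title


def header(metadata, toc, current, overrides=None):
    """Merge Index metadata with derived navigation; overrides win."""
    prev_title, next_title = navigation(toc, current)
    fields = dict(metadata, previous=prev_title, next=next_title)
    fields.update(overrides or {})
    return fields
```

Letting explicit parameters win over derived values mirrors the "it also allows overriding the data" behaviour described above.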

Tpt

Date: Tue, 11 Jun 2013 01:33:39 +0200
From: alex.bro...@gmail.com
To: wikisource-l@lists.wikimedia.org
Subject: Re: [Wikisource-l] About texts without supporting files and
Index: pages

I'm going to test what you are describing in a real Lua script. As you know, Lua
can read the code of any page with just one expensive server function, so
that a simple {{header|index name}} ns0 template call could read all the wiki
code from the Index page, parse it, extract all its data content, and use it to
build any HTML you like. No other field is needed. On it.wikisource we are
testing something more complex, since we are exporting Index data into a local
Lua data module, to be loaded with the mw.loadData function, which is not listed
as server-expensive; but I presume that wiki servers would not be overloaded
by *one* server-expensive call.
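The parsing step Alex describes (read the Index page's wikitext and extract its fields) is sketched below in Python rather than Scribunto Lua, purely to show the logic. It assumes the Index page content is a single template call with named `|Key=value` parameters; the field names and values in any example are illustrative. Pipes inside nested `{{...}}` or `[[...]]` must not split parameters, which is why a plain split on `|` is not enough.

```python
def split_top_level(text):
    """Split on '|' characters that are not inside {{...}} or [[...]]."""
    parts, depth, start, i = [], 0, 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            i += 2
            continue
        if pair in ("}}", "]]"):
            depth -= 1  # assumes balanced markup in this sketch
            i += 2
            continue
        if text[i] == "|" and depth == 0:
            parts.append(text[start:i])
            start = i + 1
        i += 1
    parts.append(text[start:])
    return parts


def parse_index(wikitext):
    """Extract named parameters from an Index: page stored as one template call."""
    body = wikitext.strip()
    if not (body.startswith("{{") and body.endswith("}}")):
        raise ValueError("expected a single template call")
    _name, *params = split_top_level(body[2:-2])
    fields = {}
    for param in params:
        key, sep, value = param.partition("=")
        if sep:  # ignore unnamed parameters
            fields[key.strip()] = value.strip()
    return fields
```

The resulting dictionary is the kind of plain data that could then be cached (on it.wikisource, via a data module loaded with mw.loadData) instead of being re-parsed on every page view.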

If I'm not going wrong, such a script could be written tomorrow by a good Lua
programmer; I'll need some more time as a beginner. I'll test a
MediaWiki:Proofreadpage_index_template Lua loader and parser working in ns0,
just to see if everything runs as I guess, then I'll report back in this thread.
Which Wikisource project do you usually work on?

Alex


2013/6/11 David Cuenca dacu...@gmail.com

No, it won't be stored in Wikisource, but there is still the need to present
the information in a consistent manner.

If you want to display the information on ns0, you will end up needing the same
fields that the Index: page is using now.

So why not have the same solution for both?

It could also be a template with a reduced set of fields that expands to show
Template:Book with linked data from Wikidata, whether or not they have
supporting scans.

Micru


Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-11 Thread Andrea Zanni
@aarti: some books/texts/documents are born digital.
Think about all the scientific literature, or PhD theses. These files (if
CC-BY/SA licensed) could be stored in Wikisource and be useful for the
wiki community.
We already have some means to link those texts to their source (with a URL).

It's a long-standing controversy whether we should allow documents
without scans on Wikisource.
Every community should decide for itself. My personal POV (also as a
librarian) is that if we leave out born-digital documents we are
forgetting the bulk of the stuff.
I think that one of the most important added values of Wikisource is
integrating texts with other Wikimedia projects, and (wiki)linking and
connecting them to each other.
No other digital library does that on the Internet, and we can do it because
we have a community.

So, these texts will have a source. I do think that proofreading a born-digital
PDF is a waste of time.

Aubrey



On Tue, Jun 11, 2013 at 8:46 AM, Aarti K. Dwivedi ellydwivedi2...@gmail.com
 wrote:

 A slightly off-topic question: even if we modify the extension to proofread
 books which do not have scans (I am assuming books that were born digital),
 against what will these books be proofread?



Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-11 Thread Andrea Zanni
On Tue, Jun 11, 2013 at 8:41 AM, Thomas PT thoma...@hotmail.fr wrote:

 Sorry if my answer is off-topic, but if metadata are stored in Wikidata, is
 it really necessary to create Index pages to store the same data as Wikidata?
 As I see it, we'll have bibliographical metadata on Wikidata
 (title, author, date of publication...) and data related to proofreading
 (proofreading level, table of contents...) on the Index: pages. Moreover, as the
 Proofread Page extension assumes that an Index page is about a scan (i.e.
 one or more files), I'm not sure that Index pages about books without a scan
 will be handled well by the extension.

I think that this is a matter of usability and user experience.
If we use Index pages, we'll let users *stay on Wikisource*
the whole time, while the complexity and the data workflow are hidden from
them.
It's a *bad* thing to ask newbies to navigate through Wikisource (entry),
then Commons (file upload), then Wikisource (create Index page), then
Wikidata (fetch data), then Wikisource (start working on the book) again, just to
work on one book.

For me this is one of the main obstacles for beginners, and we should try to
make things easier for people, IMHO.

Aubrey


Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-11 Thread Alex Brollo
You're right, Aubrey; nevertheless, while promoting a user-friendly interface,
the result is that data and wiki code are extremely difficult to use as a
clean database. Just think of wiki markup and the simple trick of marking
bold and italic text with apostrophes: very user-friendly, but something
like a nightmare for a poor programmer who needs to find an algorithm to
work out which apostrophes are text and which are code. The server itself
can't resolve apostrophe concatenation either. Would it be less user-friendly
to use something like <b>...</b>? Yes; but how much cleaner raw wiki text
would be!
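To make the apostrophe problem concrete, here is a simplified Python tokenizer that classifies runs of apostrophes roughly the way MediaWiki's parser does as a first pass: 2 for italic, 3 for bold, 5 for both, 4 as one literal apostrophe plus bold, and more than 5 as literal apostrophes followed by bold+italic. The real parser additionally rebalances odd numbers of bold and italic markers on each line; that heuristic is left out, so this is only a sketch.

```python
import re

# Runs of two or more apostrophes are candidate markup; a single one is text.
RUN = re.compile(r"'{2,}")


def classify(run_length):
    """Split an apostrophe run into (literal apostrophes, markup kind)."""
    if run_length == 4:
        return 1, "bold"                      # ' + '''
    if run_length > 5:
        return run_length - 5, "bolditalic"   # extras are text, last 5 markup
    return 0, {2: "italic", 3: "bold", 5: "bolditalic"}[run_length]


def tokenize(line):
    """Return a list of (kind, text) tokens for one line of wikitext."""
    tokens, pos = [], 0
    for m in RUN.finditer(line):
        literal, markup = classify(len(m.group()))
        text = line[pos:m.start()] + "'" * literal
        if text:
            tokens.append(("text", text))
        tokens.append((markup, m.group()[literal:]))
        pos = m.end()
    if pos < len(line):
        tokens.append(("text", line[pos:]))
    return tokens
```

For instance, `tokenize("dell''''Inferno'''")` yields `dell'` as text followed by bold `Inferno`, whereas `tokenize("l'''amore''")` reads the elided article as opening bold, exactly the kind of case where the writer probably meant an apostrophe followed by italics.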

Distributed Proofreaders uses a completely different approach: there's a
rigid set of progressively granted permissions for users, and inexperienced
users can only do simple tasks. This is far from the wiki mentality, but we
can't expect to keep things too easy.

Alex




Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-11 Thread billinghurst
On Tue, 11 Jun 2013 12:16:54 +0530, Aarti K. Dwivedi
ellydwivedi2...@gmail.com wrote:
 A slightly off-topic question: even if we modify the extension to
 proofread books which do not have scans (I am assuming books that were
 born digital), against what will these books be proofread?
 

I am not sure why we are looking to proofread a digital-only file, unless
of course it never had a text layer and had to be OCR'd. Proofreading
surely only relates to scanned images, where there has been a need to
proofread.

Regards, Billinghurst



[Wikisource-l] About texts without supporting files and Index: pages

2013-06-10 Thread David Cuenca
With the deployment of Wikidata it is a good moment to re-examine what
Index pages are and what their function should be.
The most direct transition to a Wikidata-supported Wikisource could be
something like this:
https://sites.google.com/site/dacuetu/BookData.pdf

That would allow:
- to share book data between Commons, Wikisource and Wikipedia
- to update it, when any of the sites has been updated
- to facilitate better search functions (like searches by author, or topic,
limiting the date range or the language)

That would only apply to those texts which use an Index: page, so now the
question is, what do we do with books that do not have supporting scans
(and therefore no index page)?

Some possible options:
a) ignore pages without sources and focus only on works with supporting
scans
b) use ns0 pages also as data containers (instead of, or in addition to
Index pages)
c) create Index: pages for all works, with or without scans. Use that
instead of Template:Textinfo

Personally I prefer option c, even if it would require renaming Index:
to Source: to make it clearer what those pages are; however, I would like
to hear the opinion of other wikisourcerors about this.

Cheers,
Micru


Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-10 Thread David Cuenca
@Alex: but what do you think of storing the source information in Index:
pages for all works stored in Wikisource, even if they don't have a
supporting scan?

That was the original question :)

About your proposed library, it would be more useful if it could modify
data in Wikidata, not only import it. Besides, if the Wikidata client is
installed in Wikisource, the inclusion syntax already takes care of
displaying data...

Micru

On Mon, Jun 10, 2013 at 5:38 PM, Alex Brollo alex.bro...@gmail.com wrote:

 I don't see the need to deeply change the Index/ns0 relationship, although I
 appreciate the idea of promoting coherence by reducing redundancy (many years
 ago I painfully used dBase III and dBase IV, and I learned that principle by
 trial and error).

 Here: http://www.mediawiki.org/wiki/Extension_talk:Scribunto/Brainstorming is a
 brief message about the relationship among Wikidata, Commons, Wikisource and
 any other project. No need to follow the link; it's so short that I'll copy it
 here (but if you like it, comment on it there):

 Scribunto-Lua and Wikidata
 I'd like a library to get Wikidata content; IMHO it would be a good idea
 to access Wikidata data in plain form, as if such data were Lua
 tables/variables. --Alex brollo (talk) 13:06, 10 June 2013 (UTC)


 If such a Lua library could be built, importing data from Wikidata would
 be as simple as writing a template, and the data would be self-aligned.

 Alex


 2013/6/10 Aarti K. Dwivedi ellydwivedi2...@gmail.com

 Hi,

 There was a thread some time ago where there were talks of having
 books which were born digital. These pages wouldn't have scans.
 What the 'Index' page would have in these cases is something I am not
 very sure about.

 Cheers,
 Rtdwivedi


 On Mon, Jun 10, 2013 at 10:47 PM, David Cuenca dacu...@gmail.com wrote:

 With the deployment of Wikidata it is a good moment to re-examine what
 Index pages are and what should be their function.
 The most direct transition to a Wikidata-supported Wikisource could be
 something like this:
 https://sites.google.com/site/dacuetu/BookData.pdf

 That would allow:
 - to share data book data between Commons, Wikisource and Wikipedia
 - to update it, when any of the sites has been updated
 - to facilitate better search functions (like searches by author, or
 topic, limiting the date range or the language)

 That would only apply to those texts which use an Index: page, so now
 the question is: what do we do with books that do not have supporting scans
 (and therefore no Index: page)?

 Some possible options:
 a) ignore pages without sources and focus only on works with supporting
 scans
 b) use ns0 pages also as data containers (instead of, or in addition to
 Index pages)
 c) create Index: pages for all works, with or without scans. Use that
 instead of Template:Textinfo

 Personally I prefer option c, even if it would require renaming
 Index: to Source: to make it clearer what those pages are; however, I
 would like to hear the opinion of other wikisourcerors about this.

 Cheers,
 Micru





 --
 Aarti K. Dwivedi









-- 
Etiamsi omnes, ego non


Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-10 Thread Alex Brollo
Simply put, there is no need to store data twice or more if it is
dynamically imported from Wikidata. Such data could simply be generated by
a normal template. It's similar to Commons media sharing: all but beginner
Wikipedians know that when you want to edit a shared media file, you must
make your edit on Commons; there's no need to host a media file locally.

So, IMHO, a good Lua Wikidata-reading library could avoid storing data in
Wikisource, Wikipedia, or Commons at all.

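A minimal sketch of the "store once, render everywhere" principle, written in Python rather than Lua, with a plain dictionary (CENTRAL_STORE, hypothetical) standing in for Wikidata; it does not call any real Wikidata or Scribunto API. Each project builds its header on demand from the shared record, so a correction made in one place shows up everywhere on the next render.

```python
# Hypothetical central store standing in for Wikidata: one record per work,
# keyed by a made-up identifier. No real Wikidata API is called here.
CENTRAL_STORE: dict[str, dict[str, str]] = {
    "work-1": {"title": "Example Title", "author": "Example Author", "year": "1850"},
}


def render_header(work_id: str, store: dict[str, dict[str, str]]) -> str:
    """Build a header on demand from the shared record; nothing is copied locally."""
    record = store[work_id]
    return f"{record['title']} by {record['author']} ({record['year']})"


# A correction made once in the central store...
CENTRAL_STORE["work-1"]["year"] = "1851"
# ...is picked up by every project the next time it renders the header.
wikisource_header = render_header("work-1", CENTRAL_STORE)
commons_header = render_header("work-1", CENTRAL_STORE)
```

The design choice is the same as with Commons media: there is a single place to edit, and every consumer is only a reader.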
Alex


2013/6/10 David Cuenca dacu...@gmail.com

 @Alex: but what do you think of storing the source information in Index:
 pages for all works stored in Wikisource, even if they don't have a
 supporting scan?

 That was the original question :)

 About your proposed library, it would be more useful if it could modify
 data in Wikidata, not only import it. Besides, if the Wikidata client is
 installed in Wikisource, the inclusion syntax already takes care of
 displaying data...

 Micru




Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-10 Thread David Cuenca
No, it won't be stored in Wikisource, but there is still the need to
present the information in a consistent manner.
If you want to display the information on ns0, you will end up needing the
same fields that the Index: page uses now.
So why not have the same solution for both?

It could also be a template with a reduced set of fields that expands to
show Template:Book with linked data from Wikidata, whether or not the work
has supporting scans.

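A rough sketch of the reduced-field idea, again in Python and with hypothetical field names (the real Template:Book parameters may differ): the few local values are expanded into the full field set, and missing ones are filled from linked data. Letting a local value take precedence over the linked one is an assumption made for this example, not something decided in this thread.

```python
# Hypothetical full field list standing in for what a Template:Book-style
# header would display; the real template's parameter names may differ.
FULL_FIELDS = ("title", "author", "year", "publisher", "scan")


def expand_book_fields(local: dict[str, str], linked: dict[str, str]) -> dict[str, str]:
    """Expand a reduced set of local fields into the full field set.

    A non-empty local value wins; otherwise the linked (e.g. Wikidata-derived)
    value is used, and any field found in neither is left empty.
    """
    return {field: local.get(field) or linked.get(field, "") for field in FULL_FIELDS}


linked_data = {"title": "Example Title", "author": "Example Author", "year": "1850"}
with_scan = expand_book_fields({"scan": "Example.djvu"}, linked_data)
without_scan = expand_book_fields({}, linked_data)
```

Both results have exactly the same fields, so works with and without supporting scans can be displayed by the same template.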
Micru

On Mon, Jun 10, 2013 at 6:00 PM, Alex Brollo alex.bro...@gmail.com wrote:

 Simply put, there is no need to store data twice or more if it is
 dynamically imported from Wikidata. Such data could simply be generated by
 a normal template. It's similar to Commons media sharing: all but beginner
 Wikipedians know that when you want to edit a shared media file, you must
 make your edit on Commons; there's no need to host a media file locally.

 So, IMHO, a good Lua Wikidata-reading library could avoid storing data in
 Wikisource, Wikipedia, or Commons at all.

 Alex



Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-10 Thread Alex Brollo
I'm going to test what you are describing in a real Lua script. As you know,
Lua can read the code of any page with a single expensive server function,
so a simple {{header|index name}} template call in ns0 could read all the
wikicode of the Index page, parse it, extract all of its data, and use it
to build any HTML you like. No other field is needed. On it.wikisource we
are testing something more complex: we are exporting Index data into a
local Lua data module, to be loaded with the mw.loadData function, which is
not listed as server-expensive. Still, I presume the wiki servers would not
be overloaded by *one* expensive server call.

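A minimal sketch of the parse-and-build step, in Python rather than Lua, assuming the Index page stores its data as named |Key=Value parameters of a template call, one parameter per line (multi-line values are not handled). The sample wikitext, the field names, and the ws-header class are made up for illustration, and the server-side page-reading call is not shown.

```python
import re
from html import escape

# Matches lines such as "|Title=Some title" inside the template call.
PARAM_LINE = re.compile(r"^\|\s*([^=|\n]+?)\s*=(.*)$", re.MULTILINE)


def parse_index_fields(wikitext: str) -> dict[str, str]:
    """Extract named |Key=Value parameters, one per line, from Index wikitext."""
    return {m.group(1): m.group(2).strip() for m in PARAM_LINE.finditer(wikitext)}


def build_header_html(fields: dict[str, str]) -> str:
    """Build a small HTML header from the parsed fields (hypothetical markup)."""
    title = escape(fields.get("Title", ""))
    author = escape(fields.get("Author", ""))
    return f'<div class="ws-header"><b>{title}</b> by {author}</div>'


sample = """{{:MediaWiki:Proofreadpage_index_template
|Title=Example Title
|Author=Example Author
|Year=1850
|Pages=<pagelist />
}}"""
fields = parse_index_fields(sample)
header = build_header_html(fields)
```

Values may themselves contain "|" (for example wikilinks with a label), which is why the pattern only splits on the first "=" after the key.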
If I'm not mistaken, a good Lua programmer could write such a script
tomorrow; as a beginner, I'll need some more time. I'll test a
MediaWiki:Proofreadpage_index_template Lua loader and parser working in
ns0, just to see if everything runs as I expect, and then I'll report back
in this thread. Which Wikisource project do you usually work on?

Alex



2013/6/11 David Cuenca dacu...@gmail.com

 No, it won't be stored in Wikisource, but still there is the need to
 present the information in a consistent manner.
 If you want to display the information on ns0, you will end up needing the
 same fields that the Index: page is using now.
 So why not have the same solution for both?

 It could also be a template with a reduced set of fields that expands to
 show Template:Book with linked data from Wikidata, no matter if they have
 supporting scans or not.

 Micru


 On Mon, Jun 10, 2013 at 6:00 PM, Alex Brollo alex.bro...@gmail.com wrote:

 Simply put, there is no need to store data twice or more if it is
 dynamically imported from Wikidata. Such data could simply be generated by
 a normal template. It's similar to Commons media sharing: all but beginner
 Wikipedians know that when you want to edit a shared media file, you must
 make your edit on Commons; there's no need to host a media file locally.

 So, IMHO, a good Lua Wikidata-reading library could avoid storing data in
 Wikisource, Wikipedia, or Commons at all.

 Alex


 2013/6/10 David Cuenca dacu...@gmail.com

 @Alex: but what do you think of storing the source information in
 Index: pages for all works stored in Wikisource, even if they don't have
 a supporting scan?

 That was the original question :)

 About your proposed library, it would be more useful if it could modify
 data in Wikidata, not only import it. Besides, if the Wikidata client is
 installed in Wikisource, the inclusion syntax already takes care of
 displaying data...

 Micru


 On Mon, Jun 10, 2013 at 5:38 PM, Alex Brollo alex.bro...@gmail.com wrote:

 I don't see the need to deeply change the Index/ns0 relationship, though I
 appreciate the idea of promoting coherence by reducing redundancy (many years
 ago I painfully used dBase III and dBase IV, and I learned that principle by
 trial and error).

 Here:
 http://www.mediawiki.org/wiki/Extension_talk:Scribunto/Brainstorming is a
 brief message about the relationship among Wikidata, Commons, Wikisource and
 any other project. Don't bother following the link; it's so short that I'll
 copy it here (but if you like it, comment on it there):

 Scribunto-Lua and Wikidata
 I'd like a library to get Wikidata content; IMHO it would be a good idea
 to access Wikidata data in plain form, just as if such data were Lua
 tables/variables. --Alex brollo (talk) 13:06, 10 June 2013 (UTC)


 If such a Lua library could be built, importing data from Wikidata
 would be as simple as writing a template, and the data would stay self-aligned.

 Alex


 2013/6/10 Aarti K. Dwivedi ellydwivedi2...@gmail.com

 Hi,

 There was a thread some time ago with talk of having books that were
 born digital. These pages wouldn't have scans.
 I am not very sure what the 'Index' page would contain in these cases.

 Cheers,
 Rtdwivedi


 On Mon, Jun 10, 2013 at 10:47 PM, David Cuenca dacu...@gmail.com wrote:

 With the deployment of Wikidata, it is a good moment to re-examine
 what Index pages are and what their function should be.
 The most direct transition to a Wikidata-supported Wikisource could
 be something like this:
 https://sites.google.com/site/dacuetu/BookData.pdf

 That would allow:
 - to share book data between Commons, Wikisource and Wikipedia
 - to update it when any of the sites has been updated
 - to facilitate better search functions (like searches by author, or
 topic, limiting the date range or the language)

 That would only apply to those texts which use an Index: page, so
 now the question is: what do we do with books that do not have supporting
 scans (and therefore no Index: page)?

 Some possible options:
 a) ignore pages without sources and focus only on works with
 supporting scans
 b) use ns0 pages also as data containers (instead of, or in addition
 to Index pages)
 c) create Index: pages for all works, with or without scans. Use
 that instead of Template:Textinfo

 Personally I prefer option c, even if it