Another thing I would be very happy to see in the future is greater, more systematic collaboration with the Internet Archive. I'm convinced that it's a vital part of our ecosystem, because it makes easy many things that would otherwise require skilled users (such as creating a PDF/DjVu file, running OCR, etc.). When I explain Wikisource, I always explain the Internet Archive first, teaching people to upload their files there and then import them into Commons/Wikisource via the "IA Upload" tool.
This is why the Italian Wikisource community created a dedicated collection on IA:
https://archive.org/details/itwikisource
To create a collection, you need at least 50 items; then you can ask the Internet Archive for permission. Alex brollo is currently writing some scripts for better maintenance of the metadata, and we'll share them when they are ready. If you create a collection, please tell us: we could even have a larger "Wikisource" collection containing all the language-specific collections.

Maybe this is a bit off-topic for the strategy discussion, but I think it suggests ways to improve the collaboration between us and IA.

On Fri, Mar 24, 2017 at 10:50 AM, Andrea Zanni <[email protected]> wrote:

> Anyone else?
> It would be very good to know the gist of the discussions/opinions you are
> having in your local Wikisource.
>
> The Italian Wikisource, for example, is summing this up here:
> https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Sources/Italian_Wikisource_Village_pump
>
> For us, there is a bit of a disagreement about the idea and goal of being
> a "library" versus being a "typography": being a library is more focused on
> access and on services built upon texts (text analysis, text mining,
> searching, hyperlinking, annotation), while being a typography is more
> focused on the transcribing/proofreading part, which needs a whole
> different level of tools and interface.
>
> Maybe you are having a similar discussion?
> Do you possibly see a "fork" of Wikisource in the future into 2 different
> projects, or at least 2 different interfaces?
>
> Aubrey
>
> On Mon, Mar 20, 2017 at 10:54 PM, Andrea Zanni <[email protected]>
> wrote:
>
>> @Micru: of course, as you say, machine learning is the elephant in the
>> room.
>> I dream of something we could call "Wikisource as a platform":
>> an environment with structured data and workflows where you can have
>> APIs and tools for interacting with humans and machines, both for input
>> and for output.
>> We could have OCR software that learns from our human proofreaders, and
>> ideally we could even have OCR engines tailored to specific centuries or
>> types of books.
>> We could use machine learning to look for citations within books (for
>> example, other cited books or authors).¹
>> This could greatly improve our library:
>> on the Internet Archive or Google Books there are millions of books just
>> waiting for us to make them readable and accessible and, of course, to
>> connect them to Wikipedia, to Wikidata, and to other Wikisource books.
>>
>> IMHO, this is obviously important for GLAMs:
>> we could be much more usable and accessible for libraries, archives and
>> museums that want to upload their texts and books into Wikisource and make
>> them part of our hyperlinked library.
>> They could easily import into Wikisource, and export as well.
>> Right now, this is impossible or at least very, very difficult.²
>>
>> I'm not sure that all these features could go in just one project, but
>> it's probably worth trying.
>>
>> Aubrey
>>
>> [1] I remember I explored the idea with Amir, but I couldn't follow up.
>> [2] To get all the data I needed from Wikisource books, I had to
>> basically scrape the website.
>>
>> On Mon, Mar 20, 2017 at 8:14 PM, Pine W <[email protected]> wrote:
>>
>>> Glad to see this discussion. Pinging Alex Stinson for this discussion in
>>> case he has any insights to add from a GLAM perspective.
>>>
>>> Pine
>>>
>>>
>>> On Mon, Mar 20, 2017 at 7:48 AM, David Cuenca Tudela <[email protected]>
>>> wrote:
>>>
>>>> On Sun, Mar 19, 2017 at 9:44 PM, Asaf Bartov <[email protected]>
>>>> wrote:
>>>>
>>>>> what might be the significant role our unique advantage might play in
>>>>> 15 years?
>>>>>
>>>>
>>>> There are some circumstantial aspects that might be relevant for
>>>> Wikisource:
>>>> - With the emergence of machine learning, do volunteers really need to
>>>> spend so much time formatting? Or will we be able to use our data to
>>>> train a system to do some pre-formatting for us?
>>>> - With the existing flood of data, can we consider Wikisource a relevancy
>>>> setter? If a document has been transcribed/imported into Wikisource, is
>>>> that enough to make the document relevant?
>>>> - Considering that not all libraries might have the resources to
>>>> develop their own platform, can Wikisource be used as a neutral platform
>>>> by external agents as a complement to their own infrastructure?
>>>>
>>>> Regarding the 15-year time frame, it might be a good exercise to
>>>> examine different scenarios. Yes, one could be to think big, to expect
>>>> growth and a favorable environment. But what about the opposite? What if
>>>> there are *fewer* people able to contribute?
>>>>
>>>> Cheers,
>>>> Micru
>>>>
>>>>
>>>> _______________________________________________
>>>> Wikisource-l mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
>>>>
>>
>
