You're preaching to the converted. :-) I don't see a problem with persuading anyone of the significance of the data set. But I would still like to know of specifically interested researchers, for two reasons: that information is used to justify the allocation of resources, and we need to know what kinds of indexes etc. need to be established to assist the researchers, which means we need to know a bit about the nature of their current or proposed research. If we can build a community of researchers around the data set, then we can use that community as the basis of a later NeCTAR bid, or something along those lines, to further enrich and develop this space.
I am already talking with Rob Cook, CEO of QCIF, about putting it onto QCloud (http://www.qcif.edu.au/services/qcloud). This would give us storage, a virtual machine to host a web server for access to the data, and access to high-performance computing to process the data where required. I think the first steps are storage for the dumps and a web server to access them (so really just a local mirror). I have a guy working on a similar project to make the Trove newspaper data accessible to researchers, so I am hoping that after he's had his baptism of fire learning about QCloud on that project, he'll be interested in doing a small project (as WMAU doesn't have a lot of cash) to set up the basics for the WP dumps. Then I think we really need a community of researchers, as mentioned above, to make decisions about preprocessing, indexes etc.

Sent from my iPad

On 11/01/2013, at 8:59 AM, Leigh Blackall <[email protected]> wrote:

> Kerry, I think given the primacy that Wikipedia now has over information, the
> data within it is of crucial importance to researchers investigating all
> manner of things, from information and knowledge management, to social
> studies, news and journalism and propaganda. For this reason, whether we know
> of a number of researchers or not, whether our peak bodies for research and
> data management are up on this or not, if we few see a reason and opportunity
> to take data and process it now, we should.
>
> Then again, is there a good reason to act now? Isn't the data well managed
> and preserved where it is now? If some day Australian researchers do
> gravitate to that data, won't that be the time to develop processing programs?
>
> On Jan 11, 2013 8:07 AM, "Kerry Raymond" <[email protected]> wrote:
> I just added this to the talk page
>
> I can certainly gather the names of some Australian researchers who would be
> interested in this. But the size would make it a better target for an RDSI
> node rather than AARNET. Researchers probably want more than just a mirrored
> dump; they would want it extracted and pre-processed in a number of ways for
> convenience in mining it in various ways. Most researchers who work with
> Wikipedia dumps have to do extensive preprocessing so the desire to do it
> once and share is definitely there. I am in conversation with an RDSI node
> and the size doesn't seem to faze them, but we would need folks to volunteer
> to help with preprocessing it. Kerry Raymond (talk) 20:59, 10 January 2013
> (UTC)
>
> So if you are an Australian researcher interested in getting this data set
> easily accessible to Australian researchers, please let me know. Also, please
> forward to any researcher friends you might have and ask them to contact me.
>
> I've previously supervised a PhD student who used a 2007 dump (from memory)
> and I am aware of other projects at QUT that used Wikipedia dumps in one way
> or another.
>
> Kerry
>
>
> Sent from my iPad
>
> On 10/01/2013, at 2:51 PM, Leigh Blackall <[email protected]> wrote:
>
>> In what ways should we speak up, John? Letters to AARnet?
>>
>> On Jan 10, 2013 10:21 AM, "John Vandenberg" <[email protected]> wrote:
>> ---------- Forwarded message ----------
>> From: "Federico Leva (Nemo)" <[email protected]>
>> Date: Jan 10, 2013 10:11 AM
>> Subject: [Wiki-research-l] Dumps on AARNet
>> To: "Research into Wikimedia content and communities"
>> <[email protected]>
>> Cc:
>>
>> If you're a researcher and you'd like the Wikimedia projects dumps to be on
>> AARNet, looks like you need to speak up.
>> <https://meta.wikimedia.org/w/index.php?title=Talk:Mirroring_Wikimedia_project_XML_dumps&curid=316421&diff=5001005&oldid=5000757&rcid=3821246>
>>
>> Nemo
>>
>> _______________________________________________
>> Wiki-research-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
_______________________________________________
Wikimediaau-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikimediaau-l
