Re: [Wikimedia-l] About the concentration of resources in SF (it was: "Communication plans for community engagement"

2013-07-27 Thread Michał Buczyński
Hi,

+1 to this question.

If we learn which items in MediaWiki we are invited to work on, together with
some estimates of how many developer-days, for example, we would need to
finance, then we will know whether it is possible.

However, we should keep in mind that most of the chapters are not really
development houses and that we lack experience in this area.

michał.
 
On 28 July 2013 at 5:41, Craig Franklin wrote:

> Hi Erik (and whomever from WMDE),
> 
> For the benefit of chapters that are interested in this space, can you
> offer any examples of projects that are of an appropriate size and type for
> a chapter to take on? I think that most chapters* would be willing to help
> out in the software development space if we got a bit of direction on how
> we could be the most useful.
> 
> Cheers,
> Craig Franklin
> 
> * Keeping in mind that my chapter probably wouldn't have the capacity to
> start anything in this space for at least another twelve months.
> 
> 
> On 27 July 2013 09:57, Erik Moeller  wrote:
> 
> > On Wed, Jul 24, 2013 at 2:39 PM, rupert THURNER
> >  wrote:
> >
> > > If WMF is serious about letting development activities grow in other
> > > countries this might be taken into account in FDCs allocation policy.
> >
> > For my part, I'm happy to offer feedback to the FDC on plans related
> > to the development of engineering capacity in FDC-funded
> > organizations. I'm sure Wikimedia Germany, too, would be happy to
> > share its experiences growing the Wikidata development team. I'd love
> > to find ways to bootstrap more engineering capacity across the
> > movement, as so many of our shared challenges have a software
> > engineering component. If any folks on-list want to touch base on
> > these questions at Wikimania, drop me a note. :)
> >
> > Erik
> >
> > --
> > Erik Möller
> > VP of Engineering and Product Development, Wikimedia Foundation
> >
> > ___
> > Wikimedia-l mailing list
> > Wikimedia-l@lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,


Re: [Wikimedia-l] About the concentration of resources in SF (it was: "Communication plans for community engagement"

2013-07-27 Thread Craig Franklin
Hi Erik (and whomever from WMDE),

For the benefit of chapters that are interested in this space, can you
offer any examples of projects that are of an appropriate size and type for
a chapter to take on?  I think that most chapters* would be willing to help
out in the software development space if we got a bit of direction on how
we could be the most useful.

Cheers,
Craig Franklin

* Keeping in mind that my chapter probably wouldn't have the capacity to
start anything in this space for at least another twelve months.


On 27 July 2013 09:57, Erik Moeller  wrote:

> On Wed, Jul 24, 2013 at 2:39 PM, rupert THURNER
>  wrote:
>
> > If WMF is serious about letting development activities grow in other
> > countries this might be taken into account in FDCs allocation policy.
>
> For my part, I'm happy to offer feedback to the FDC on plans related
> to the development of engineering capacity in FDC-funded
> organizations. I'm sure Wikimedia Germany, too, would be happy to
> share its experiences growing the Wikidata development team. I'd love
> to find ways to bootstrap more engineering capacity across the
> movement, as so many of our shared challenges have a software
> engineering component. If any folks on-list want to touch base on
> these questions at Wikimania, drop me a note. :)
>
> Erik
>
> --
> Erik Möller
> VP of Engineering and Product Development, Wikimedia Foundation
>


Re: [Wikimedia-l] Collaborative machine translation for Wikipedia -- proposed strategy

2013-07-27 Thread Laura Hale
On Saturday, July 27, 2013, David Cuenca wrote:

> On Fri, Jul 26, 2013 at 11:30 PM, C. Scott Ananian wrote:
>
> > This statement seems rather defeatist to me.  Step one of a machine
> > translation effort should be to provide tools to annotate parallel texts in
> > the various wikis, and to edit and maintain their parallelism.
>
>
> Scott, "edit and maintain" parallelism sounds wonderful on paper, until you
> want to implement it and then you realize that you have to freeze changes
> both in the source text and in the target language for it to happen, which
> is, IMHO against the very nature of wikis.
> The Translate extension already does that in a way. I see it as useful only for
> texts acting as a central hub for translations, like official
> communication. If that were to happen for all kinds of content, you would
> have to sacrifice the plurality of letting each wiki do its own
> version.
>
>
Actually, this sort of translation service might be extremely useful for us
on Wikinews.  We have a fair amount of direct cross-translation work from
one language to another.  Our articles generally become non-editable
after a short period of time because of the nature of news reporting.
There are issues with things like original reporting: original Czech-language
reporting, say, on anything other than the major news stories that
international media can easily sell for syndication does not get reported
elsewhere.  Thus more local news from minority languages being shared... yeah,
big benefit for us. :)  There might be a few Wikinews language projects that
would be willing to sign on as beta testers for a collaborative translation
tool. :)  I think one of our regulars, Gryllida, has been trying to develop
a tool to make translating easier, so it would fit really well with existing
project goals.

Sincerely,
Laura Hale


-- 
mobile:   635209416
twitter: purplepopple
blog: ozziesport.com


Re: [Wikimedia-l] article bytes more meaningful than users or revisions (was Re: Updates on VE data analysis)

2013-07-27 Thread Mark

On 7/27/13 10:29 AM, Denny Vrandečić wrote:

> I still would worry, though: our content is increasing linearly, as you
> say, but the number of active contributors is not. If we take for granted
> that active contributors are the ones who provide quality control for the
> articles, this means that since 2006 or so the ratio of content per
> contributor is linearly declining, which would mean that our quality would
> suffer.



One useful bit of information is what *kind* of editors there are, not 
just the raw numbers.


For example, here is a hypothetical situation, which I think James and 
John are contemplating, which would result in a numerical decline in 
editors-per-article with no real change in actual editorial attention to 
the article:


* Article in 2007, with 19 editors: Initial content written by 1 person, 
moderate expansions from 3 people, copyediting from 5 people, 
vandalism-rollback from 10 people


* Similar article in 2013, with 12 editors: Initial content written by 1 
person, moderate expansions from 3 people, copyediting from 3 people and 
1 typo-fixing bot, vandalism-rollback from 2 people and 2 anti-vandal bots


Basically all that happened in this hypothetical is that two of the 
typo-fixers were replaced by a typo-fixing bot, and 8 rollbacks that 
would've once been done by recent-changes patrollers were instead done 
by a smaller number of anti-vandal bots. Maybe that's not what the 
change looks like, but I don't think the raw edit-count data can tell us 
either way.


I think this is also a potential issue with the definition of active 
users, which is defined as 5 edits/month for "active" and 100 
edits/month for "very active". The latter in particular much more 
heavily favors people who make many smaller edits versus fewer large 
edits. And are there editors contributing substantial amounts of content 
to Wikipedia who don't even hit the lower threshold? One possible group 
are people whose main contribution is to write new articles, and do 
little to no other editing. Some people write offline and then 
contribute a new, well-referenced article in a single edit. If that's 
their only involvement in Wikipedia, they wouldn't be counted as active 
Wikipedians in the numbers, even if they're sending us a steady stream 
of 1-2 new articles/month.


I'm not sure how to best answer those questions automatically. Bytes, as 
James suggests, could be one possible proxy, but in addition to total 
bytes, we could look at the editor level. Has there been a decline in 
"active editors" if we define active editing as changing more than N 
bytes in the encyclopedia in a month, not counting rollbacks? That would 
count people who wrote substantial new articles as active, even if they 
did it in only 1 or 2 edits/month (although on the other hand, it 
wouldn't count people who made 100 rollbacks and no other edits).
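
To make the comparison concrete, here is a minimal sketch of the two definitions side by side. The record layout (editor, bytes changed, rollback flag), the sample editors, and the 1000-byte threshold are all illustrative assumptions, not anything taken from the existing statistics.

```python
from collections import defaultdict

# Hypothetical record layout: (editor, bytes_changed, is_rollback), one tuple
# per edit in a single month. The layout and thresholds are assumptions.


def active_by_edit_count(edits, min_edits=5):
    """Editors with at least min_edits edits in the month (count-based definition)."""
    counts = defaultdict(int)
    for editor, _bytes_changed, _is_rollback in edits:
        counts[editor] += 1
    return {editor for editor, n in counts.items() if n >= min_edits}


def active_by_bytes(edits, min_bytes):
    """Editors who changed more than min_bytes in the month, not counting rollbacks."""
    totals = defaultdict(int)
    for editor, bytes_changed, is_rollback in edits:
        if not is_rollback:
            totals[editor] += abs(bytes_changed)
    return {editor for editor, total in totals.items() if total > min_bytes}


sample_month = (
    [("offline_writer", 12000, False)]   # one substantial new article in a single edit
    + [("patroller", -40, True)] * 100   # 100 rollbacks and no other edits
    + [("typo_fixer", 3, False)] * 6     # six tiny fixes
)

print(sorted(active_by_edit_count(sample_month)))
print(sorted(active_by_bytes(sample_month, min_bytes=1000)))
```

In this made-up month the count-based definition picks up the patroller and the typo fixer, while the byte-based one picks up only the offline writer, which is the trade-off described above.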


Another possibility could be to sample a subset of either articles, or 
of editors, and manually annotate what kind of editing is going on. More 
tedious and would of necessity be on a small subset of the encyclopedia, 
but might avoid papering over things that are obvious when you look at 
them but tend to get lost in big-data analyses.


-Mark



Re: [Wikimedia-l] Feedback for the Wikimedia Foundation

2013-07-27 Thread Jane Darnell
I am using Chrome and a pretty old computer. I am having trouble
getting my editing toolbar to go away again - probably cache
problems. I will try to reproduce this properly tomorrow, and
otherwise chalk it up to RTFM.

2013/7/27, Erik Moeller :
> On Sat, Jul 27, 2013 at 12:45 AM, Jane Darnell  wrote:
>
>> I am happy to report that I just discovered what the problem is. I had
>> turned off the "Show edit toolbar" option in my preferences (probably
>> over a year ago), so I wasn't seeing the top part of the VE edit
>> toolbar, which includes the hyperlink icon, among other things. I was
>> only seeing the other, second, line of the VE toolbar icons for
>> including media, reference, references list, and transclusion.
>
> That's interesting, Jane; thanks for the report. I'm not able to
> reproduce this - as far as I can tell, the preference is completely
> ignored by VE. If you or someone can get a repro on the exact
> circumstances under which this occurs, please drop me a note or
> directly add it to Bugzilla.
>
> Thanks,
> Erik
>


Re: [Wikimedia-l] [Wikitech-l] Collaborative machine translation for Wikipedia -- proposed strategy

2013-07-27 Thread Samuel Klein
David - thanks for this proposal; it is something that deserves
attention, and our projects are already used as one of the raw sources
for machine translation efforts.

On Sat, Jul 27, 2013 at 10:18 AM, David Cuenca  wrote:
> On Fri, Jul 26, 2013 at 11:30 PM, C. Scott Ananian
> wrote:
>
>> Step one of a machine
>> translation effort should be to provide tools to annotate parallel texts in
>> the various wikis, and to edit and maintain their parallelism.

I agree with most of Scott's input here.

> Scott, "edit and maintain" parallelism sounds wonderful on paper, until you
> want to implement it and then you realize that you have to freeze changes
> both in the source text and in the target language for it to happen, which
> is, IMHO against the very nature of wikis.

You don't need to freeze changes - you need permalinks to revisions,
the ability to track linkages between [sentences] in rev A.n in
language A and those in rev B.m in language B, and three-way diffs.
All are tractable problems.
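
As a rough illustration only: the sketch below records sentence-level links against fixed revisions and flags which translated sentences need another look once the source moves on. The `SentenceLink` layout, the "A.n"/"B.m" revision labels used as ids, and the sample sentences are hypothetical, and the exact-match comparison is a simplified stand-in for a real three-way diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentenceLink:
    """Link from one sentence in a source revision to one in a target revision."""
    source_rev: str  # permalink-style id of rev A.n (hypothetical format)
    source_idx: int  # sentence position within that source revision
    target_rev: str  # permalink-style id of rev B.m
    target_idx: int  # sentence position within that target revision


def stale_target_sentences(base_source, new_source, links):
    """Target sentence indexes whose linked source sentence no longer exists verbatim."""
    current = set(new_source)
    return sorted(
        link.target_idx
        for link in links
        if base_source[link.source_idx] not in current
    )


def untranslated_source_sentences(base_source, new_source):
    """Indexes in the new source revision that were not present in the base revision."""
    known = set(base_source)
    return [i for i, sentence in enumerate(new_source) if sentence not in known]


rev_a_n = ["The conflict began in 1969.", "It ended within months."]
rev_a_latest = [
    "The conflict began in March 1969.",
    "It ended within months.",
    "Casualty figures are disputed.",
]
links = [
    SentenceLink("A.n", 0, "B.m", 0),
    SentenceLink("A.n", 1, "B.m", 1),
]

print(stale_target_sentences(rev_a_n, rev_a_latest, links))
print(untranslated_source_sentences(rev_a_n, rev_a_latest))
```

Neither wiki is frozen here: the translation stays pinned to rev A.n, and only the sentences that changed since then are queued for an update.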

> The Translate extension already does that in a way. I see it as useful only for
> texts acting as a central hub for translations, like official
> communication. If that were to happen for all kinds of content, you would
> have to sacrifice the plurality of letting each wiki do its own version.

Allowing for a plurality of versions is useful.  There's no special
reason to break this out by language (if anything, there should be one
version per major cultural group - groups with different definitions
of reliable sources, for instance - not per language).  We should
separate "plurality of branches of a document" from "synchronizing
translations of a given branch" where a single branch of a document
should be available in any language.

For instance, I may want to read a French translation of the "Russian
WP" version of articles related to the Sino-Soviet war, in English --
in addition to the "Japanese WP" version, and the native "French WP"
version.  We can reduce the difficulty of translating each branch by
noting their shared similarities -- especially if we track the
revision at which each branched from, or rebased to, a shared trunk.
Allowing translators to automatically capture the source-revision when
carrying out an update via translation, per-page or per-section, would
make this easier.

> The most popular statistical-based machine translation system has created
> its engine using texts extracted from *the whole internet*, it requires
> huge processing power, and that without mentioning the amount of resources

One can do better with less power with parallel corpora.   WP and
Wikisource provide some of the closest things to a collection of
parallel corpora -- anything we can do to further clarify how much
these documents are parallel, and to improve their parallelism, will
improve [free] machine translation tools greatly.

> Of course statistical-based approaches should also be used as well (point 8
> of the proposed workflow), however more as a supporting technology rather
> than the main one.

+1

> One single researcher can create working transfer rules for a language pair
> in 3 months or less if there is previous work (see these GsoC [1], [2],
> [3]). Whichever problem the translation has, it can be understood and
> corrected...  [and] lower the entry barrier for linguists and translators 
> alike,

Right.  It's much easier to get a rules-based system that is close
enough to be useful to human translators, to speed up their work and
lower the entry barrier for someone to start translating, than to do a
complete job with rules.

> that there is no need to "marry" a technology, several can be developed in
> parallel and brought to a point of convergence where they work together

+10

Warmly,
SJ



Re: [Wikimedia-l] Feedback for the Wikimedia Foundation

2013-07-27 Thread Erik Moeller
On Sat, Jul 27, 2013 at 12:45 AM, Jane Darnell  wrote:

> I am happy to report that I just discovered what the problem is. I had
> turned off the "Show edit toolbar" option in my preferences (probably
> over a year ago), so I wasn't seeing the top part of the VE edit
> toolbar, which includes the hyperlink icon, among other things. I was
> only seeing the other, second, line of the VE toolbar icons for
> including media, reference, references list, and transclusion.

That's interesting, Jane; thanks for the report. I'm not able to
reproduce this - as far as I can tell, the preference is completely
ignored by VE. If you or someone can get a repro on the exact
circumstances under which this occurs, please drop me a note or
directly add it to Bugzilla.

Thanks,
Erik



Re: [Wikimedia-l] Feedback for the Wikimedia Foundation

2013-07-27 Thread phoebe ayers
Right -- and if not that specifically, I'd imagine most experienced users
do have various hacks, scripts, gadgets etc installed that we've
accumulated over the years. I know many people who have been editing for a
long time have a custom skin as well.

I don't know how any of these might or might not affect VE performance, but
one thing about the VE being enabled for IPs too (at least on a few
wikipedias) is you can always log out and see if the same problem persists
:)

-- phoebe


On Sat, Jul 27, 2013 at 12:45 AM, Jane Darnell  wrote:

> I have tried and failed to use the Visual Editor several times in the
> past few weeks, and as with all new technologies, I consider myself a
> "follower" rather than a "leader",  so I was very interested to look
> up the Dutch feedback that Romaine was reporting. One of the comments
> was that it was impossible to create a simple blue link with the VE,
> since the VE throws  around any attempt to do this. Since that
> is one of the most basic parts of wikimarkup that anyone will use, I
> decided to investigate, since that was my problem too.
>
> I am happy to report that I just discovered what the problem is. I had
> turned off the "Show edit toolbar" option in my preferences (probably
> over a year ago), so I wasn't seeing the top part of the VE edit
> toolbar, which includes the hyperlink icon, among other things. I was
> only seeing the other, second, line of the VE toolbar icons for
> including media, reference, references list, and transclusion.
>
> I expect that many other experienced Wikipedians have the same
> problem. This should help solve a lot of the "ghost edits".
>
> 2013/7/26, David Gerard :
> > On 26 July 2013 03:12, Everton Zanella Alvarenga
> >  wrote:
> >
> >> Maybe a new community (less conservative?) to build a good
> >> encyclopedia can come up if a new platform is invented?
> >
> >
> > Hence "power users" as a snarl word.
> >
> > After the uprising of the 17th of June
> > The Secretary of the Writers’ Union
> > Had leaflets distributed in the Stalinallee
> > Stating that the people
> > Had forfeited the confidence of the government
> > And could win it back only
> > By redoubled efforts. Would it not be easier
> > In that case for the government
> > To dissolve the people
> > And elect another?
> >
> >
> > - d.
> >



-- 
* I use this address for lists; send personal messages to phoebe.ayers 
gmail.com *


Re: [Wikimedia-l] About the concentration of resources in SF (it was: "Communication plans for community engagement"

2013-07-27 Thread aude
On Sat, Jul 27, 2013 at 1:57 AM, Erik Moeller  wrote:

> On Wed, Jul 24, 2013 at 2:39 PM, rupert THURNER
>  wrote:
>
> > If WMF is serious about letting development activities grow in other
> > countries this might be taken into account in FDCs allocation policy.
>
> For my part, I'm happy to offer feedback to the FDC on plans related
> to the development of engineering capacity in FDC-funded
> organizations. I'm sure Wikimedia Germany, too, would be happy to
> share its experiences growing the Wikidata development team. I'd love
> to find ways to bootstrap more engineering capacity across the
> movement, as so many of our shared challenges have a software
> engineering component. If any folks on-list want to touch base on
> these questions at Wikimania, drop me a note. :)
>
>
Chapters undertaking technology work is definitely a good thing!

I can say personally, unofficially (as a member of the Wikidata team), that I
am definitely happier working in Berlin (with a lower salary that goes
pretty far) versus SF. I am not convinced I could afford the same lifestyle in
SF on the salary offered by WMF.

Could one afford to live on one's own in a one-bedroom apartment in SoMa,
where the median rent is $3,475 [1] a month, on a WMF salary? Or would I need
to have a flatmate, or need to commute from farther away?

The rule of thumb is that one should not spend more than 30% of one's
income (after tax!) on rent, and ideally a smaller percentage than that.  That
requires a salary of roughly $11,500 per month after tax.
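
The arithmetic behind that figure, as a small sketch: the function name is just an illustrative helper, and the inputs are the median rent and the 30% ceiling quoted above.

```python
def required_net_income(monthly_rent, max_rent_share):
    """Monthly after-tax income needed so that rent stays within max_rent_share."""
    return monthly_rent / max_rent_share


needed = required_net_income(3475, 0.30)
print(f"${needed:,.0f} per month after tax")
```

Dividing $3,475 by 0.30 gives about $11,583, so the figure quoted above is that number rounded down.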

I can very easily live on my own in the best parts of Berlin, near the WMDE
office, or wherever I want. Just sayin'  :)

I understand that lots of people like to live in SF anyway, even with
whatever sacrifices they must make to afford it.  And it is good that WMF
offers the remote work option.

[1] http://priceonomics.com/the-san-francisco-rent-explosion/

Cheers,
Katie



> Erik
>
> --
> Erik Möller
> VP of Engineering and Product Development, Wikimedia Foundation
>



-- 
@wikimediadc / @wikidata


Re: [Wikimedia-l] article bytes more meaningful than users or revisions (was Re: Updates on VE data analysis)

2013-07-27 Thread James Salsman
Denny Vrandečić wrote:
>...
> Is the graph  based on actual data?

Yes, the precise sizes for the
dumps.wikimedia.org/enwiki/YYYYMMDD/enwiki-YYYYMMDD-pages-articles-multistream.xml.bz2
files are:

2012-07-02 9524994664
2012-08-02 9824345489
2012-09-02 9929910893
2012-10-01 10015876877
2012-11-01 10124555675
2012-12-01 10220499338
2013-01-02 10315766966
2013-02-04 10425240648
2013-03-04 10430830645
2013-04-03 10433658645
2013-05-03 10525475953
2013-06-04 10617572833
2013-07-08 10721955835

The byte count approximations from multiplying columns 'E' and 'I'
from http://stats.wikimedia.org/EN/TablesWikipediaEN.htm are at the
end of this message. Again, that data best fits two linear trends,
with a cusp around 2006.
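
For anyone who wants to check the linearity of the dump sizes listed above, here is a minimal sketch of an ordinary least-squares fit in plain Python. The sizes are copied from the list; they are compressed file sizes, so the slope describes growth of the compressed dump rather than of the article text itself. The function and variable names are illustrative.

```python
from datetime import date

# Compressed dump sizes in bytes, copied from the list above.
DUMP_SIZES = [
    (date(2012, 7, 2), 9524994664),
    (date(2012, 8, 2), 9824345489),
    (date(2012, 9, 2), 9929910893),
    (date(2012, 10, 1), 10015876877),
    (date(2012, 11, 1), 10124555675),
    (date(2012, 12, 1), 10220499338),
    (date(2013, 1, 2), 10315766966),
    (date(2013, 2, 4), 10425240648),
    (date(2013, 3, 4), 10430830645),
    (date(2013, 4, 3), 10433658645),
    (date(2013, 5, 3), 10525475953),
    (date(2013, 6, 4), 10617572833),
    (date(2013, 7, 8), 10721955835),
]


def linear_fit(points):
    """Least-squares line through (date, size); returns (bytes per day, intercept)."""
    start = points[0][0]
    xs = [(day - start).days for day, _size in points]
    ys = [size for _day, size in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = linear_fit(DUMP_SIZES)
print(f"fitted growth: {slope / 1e6:.2f} MB per day")
```

The same function could be run separately on the pre-2006 and post-2006 portions of the longer series to compare the two trends, provided the figures are first put on a consistent scale.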

> our content is increasing... but the number of active
> contributors is not.

I'm becoming increasingly convinced that as contributors become more
experienced, they choose to do most of their work logged out. What are
the advantages of using a registered account? Theoretically you can
prove that you made contributions, but as far as I know only one
person so far has ever obtained professional credit for their
contributions (there is a recent thread on wiki-research-l about
this.) What are the disadvantages of using a registered account to
edit? Anyone who opposes an edit politically is likely to examine the
entirety of the editor's contribution history and will all too often
stalk, punish by reverting old edits, or dispute the contributor's
work. Anonymous IP editors rarely face such time wasting scrutiny and
hassles. For anyone whose primary goal is to build an encyclopedia as
opposed to socializing, amassing administrative power, or obtaining a
job with the Foundation, the choice is obvious.  Those who wish their
contributions to be remembered for posterity are more likely to become
serial puppeteers than registered editors, unless they want to spend
most of their time being hassled in article space.

John Vandenberg wrote:
>...
> I would love to see stats about quality rather than quantity

It would be a mistake to rely on volunteer or Foundation assessments
of quality, because the likelihood that they would be biased is far too
great. We should rely only on third party assessments of article
quality, such as those in
http://en.wikipedia.org/wiki/Reliability_of_Wikipedia#Assessments
nearly all of which show continuous ongoing improvement.

Automatic measures of quality proposed so far have not really
impressed me, but I think http://arxiv.org/pdf/1206.2517.pdf has huge
potential and I am confident that the ideas it promotes will be easily
automated by bots after it is proven through peer review.

> Does anyone have stats for the number of blocked users per month

Yes, but it's almost meaningless because the vast majority of blocks
are for persistent vandalism, often at schools or libraries where we
really have no way to determine whether the editors involved ever
returned to do productive work.

---

Products of columns 'E' and 'I' from
http://stats.wikimedia.org/EN/TablesWikipediaEN.htm :

Jan-10 1133050
Dec-09 1126230
Nov-09 1120650
Oct-09 1078800
Sep-09 1072500
Aug-09 1065300
Jul-09 1026310
Jun-09 1021380
May-09 979160
Apr-09 971880
Mar-09 932850
Feb-09 930150
Jan-09 925020
Dec-08 885560
Nov-08 880620
Oct-08 841500
Sep-08 837500
Aug-08 831750
Jul-08 796080
Jun-08 794160
May-08 755780
Apr-08 749800
Mar-08 711260
Feb-08 706860
Jan-08 673890
Dec-07 669900
Nov-07 631800
Oct-07 625600
Sep-07 585960
Aug-07 582350
Jul-07 549900
Jun-07 518160
May-07 514080
Apr-07 479360
Mar-07 472480
Feb-07 466240
Jan-07 432000
Dec-06 425700
Nov-06 391720
Oct-06 387100
Sep-06 355160
Aug-06 351000
Jul-06 319560
Jun-06 289630
May-06 285670
Apr-06 255700
Mar-06 2476177000
Feb-06 2312907000
Jan-06 2170049000
Dec-05 201360
Nov-05 1869076000
Oct-05 174696
Sep-05 1627864000
Aug-05 1526784000
Jul-05 1407976000
Jun-05 1300334000
May-05 1209984000
Apr-05 1002925000
Mar-05 92463
Feb-05 87232
Jan-05 838272000
Dec-04 861724000
Nov-04 806195000
Oct-04 743904000
Sep-04 689924000
Aug-04 644502000
Jul-04 595665000
Jun-04 55290
May-04 511038000
Apr-04 47675
Mar-04 440286000
Feb-04 40301
Jan-04 375536000
Dec-03 350336000
Nov-03 329219000
Oct-03 310616000
Sep-03 294689000
Aug-03 27863
Jul-03 261555000
Jun-03 244454000
May-03 230328000
Apr-03 21720
Mar-03 20463
Feb-03 193475000
Jan-03 182936000
Dec-02 17101
Nov-02 16215
Oct-02 15048
Sep-02 80733000
Aug-02 6699
Jul-02 59755000
Jun-02 5542
May-02 49259000
Apr-02 4779
Mar-02 44968000
Feb-02 3935
Jan-02 30582000
Dec-01 26832000
Nov-01 21994000
Oct-01 17244000
Sep-01 10982000
Aug-01 710
Jul-01 4186000
Jun-01 324
May-01 2373600
Apr-01 1295800
Mar-01 596904
Feb-01 186636
Jan-01 33800


Re: [Wikimedia-l] About the concentration of resources in SF (it was: "Communication plans for community engagement"

2013-07-27 Thread Balázs Viczián
Well, both Hungary and Budapest aim to be the R&D center of the region.
There are multiple government and municipal funds and programmes, plus a lot
of favourable policies on both administrative levels, including a fully
dedicated neighbourhood on the bank of the Danube, named Infopark (since
1996 [1]).

Setting up a formally for-profit company in BP whose only client would be the
WMF (and/or other chapters) could be funded well over 50% from non-movement
funds (or low/no-interest loans) during the first few years, and
would be much, much cheaper than in any part of Western Europe and most of the
CEE. Doing so through WMHU or a separate non-profit is probably also
doable.

However, having such a department for the sake of having one is a total
waste of time, money and effort anywhere in the world, so the main
question is: are there enough projects to make establishing such a
department/separate entity reasonable?

Balázs

[1] http://www.infopark.hu/lang/en/




> On Wed, Jul 24, 2013 at 2:39 PM, rupert THURNER
>  wrote:
>
> > If WMF is serious about letting development activities grow in other
> > countries this might be taken into account in FDCs allocation policy.
>
> For my part, I'm happy to offer feedback to the FDC on plans related
> to the development of engineering capacity in FDC-funded
> organizations. I'm sure Wikimedia Germany, too, would be happy to
> share its experiences growing the Wikidata development team. I'd love
> to find ways to bootstrap more engineering capacity across the
> movement, as so many of our shared challenges have a software
> engineering component. If any folks on-list want to touch base on
> these questions at Wikimania, drop me a note. :)
>
> Erik
>
> --
> Erik Möller
> VP of Engineering and Product Development, Wikimedia Foundation
>


Re: [Wikimedia-l] [Wikitech-l] Collaborative machine translation for Wikipedia -- proposed strategy

2013-07-27 Thread David Cuenca
On Fri, Jul 26, 2013 at 11:30 PM, C. Scott Ananian
wrote:

> This statement seems rather defeatist to me.  Step one of a machine
> translation effort should be to provide tools to annotate parallel texts in
> the various wikis, and to edit and maintain their parallelism.


Scott, "edit and maintain" parallelism sounds wonderful on paper, until you
want to implement it and then you realize that you have to freeze changes
both in the source text and in the target language for it to happen, which
is, IMHO, against the very nature of wikis.
The Translate extension already does that in a way. I see it as useful only for
texts acting as a central hub for translations, like official
communication. If that were to happen for all kinds of content, you would
have to sacrifice the plurality of letting each wiki do its own
version.


> Once this
> is done, you have a substantial parallel corpora, which is then suitable to
> grow the set of translated articles.  That is, minority languages ought to
> be accounted for by progressively expanding the number of translated
> articles in their encyclopedia, as we do now.  As this is done, machine
> translation incrementally improves.


The most popular statistical machine translation system built
its engine using texts extracted from *the whole internet*. It requires
huge processing power, and that is without mentioning the amount of resources
that went into research and development. And even with all those resources,
they managed to create a system that only sort of works.
Wikipedia has neither enough text nor enough resources to follow that
route, and the target number of languages is even higher.
Of course statistical approaches should be used as well (point 8
of the proposed workflow), but more as a supporting technology than as
the main one.


> If there is not enough of an editor
> community to translate articles, I don't see how you will succeed in the
> much more technically-demanding tasks of creating rules for a rule-based
> translation system.  The beauty of the statistical approach is that little
> special ability is needed.


One single researcher can create working transfer rules for a language pair
in 3 months or less if there is previous work (see these GSoC projects [1],
[2], [3]). Whichever problem the translation has, it can be understood and
corrected. With statistics, you rely on bulk numbers and on the hope that
you have enough coverage, and that makes fixing its defects even harder.
It is true that writing transfer rules is technically demanding, but so is
writing MediaWiki software, which keeps being developed anyway. After
seeing how their system works, I think there is room for simplifying
transfer rules (first storing them as mediawiki templates, then as linked
data, then having a user interface). That could lower the entry barrier for
linguists and translators alike, while enabling the triangulation of rules
between pairs that have a common one.

As said before, there is no single tool that can do everything; it is the
combination of them that will bring the best results. The good thing is
that there is no need to "marry" a technology: several can be developed in
parallel and brought to a point of convergence where they work together for
optimal results.

I appreciate that you took time to read the proposal :)

Thanks,
David

[1]
http://www.google-melange.com/gsoc/project/google/gsoc2013/akindalki/3001
[2]
http://www.google-melange.com/gsoc/project/google/gsoc2013/jcentelles/20001
[3]
http://www.google-melange.com/gsoc/project/google/gsoc2013/jonasfromseier/5001
___
Wikimedia-l mailing list
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l


Re: [Wikimedia-l] article bytes more meaningful than users or revisions (was Re: Updates on VE data analysis)

2013-07-27 Thread John Vandenberg
On Sat, Jul 27, 2013 at 6:29 PM, Denny Vrandečić
 wrote:
> Thank you for the observation.
>
> Is the graph  based on actual data? Because
> it looks just a tad bit too linear to me. (I do not disagree with the
> finding, just wondering about the graph itself).
>
> I still would worry, though: our content is increasing linearly, as you
> say, but the number of active contributors is not. If we take for granted
> that active contributors are the ones who provide quality control for the
> articles, this means that since 2006 or so the ratio of content per
> contributor is linearly declining, which would mean that our quality would
> suffer.

There are a few parts of this that I don't think can be taken for
granted, and I would love to see stats about quality rather than
quantity, as you're talking about quality, and that should be a
significant component of our analysis.

1) 'active contributors are the ones who provide quality control'

   Bots do a lot of what used to be done by humans back in 2007,
rolling back most silly edits.
   And it is a small subset of active contributors who do the majority
of the maintenance.

2) the number of active contributors _doing quality control_ has declined.

   We know the number of overall editors is declining, and I think you
are right that the number doing quality control is declining, but is there
evidence to support it?  And does it show that this decline is a
problem?

My gut feeling is that the decline in 'quality control' edits is
tightly linked to the increase in bots doing quality control.

I.e., do we have research showing that the total article-to-editor ratio
has a bearing on the average quality of content?
A proxy could be the average number of references per article?

It seems unlikely, as our content over the last five years has
increased in quality, and our number of editors has declined.
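
The references-per-article proxy mentioned above could be computed roughly
like this. It is only a sketch: it assumes the wikitext of each article is
already available as a string, and the sample articles are invented. It
counts footnote markers, so a source that is reused three times counts
three times.

```python
import re

# Matches opening <ref> tags, including named ones (<ref name="x">) and
# self-closing reuses (<ref name="x" />). It does not match </ref> or
# <references />.
REF_TAG = re.compile(r"<ref(?:\s[^>]*|/)?>", re.IGNORECASE)


def average_refs_per_article(articles):
    """Return the mean number of <ref> markers over a dict of title -> wikitext."""
    if not articles:
        return 0.0
    total = sum(len(REF_TAG.findall(text)) for text in articles.values())
    return total / len(articles)


# Invented sample data, for illustration only.
sample = {
    "Article A": (
        'Fact one.<ref>Source 1</ref> Fact two.<ref name="s2">Source 2</ref> '
        'Fact three.<ref name="s2" />\n<references />'
    ),
    "Article B": "No citations here.\n<references />",
}

print(average_refs_per_article(sample))  # prints 1.5
```

Running this over dump snapshots from different years would give a trend
to compare against the editor counts, though reference counts are of
course only a rough stand-in for quality.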

> I see two effects to counter that:
>
> 1) as you already mentioned, contributors are getting increasingly more
> experienced and more effective in fulfilling their tasks.
>
> 2) we continue to have a strong increase in readers and even stronger in
> pageviews (i.e. more and more people consult Wikipedia more and more). They
> probably also provide a layer of quality assurance, even though they might
> not qualify to be counted as active contributors.
>
> I have the gut feeling that 1) cannot be sufficient, and I would be curious
> about the effects of 2) - especially considering that much of the Foundation
> development work can be seen as improving 2) further (visual editor,
> article rating, mobile editing, etc.)

I agree with James that (1) is significant, and (2 - 'the future')
brings many unknowns with it.

(1) consists of our entire potential editor base, which includes
all our currently active editors, and all of our inactive editors who
are able to resume editing at any time - i.e. not blocked, not ^&%ed
off, etc.  They all know the syntax, and have demonstrated their
commitment to the vision, _and_ the writers have a personal connection
to the articles that they worked on.  I see lots of them come back
occasionally to touch up or expand their work.

(2) brings different editors, for good or ill.  There are some
concerns in the community that simplifying editing will bring more
non-trivial vandalism that bots can't handle, and even more
well-meaning editors who are discouraged when they can't understand why
their edit has disappeared, because they don't read the history, the
talk pages, etc.  The ratio of experienced editors to newbies could
be a significant factor in the maintenance of a friendly environment.

More is not always better.

Don't get me wrong; a good VE will be very helpful, and the projects'
defensive mechanisms will adapt.  But I predict that if we see lots of
poor quality articles from VE, without adequate references, and the
community backlogs become problematic, the community will want to develop
tools to limit new poor quality articles.

Does anyone have stats for the number of blocked users per month over
the years (as that is hurting our potential editor base), and for the
number of reverts of edits by new users?

--
John Vandenberg



Re: [Wikimedia-l] article bytes more meaningful than users or revisions (was Re: Updates on VE data analysis)

2013-07-27 Thread Denny Vrandečić
Thank you for the observation.

Is the graph  based on actual data? Because
it looks just a tad bit too linear to me. (I do not disagree with the
finding, just wondering about the graph itself).

I still would worry, though: our content is increasing linearly, as you
say, but the number of active contributors is not. If we take for granted
that active contributors are the ones who provide quality control for the
articles, this means that since 2006 or so the ratio of content per
contributor is linearly declining, which would mean that our quality would
suffer.

I see two effects to counter that:

1) as you already mentioned, contributors are getting increasingly more
experienced and more effective in fulfilling their tasks.

2) we continue to have a strong increase in readers and even stronger in
pageviews (i.e. more and more people consult Wikipedia more and more). They
probably also provide a layer of quality assurance, even though they might
not qualify to be counted as active contributors.

I have the gut feeling that 1) cannot be sufficient, and I would be curious
about the effects of 2) - especially considering that much of the Foundation
development work can be seen as improving 2) further (visual editor,
article rating, mobile editing, etc.)





2013/7/27 James Salsman 

> MZMcBride wrote:
> >... the number of non-deleted revisions per day for the
> > English Wikipedia. The results are here:
> > https://en.wikipedia.org/wiki/Special:Permalink/565971356
>
> So, that looks terrible: http://i.imgur.com/Z9lYCWj.png
>
> It looks terrible in the same way as every other graph of active
> users and several other related measures.
>
> But it isn't. It doesn't account for the power law of practice which
> causes everyone who has ever edited Wikipedia to get better at it with
> time. And since so many IP editors are obviously returning, that means
> a lot more than under the false but very common assumption that every
> IP editor is new.
>
> Here's what really matters, articlespace size:
> http://i.imgur.com/TfaD99V.png
>
> The size of the article text in bytes has been marching on linearly
> since the beginning of Wikipedia, with extremely low variation, just
> like the short popular vital articles and every other measure of
> quality content.
>
> There is no legitimate basis to worry about anything until the total
> article bytes break out of their 12-year linear trend.
>
> (If you multiply columns 'E' and 'I' from
> http://stats.wikimedia.org/EN/TablesWikipediaEN.htm the database size
> shows a cusp at around 2006, corresponding to the growth modes, but
> two separate linear trends fit both modes far better than any growth
> model fits the entire curve.)




-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Berlin-Charlottenburg
district court under number 23855 B. Recognized as a non-profit by the tax
office for corporations I Berlin, tax number 27/681/51985.


Re: [Wikimedia-l] Feedback for the Wikimedia Foundation

2013-07-27 Thread Jane Darnell
I have tried and failed to use the Visual Editor several times in the
past few weeks, and as with all new technologies, I consider myself a
"follower" rather than a "leader",  so I was very interested to look
up the Dutch feedback that Romaine was reporting. One of the comments
was that it was impossible to create a simple blue link with the VE,
since the VE throws <nowiki> tags around any attempt to do this. Since that
is one of the most basic parts of wikimarkup that anyone will use, I
decided to investigate, since that was my problem too.

I am happy to report that I just discovered what the problem is. I had
turned off the "Show edit toolbar" option in my preferences (probably
over a year ago), so I wasn't seeing the top part of the VE edit
toolbar, which includes the hyperlink icon, among other things. I was
only seeing the other, second, line of the VE toolbar icons for
including media, reference, references list, and transclusion.

I expect that many other experienced Wikipedians have the same
problem. This should help solve a lot of the "ghost edits".

2013/7/26, David Gerard :
> On 26 July 2013 03:12, Everton Zanella Alvarenga
>  wrote:
>
>> Maybe a new community (less conservative?) to build a good
>> encyclopedia can come up if a new platform is invented?
>
>
> Hence "power users" as a snarl word.
>
> After the uprising of the 17th of June
> The Secretary of the Writers’ Union
> Had leaflets distributed in the Stalinallee
> Stating that the people
> Had forfeited the confidence of the government
> And could win it back only
> By redoubled efforts. Would it not be easier
> In that case for the government
> To dissolve the people
> And elect another?
>
>
> - d.
