[Wikitech-l] forking media files

2011-08-15 Thread Peter Gervai
Let me retitle one of the topics nobody seems to touch.

On Fri, Aug 12, 2011 at 13:44, Brion Vibber br...@pobox.com wrote:

 * media files -- these are freely copyable but I'm not sure of the state
 of easily obtaining them in bulk. As the data set moved into the terabytes
 it became impractical to just build .tar dumps. There are batch downloader
 tools available, and the metadata's all in the dumps and API.

Right now it is basically locked down: there is no way to bulk-copy the
media files, not even to make a simple backup of one Wikipedia or of
Commons. I've tried, I've asked, and the answer was basically to contact
a dev and arrange it. That obviously could be done (I know many of the
folks), but that isn't the point.

Some explanations were offered, mostly that the media and their metadata
are quite detached, and thus it is hard to enforce licensing quirks like
attribution, special licenses and such. I grant this is a relevant point,
since the text corpus is uniformly licensed under CC/GFDL while the media
files are non-homogeneous at best (Commons, where everything is free in
one way or another) and complete chaos at worst (the individual
Wikipedias, where there may be anything from leftover fair use to material
copyrighted by various entities to images about to be deleted).

Still, I do not believe that making it close to impossible to bulk-copy
the data is a good approach. I am not sure which technical means would be
best, as there are many competing options.

We could, for example, open up an API which serves each media file
together with its metadata, possibly supporting mass operations. Still,
that would be pretty inefficient.
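
To make that concrete, here's a minimal sketch of such a lookup built on
the existing imageinfo API module (action=query, prop=imageinfo and the
iiprop values are real API parameters; the file_info helper, the batching
and the output format are my own assumptions):

# Sketch: fetch media file URLs together with their metadata in one
# API round trip, using MediaWiki's existing imageinfo module.
import requests

API = "https://commons.wikimedia.org/w/api.php"

def file_info(titles):
    """Yield (title, file URL, description-page URL) for File: titles."""
    r = requests.get(API, params={
        "action": "query",
        "titles": "|".join(titles),   # up to 50 titles per request
        "prop": "imageinfo",
        "iiprop": "url|size|sha1|mime",
        "format": "json",
    })
    r.raise_for_status()
    for page in r.json()["query"]["pages"].values():
        for ii in page.get("imageinfo", []):
            # descriptionurl points at the file description page,
            # which is where the licensing information lives.
            yield page["title"], ii["url"], ii["descriptionurl"]

for title, url, desc in file_info(["File:Example.jpg"]):
    print(title, url, desc)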

Or we could support zsync, rsync and the like (and I again recommend
examining zsync's several interesting abilities to offload work to the
client), but there ought to be some pointer to the image metadata as
well: at least a one-liner manifest entry for every image, linking to
its license page.
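
To make the one-liner idea concrete, a hedged sketch of such a manifest,
reusing the file_info helper sketched above (the tab-separated layout is
purely my own assumption):

# Sketch: one line per image, pairing the raw file URL with its
# description (license) page, so a zsync/rsync mirror can still be
# audited for attribution. The TSV layout is an assumption.
def write_manifest(titles, out_path="media-manifest.tsv"):
    with open(out_path, "w", encoding="utf-8") as out:
        for title, url, desc in file_info(titles):
            out.write(f"{title}\t{url}\t{desc}\n")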

Or we could tie bulk access to established editor accounts, so we would
have at least a bit of assurance that the person knows what s/he's doing.

-- 
 byte-byte,
    grin



Re: [Wikitech-l] forking media files

2011-08-15 Thread Peter Gervai
On Mon, Aug 15, 2011 at 18:40, Russell N. Nelson - rnnelson rnnel...@clarkson.edu wrote:
 The problem is that 1) the files are bulky,

That's expected. :-)

 2) there are many of them, 3) they are in constant flux,

That is not really a problem: precisely because there are so many of
them, statistically most of them are not in flux at any given moment.

 and 4) it's likely that your connection would close for whatever reason
 part-way through the download.

I believe I did not forget to mention zsync/rsync. ;-)

 Even taking a snapshot of the filenames is dicey. By the time you finish, 
 it's likely that there will be new ones, and possible that some will be 
 deleted. Probably the best way to make this work is to 1) make a snapshot of 
 files periodically,

Since I've been told they're backed up, such a snapshot should naturally already exist.

 2) create an API which returns a tarball using the snapshot of files that 
 also implements Range requests.

I would very much prefer a ready-to-use format to a tarball, not to
mention that it's pretty resource-consuming to create a tarball just for
this purpose.

 Of course, this would result in a 12-terabyte file on the recipient's host. 
 That wouldn't work very well. I'm pretty sure that the recipient would need 
 an http client which would 1) keep track of the place in the bytestream and 
 2) split out files and write them to disk as separate files. It's possible 
 that a program like getbot already implements this.

I'd make the snapshot without tar, especially because partial transfers
aren't possible with a tarball.
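
For what it's worth, a minimal sketch of the client Russell describes,
resuming with HTTP Range requests and then splitting the archive into
separate files (the snapshot URL is hypothetical, and note that this
still stages the whole tarball on disk first):

# Sketch: resume an interrupted snapshot download via HTTP Range
# requests, then split the tarball into separate files on disk.
import os
import tarfile
import requests

SNAPSHOT = "https://example.org/media-snapshot.tar"   # hypothetical

def resume_download(url, path):
    """Re-issue Range requests until the whole file is on disk."""
    while True:
        have = os.path.getsize(path) if os.path.exists(path) else 0
        headers = {"Range": f"bytes={have}-"} if have else {}
        try:
            with requests.get(url, headers=headers, stream=True,
                              timeout=60) as r:
                if r.status_code == 416:       # range past EOF: done
                    return
                if have and r.status_code == 200:
                    # server ignored Range: start over from scratch
                    open(path, "wb").close()
                    continue
                r.raise_for_status()
                with open(path, "ab") as out:
                    for chunk in r.iter_content(chunk_size=1 << 20):
                        out.write(chunk)
                return                         # stream ended normally
        except requests.RequestException:
            continue                           # dropped: retry from offset

resume_download(SNAPSHOT, "snapshot.tar")
with tarfile.open("snapshot.tar") as tar:
    tar.extractall("media/")                   # write out separate files

A true streaming splitter (tarfile's "r|" mode over the response body)
would avoid the local copy, but it cannot resume mid-stream, which is why
tar and partial transfers fit together so poorly.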

-- 
 byte-byte,
    grin



Re: [Wikitech-l] XKCD: Extended Mind

2011-05-26 Thread Peter Gervai
On Thu, May 26, 2011 at 17:38, Leo Koppelkamm diebu...@gmail.com wrote:
 http://ryanelmquist.com/cgi-bin/xkcdwiki

A nice way to see that first sentences eventually lead to some general
quantity or property, which links to [[property (philosophy)]], which
links to Philosophy itself. So far I haven't seen a chain that didn't go
through 'property'.

g



Re: [Wikitech-l] XKCD: Extended Mind

2011-05-25 Thread Peter Gervai
On Wed, May 25, 2011 at 09:24, Tim Starling tstarl...@wikimedia.org wrote:
 On 25/05/11 17:05, Domas Mituzas wrote:
 On May 25, 2011, at 9:35 AM, K. Peachey wrote:

 http://xkcd.org/903/ -Peachey

 that error is fake! 10.0.0.242 is the internal services DNS server and
 is not used to serve en.wikipedia.org; the dberror log does not have a
 single instance of it! 10.0.6.42, on the other hand...

 I would have thought the fact that it was hand drawn would have given
 it away.

But in this particular case, hand-drawn doesn't mean the facts can slip:
these drawings are usually extremely precise. (You can see which
pulldowns he usually keeps open. :-))

I second Domas on checking, because there may be a super-secret
conspiracy and the drawing may be correct after all. ;-)
-- 
 byte-byte,
    grin



Re: [Wikitech-l] XKCD: Extended Mind

2011-05-25 Thread Peter Gervai
On Wed, May 25, 2011 at 17:16, Domas Mituzas midom.li...@gmail.com wrote:

Thanks for clearing that up. Nice work.

g



Re: [Wikitech-l] What is wrong with Wikia's WYSIWYG?

2011-05-02 Thread Peter Gervai
On Mon, May 2, 2011 at 10:02, Tim Starling tstarl...@wikimedia.org wrote:

 I don't think using wikitext is the best way to make things easier for
 new users.

It has always been a dilemma for me how many computer-illiterate users we
should wish for (or can actually tolerate). I don't feel that web
number-dot-number (whatever version we call it now) is fast, reliable or
useful enough to serve as the main way to input encyclopedia text. These
editors are usually very slow and quite unreliable (including the Google
Docs stuff, which I believe is the most advanced technology out there in
this area).

And... people habitually get completely lost in DTP software (be it
[open/whatever]office or anything else); they can't comprehend
formatting, fonts, text annotation and other advanced features. I don't
see how WYSIWYG would make them any more able to use those techniques.
Some people have actually learned enough markup to completely screw up
Wikibooks (putting flashing 80pt fonts into scrolling frames, using all
kinds of CSS features that are not horrible on purpose), and I fear what
they could do with WYSIWYG. Such texts are sometimes easier to reformat
completely (reset ALL formatting to the default and start over).

The Foundation's purpose is to make editing easier for everyone and to
invite and involve everyone, I know. I just have my doubts and worries,
which I have now shared.

Peter



Re: [Wikitech-l] Version control

2010-02-07 Thread Peter Gervai
On Sun, Feb 7, 2010 at 00:38, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote:

 It's interesting that the #1 con against Git in that document is Lots
 of annoying Git/Linux fanboys.

No, it's the 'screaming hell yeah! while having no idea what they're
talking about' part. :-)

g



Re: [Wikitech-l] Google phases out support for IE6

2010-02-02 Thread Peter Gervai
On Tue, Feb 2, 2010 at 00:44, Gregory Maxwell gmaxw...@gmail.com wrote:

 People are really bad at complaining, especially web users. We've had
 prolonged, obvious glitches which must have affected hundreds of
 thousands of people, and maybe we get a couple of reports.

For the average Joe and Jane it usually isn't obvious what to do when
something's broken. I've watched people use really broken websites
(fallen-apart layout, broken menus) and never report it, only complain to
their colleagues. I second that people are bad at reporting problems, and
I must add that computer people are usually bad at receiving the
complaints and fixing them. ;-) I guess if you have a problem and you
know someone who can do something about it, it'll get fixed; otherwise it
_may_ get fixed, one day or another. [I've experienced this latter
problem with email config bugs and [not] having them fixed.]

Nevertheless, I wouldn't miss any IE features, but then again I'm an
anti-M$ fascist by genetics. ;-)

g



Re: [Wikitech-l] Google phases out support for IE6

2010-01-31 Thread Peter Gervai
What about creating a monobook-oldies theme for them? I mean, move the
current stuff to the oldies theme, and drop support for the elder
browsers from monobook.

-- 
 byte-byte,
grin



Re: [Wikitech-l] Wikitext vs. WYSIWYG (was: Proposal for editing template calls within pages)

2009-09-24 Thread Peter Gervai
On Thu, Sep 24, 2009 at 14:36, David Gerard dger...@gmail.com wrote:

 However, impenetrable wikitext is one of *the* greatest barriers to
 new users on Wikimedia projects. And this impenetrability is not, in
 any way whatsoever or by any twists of logic, a feature.

Adding a GUI layer on top of wikitext is always okay, as long as it's
possible to get rid of it: the majority of edits don't come from new
users, and trading away power users' flexibility to gain more newbies
doesn't sound like a good deal to me.

All of the GUIs I've seen so far were slow and hard to use, and produced
unwanted (side) effects whenever anything even slightly complex was
entered. And this isn't a problem specific to Wikipedia: Google Docs,
which I guess is one of the most advanced web-based GUI systems, has
plenty of usability problems that can only be fixed by messing with the
source. And many core people want to mess with the source.

So, adding a newbie layer is okay as long as you don't mess up the
work of the non-newbies.

g



Re: [Wikitech-l] [WikiEN-l] MediaWiki is getting a new programming language

2009-07-17 Thread Peter Gervai
On Fri, Jul 17, 2009 at 12:06, Gerard Meijssen gerard.meijs...@gmail.com wrote:

 Well, this strength is not that great when people like myself, who have
 commit rights on SVN, do not want to touch templates with a barge pole
 if they can help it. Wikipedia is supposed to be this thing everybody can edit.

I think you misprioritise the whole thing.

Consider it a feature, not base functionality. Most installations don't
use even 10% of the features available, for lack of knowledge, time,
bravery or whatever else. Writing templates with code is a rare art by my
observation; most of the larger MediaWiki installations have never used
it in the first place.
You would like to remove TeX (math) input because the language is complex?

And since it'd be an extension I guess, it would imply that you want
to forbid(?) creating complex extensions? That's unrealistic. Geeks
need this functionality, so ungeeks may or may not care about that, it
doesn't really matter. If it's easier to understand than not: that's a
plus.

By the way, I wouldn't touch PHP with a ten-foot pole and rubber gloves,
but I'm fine with the current template syntax. We're individuals with our
own preferences. ;-)

grin



Re: [Wikitech-l] [WikiEN-l] MediaWiki is getting a new programming language

2009-07-08 Thread Peter Gervai
On Wed, Jul 8, 2009 at 10:16, Gerard Meijssen gerard.meijs...@gmail.com wrote:
 The argument that a language should be readable and easy to learn is REALLY
 relevant and powerful. A language that is only good for geeks is detrimental
 to the use of MediaWiki. Our current templates and template syntax are
 horrible. Wikipedia is as a consequence hardly editable by everyone.

Mortals _use_ the templates, not _create_ them. Geeks create templates
for mortals.

The current syntax is indeed horrible, but complete readability is not
the main issue, I'd say. Security, speed and flexibility should be, along
with ease of implementation.

Peter



Re: [Wikitech-l] wikimedia, wikipedia and ipv6

2009-06-12 Thread Peter Gervai
On Fri, Jun 12, 2009 at 13:55, Aryeh Gregor simetrical+wikil...@gmail.com wrote:
 This might be useful, although most of the info is probably outdated:
 http://wikitech.wikimedia.org/view/Special:Search?search=ipv6&go=Go

Yep, including the dead labs link.

But it did mention LVS [IPVS]; I don't know whether we use it or not, but
it supports IPv6 too. ;-)

g



[Wikitech-l] Unbreaking statistics

2009-06-05 Thread Peter Gervai
Hello,

I see I've created quite a stir, but so far nothing really useful has
popped up. :-(

But I did see this one from Neil:
 Yes, modifying the http://stats.grok.se/ systems looks like the way to go.

To me it doesn't really seem to be, since it uses an extremely dumbed-down
version of the input, one which only contains page views and [unreliable]
byte counters. Most probably it would require large rewrites, plus a
magical new data source.

 What do people actually want to see from the traffic data? Do they want
 referrers, anonymized user trails, or what?

Are you old enough to remember stats.wikipedia.org? As far as I remember,
it originally ran Webalizer, then something else, then nothing. If you
look at a Webalizer report you'll see what's in it. We are using (or were
using, until our nice fellow editors broke it) AWStats, which provides
basically the same thing with more caching.

The most used and useful stats are page views (daily and hourly
breakdowns are pretty useful too), referrers, visitor domain and provider
stats, OS and browser stats, screen resolution stats, bot activity stats,
and visitor duration and depth, among probably others.

At a brief glance I could replicate the grok.se stats easily, since they
seem to be built from http://dammit.lt/wikistats/, but that data is
completely useless for anything beyond page hit counts.
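
For reference, a minimal sketch of that replication, assuming the hourly
dump files use the usual space-separated "project page-title view-count
bytes" layout (the file name below is hypothetical, and the file is
assumed to be decompressed already):

# Sketch: tally per-page view counts from one hourly wikistats dump,
# assuming space-separated "project title count bytes" lines. This
# reproduces the page hit numbers and nothing more, which is exactly
# the limitation complained about above.
from collections import Counter

def page_hits(path, project="en"):
    hits = Counter()
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) != 4:
                continue                  # skip malformed lines
            proj, title, count, _bytes = parts
            if proj == project:
                hits[title] += int(count)
    return hits

top10 = page_hits("pagecounts-20090605-120000").most_common(10)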

Is there any possibility of writing code that processes the raw Squid
data? Whom do I have to bribe? :-/

-- 
 byte-byte,
grin
