Re: [Wikitech-l] 100% open source stack (was Re: Bugzilla Vs other trackers.)
What was wrong with LVM snapshots? Performance?

In ZFS every write is copy-on-write, so snapshots have 'zero' cost, and multiple snapshots can share the same data. In LVM every snapshot is standalone and carries all the information it needs. Also, LVM doesn't have snapshot-based replication, and DRBD+ wasn't open source/free at that time either ;-)

The reasons for OpenSolaris vs. Solaris are different, though.

Domas
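To make the difference concrete, here is a rough sketch of the two workflows (the pool, volume group, and host names are invented):

    # ZFS: a snapshot is an instantaneous reference into the copy-on-write
    # tree, and replication ships only the blocks changed between snapshots
    zfs snapshot tank/db@monday
    zfs snapshot tank/db@tuesday
    zfs send -i tank/db@monday tank/db@tuesday | ssh backuphost zfs receive tank/db

    # LVM (as of 2010): each snapshot needs its own pre-allocated
    # copy-on-write area, and there is no built-in incremental send/receive
    lvcreate --snapshot --name db-monday --size 10G /dev/vg0/db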
Re: [Wikitech-l] downloading wikipedia database dumps
Hello,

> Is the bandwidth used really a big problem? Bandwidth is pretty cheap these days, and given Wikipedia's total draw, I suspect the occasional dump download isn't much of a problem.

I am not sure about the cost of the bandwidth, but the Wikipedia image dumps are no longer available on the dump site anyway. I am guessing they were removed partly because of the bandwidth cost, or perhaps because of image licensing issues. From http://en.wikipedia.org/wiki/Wikipedia_database#Images_and_uploaded_files :

> Currently Wikipedia does not allow or provide facilities to download all images. As of 17 May 2007, Wikipedia disabled or neglected all viable bulk downloads of images, including torrent trackers. Therefore, there is no way to obtain image dumps other than scraping Wikipedia pages or using Wikix, which converts a database dump into a series of scripts to fetch the images. Unlike most article text, images are not necessarily licensed under the GFDL or CC-BY-SA-3.0. They may be under one of many free licenses, in the public domain, believed to be fair use, or even copyright infringements (which should be deleted). In particular, use of fair use images outside the context of Wikipedia or similar works may be illegal. Images under most licenses require a credit, and possibly other attached copyright information. This information is included in image description pages, which are part of the text dumps available from download.wikimedia.org. In conclusion, download these images at your own risk.

> Bittorrent's real strength is when a lot of people want to download the same thing at once. E.g., when a new Ubuntu release comes out. Since Bittorrent requires all downloaders to be uploaders, it turns the flood of users into a benefit. But unless somebody has stats otherwise, I'd guess that isn't the problem here.

BitTorrent is simply a more efficient method to distribute files, especially if the much larger Wikipedia image files were made available again. The last dump of English Wikipedia including images is over 200 GB, but is understandably not available for download. Even if only 10 people per month downloaded these large files, BitTorrent should be able to reduce the bandwidth cost to Wikipedia significantly. Also, I think setting up BitTorrent for this would cost Wikipedia little, may save money in the long run, and would encourage people to experiment with offline encyclopedia usage, etc.

Making people crawl Wikipedia with Wikix when they want the images is a bad solution, as it means the images are downloaded inefficiently. Also, one Wikix user reported that his download connection was cut off by a Wikipedia admin for remote downloading.

Unless there are legal reasons for not allowing images to be downloaded, I think the Wikipedia image files should be made available for efficient download again. Since Wikix can theoretically be used to download the images anyway, I think it would also be legal to allow the image dump to be downloaded as well. Thoughts?

cheers,
Jamie
Re: [Wikitech-l] Bugzilla Vs other trackers.
Roan Kattouw wrote:
> 2010/1/7 Trevor Parscal tpars...@wikimedia.org:
>> Hmmm... Not being able to distinguish the difference between a bug tracker and a wiki based on the skins being similar is a point of view I have a hard time understanding.
>
> Having read quite a few bug reports written in wikitext (which mostly doesn't work in Bugzilla, except for [[links]]), I would encourage a clearer distinction between the wikis and the bug tracker. I don't want to give people the impression that what they're reporting bugs on is really a quirky wiki variant: the bug tracker not only uses different syntax, but also has different policies, procedures and protocols.

It occurs to me that one option would be going the other way: the CodeReview extension already seems to have about 50% of the features a basic but functional bug tracker would need, including a couple of nice ones that our Bugzilla currently lacks (like, you know, comment preview, the ability to use wiki markup and, well, code review).

Yes, turning it into a full-featured issue tracker and project management tool would take some substantial work, but then, switching to a new project management tool and customizing it to fit our needs isn't quite a 15-minute job either.

Just something to consider... :-)

-- Ilmari Karonen
Re: [Wikitech-l] downloading wikipedia database dumps
On Fri, Jan 8, 2010 at 4:31 PM, Jamie Morken jmor...@shaw.ca wrote:
> Bittorrent is simply a more efficient method to distribute files, especially if the much larger wikipedia image files were made available again. The last dump from english wikipedia including images is over 200GB but is understandably not available for download. Even if there are only 10 people per month who download these large files, bittorrent should be able to reduce the bandwidth cost to wikipedia significantly. Also I think that having bittorrent setup for this would cost wikipedia a small amount, and may save money in the long run, as well as encourage people to experiment with offline encyclopedia usage etc. To make people have to crawl wikipedia with Wikix if they want to download the images is a bad solution, as it means that the images are downloaded inefficiently. Also one wikix user reported that his download connection was cutoff by a wikipedia admin for remote downloading.

The problem with BitTorrent is that it is unsuitable for rapidly changing data sets, such as images. If you want to add a single file to the torrent, the entire torrent hash changes, meaning that you end up with separate peer pools for every different data set, even though they mostly contain the same files.

That said, it could of course be beneficial for an initial dump download, and it is better than the current situation, where nothing is available at all.

Bryan
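To see why, recall that a torrent's identity (its infohash) is the SHA-1 of the bencoded "info" dictionary, so adding a single file yields a brand-new infohash and a disjoint swarm. A minimal sketch in Node.js, with a simplified bencoder and fake piece data:

    var crypto = require('crypto');

    function bencode(v) {
        // just enough bencoding for this demo: ints, strings, lists, dicts
        if (typeof v === 'number') return 'i' + v + 'e';
        if (typeof v === 'string') return Buffer.byteLength(v) + ':' + v;
        if (Array.isArray(v)) return 'l' + v.map(bencode).join('') + 'e';
        return 'd' + Object.keys(v).sort().map(function (k) {
            return bencode(k) + bencode(v[k]);
        }).join('') + 'e';
    }

    function infohash(info) {
        return crypto.createHash('sha1').update(bencode(info)).digest('hex');
    }

    var info = { name: 'image-dump', 'piece length': 262144, pieces: 'xxxx',
                 files: [ { length: 100, path: [ 'a.jpg' ] } ] };
    console.log(infohash(info));
    info.files.push({ length: 200, path: [ 'b.jpg' ] });
    console.log(infohash(info));  // a completely different swarm identity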
Re: [Wikitech-l] downloading wikipedia database dumps
On Fri, Jan 8, 2010 at 10:31 AM, Jamie Morken jmor...@shaw.ca wrote:
> I am not sure about the cost of the bandwidth, but the wikipedia image dumps are no longer available on the wikipedia dump anyway. I am guessing they were removed partly because of the bandwidth cost, or else image licensing issues perhaps.

I think we just don't have infrastructure set up to dump images. I'm very sure bandwidth is not an issue -- the number of people with a terabyte (or is it more?) handy that they want to download a Wikipedia image dump to will be vanishingly small compared to normal users. Licensing wouldn't be an issue for Commons, at least, as long as it's easy to link the images up to their license pages. (I imagine it would technically violate some licenses, but probably no one would worry about it.)

> Bittorrent is simply a more efficient method to distribute files, especially if the much larger wikipedia image files were made available again. The last dump from english wikipedia including images is over 200GB but is understandably not available for download. Even if there are only 10 people per month who download these large files, bittorrent should be able to reduce the bandwidth cost to wikipedia significantly.

Wikipedia uses an average of multiple gigabits per second of bandwidth, as I recall. One gigabit per second adds up to about 10.5 terabytes per day, so say 300 terabytes per month. I'm pretty sure the average figure is more like five or ten Gbps than one, so let's say a petabyte a month at least. Ten people per month downloading an extra terabyte is not a big issue. And I really doubt we'd see that many people downloading a full image dump every month.

The sensible bandwidth-saving way to do it would be to set up an rsync daemon on the image servers, and let people use that. Then you could get an old copy of the files from anywhere (including BitTorrent, if you like) and only have to download the changes. Plus, you could get up-to-the-minute copies if you like, although some throttling should probably be put into place to stop dozens of people from all running rsync in a loop to make sure they have the absolute latest version. I believe rsync 2 doesn't handle such huge numbers of files acceptably, but I've heard rsync 3 is supposed to be much better. That sounds like a better direction to look in than BitTorrent -- nobody's going to want to redownload the same files constantly to get an up-to-date set.

> Unless there are legal reasons for not allowing images to be downloaded, I think the wikipedia image files should be made available for efficient download again.

I'm pretty sure the reason there's no image dump is purely because not enough resources have been devoted to getting it working acceptably.
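The server side of that is small. A sketch of the daemon config (module name, path, and limits are invented for illustration):

    # /etc/rsyncd.conf on an image server
    [images]
        path = /export/upload
        comment = public image store (read-only)
        read only = yes
        max connections = 20   # crude throttle against rsync-in-a-loop users

A mirror would then pull incremental updates with something like "rsync -a rsync://download.wikimedia.org/images/wikipedia/commons/ ./commons/", transferring only new and changed files.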
Re: [Wikitech-l] downloading wikipedia database dumps
On Fri, Jan 8, 2010 at 10:56 AM, Aryeh Gregor simetrical+wikil...@gmail.com wrote:
> On Fri, Jan 8, 2010 at 10:31 AM, Jamie Morken jmor...@shaw.ca wrote:
>> I am not sure about the cost of the bandwidth, but the wikipedia image dumps are no longer available on the wikipedia dump anyway. I am guessing they were removed partly because of the bandwidth cost, or else image licensing issues perhaps.
> I think we just don't have infrastructure set up to dump images. I'm very sure bandwidth is not an issue -- the number of people with a

Correct. The space wasn't available for the required intermediate cop(y|ies).

> terabyte (or is it more?) handy that they want to download a Wikipedia image dump to will be vanishingly small compared to normal users.

s/terabyte/several terabytes/

My copy is not up to date, but it's not smaller than 4.

> Licensing wouldn't be an issue for Commons, at least, as long as it's easy to link the images up to their license pages. (I imagine it would technically violate some licenses, but no one would probably worry about it.)

We also dump the licensing information. If we can lawfully put the images on the website, then we can also distribute them in dump form. There is and can be no licensing problem.

> Wikipedia uses an average of multiple gigabits per second of bandwidth, as I recall.

http://www.nedworks.org/~mark/reqstats/trafficstats-daily.png

Though only this part is paid for:

http://www.nedworks.org/~mark/reqstats/transitstats-daily.png

The rest is peering, etc., which is only paid for in the form of equipment, port fees, and operational costs.

> The sensible bandwidth-saving way to do it would be to set up an rsync daemon on the image servers, and let people use that.

This was how I maintained a running mirror for a considerable time. Unfortunately the process broke when WMF ran out of space and needed to switch servers.

On Fri, Jan 8, 2010 at 10:31 AM, Jamie Morken jmor...@shaw.ca wrote:
> Bittorrent is simply a more efficient method to distribute files,

No. In a very real, absolute sense BitTorrent is considerably less efficient than other means. BitTorrent moves more of the outbound traffic to the edges of the network, where the real cost per Gbit/sec is much greater than at major datacenters: a megabit on a low-speed link is more costly than a megabit on a high-speed link, and a megabit on a mile of fiber is more expensive than a megabit on 10 feet of fiber. Moreover, BitTorrent is topology-unaware, so the path length tends to approach the internet's mean path length. Datacenters tend to be more centrally located topology-wise, and topology-aware distribution is easily applied to centralized stores. (E.g. WMF satisfies requests from Europe in Europe, though not for the dump downloads, as there simply isn't enough traffic to justify it.)

BitTorrent is also a more complicated, higher-overhead service, requiring more memory and more disk I/O than traditional transfer mechanisms. There are certainly cases where BitTorrent is valuable, such as the flash-mob case of a new OS release. This really isn't one of those cases.

On Thu, Jan 7, 2010 at 11:52 AM, William Pietri will...@scissor.com wrote:
> On 01/07/2010 01:40 AM, Jamie Morken wrote:
>> I have a suggestion for wikipedia!! I think that the database dumps including the image files should be made available by a wikipedia bittorrent tracker so that people would be able to download the wikipedia backups including the images (which currently they can't do) and also so that wikipedia's bandwidth costs would be reduced. [...]
>
> Is the bandwidth used really a big problem? Bandwidth is pretty cheap these days, and given Wikipedia's total draw, I suspect the occasional dump download isn't much of a problem. Bittorrent's real strength is when a lot of people want to download the same thing at once. E.g., when a new Ubuntu release comes out. Since Bittorrent requires all downloaders to be uploaders, it turns the flood of users into a benefit. But unless somebody has stats otherwise, I'd guess that isn't the problem here.

We tried BT for the Commons POTY archive once while I was watching, and we never had a downloader stay connected long enough to help another downloader... and that was only 500 MB, much easier to seed. BT also makes the server costs a lot higher: it has more CPU/memory overhead, and creates a lot of random disk I/O. For low-volume large files it's often not much of a win.

I haven't seen the numbers for a long time, but when I last looked, download.wikimedia.org was producing fairly little traffic... and much of what it was producing was outside of the peak busy hour for the sites. Since the transit is paid for on the 95th percentile and the WMF still has a decent day/night swing, out-of-peak traffic is effectively free. The bandwidth is nothing to worry about.
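For readers unfamiliar with 95th-percentile billing: the carrier samples the link rate every five minutes, discards the top 5% of samples, and bills for the highest remaining sample. That is why bulk traffic kept below the existing peak is genuinely free. A toy illustration with invented numbers:

    // 8640 five-minute samples in a 30-day month; fake a day/night swing
    var samples = [];
    for (var i = 0; i < 8640; i++) {
        var hour = (i * 5 / 60) % 24;
        var base = (hour >= 8 && hour < 22) ? 5000 : 1500;  // Mbps, made up
        samples.push(base + Math.random() * 500);
    }
    samples.sort(function (a, b) { return a - b; });
    var billable = samples[Math.floor(samples.length * 0.95) - 1];
    console.log('billed at ~' + Math.round(billable) + ' Mbps');
    // Dump downloads served during the nightly trough never touch the
    // 95th-percentile sample, so they add nothing to the transit bill.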
[Wikitech-l] CSS/javascript injection for AJAX requests
I noticed today that live preview does not pick up the dynamically-generated CSS from the SyntaxHighlight_GeSHi extension. The same problem occurs in LiquidThreads: when you add a comment with a GeSHi call in it, the CSS will not be picked up when the comment is initially saved. In either case, the CSS is only picked up on the first full reload of the page.

After some investigation, this is really an issue in core, and will apply to any extension that needs to add CSS and/or JavaScript to the output HTML. To fix the bugs with live preview, we would need some mechanism whereby AJAX calls receive not only new HTML, but also new CSS and/or JavaScript, and can add that CSS and JavaScript to the current page without a reload. Adding the CSS and JavaScript dynamically may be tricky from a compatibility standpoint, but having library functions in our site JavaScript would help with that.

I have not investigated the cause of the problem in LiquidThreads. The code in EditPage.php shows scars from similar problems, in a commented-out call to send a list of categories back to an AJAX preview request.

- Carl
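A sketch of the kind of library helper our site JavaScript could provide (the function name and item format are invented; the old-IE branch is the usual workaround):

    // Inject CSS/JS returned alongside an AJAX HTML fragment into the page.
    function injectHeadItems(items) {
        var head = document.getElementsByTagName('head')[0];
        for (var i = 0; i < items.length; i++) {
            if (items[i].type === 'css') {
                var style = document.createElement('style');
                style.type = 'text/css';
                if (style.styleSheet) {
                    // Old IE forbids appending a text node to <style>
                    style.styleSheet.cssText = items[i].content;
                } else {
                    style.appendChild(document.createTextNode(items[i].content));
                }
                head.appendChild(style);
            } else if (items[i].type === 'js') {
                var script = document.createElement('script');
                script.type = 'text/javascript';
                script.text = items[i].content;
                head.appendChild(script);
            }
        }
    }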
[Wikitech-l] Boing Boing applauds stats.grok.se!
http://www.boingboing.net/2010/01/07/wikibumps.html

- d.
Re: [Wikitech-l] [Foundation-l] Boing Boing applauds stats.grok.se!
On 01/08/2010 09:02 AM, David Gerard wrote:
> http://www.boingboing.net/2010/01/07/wikibumps.html

And the poster, who is a Boing Boing guest editor, is one of our own, an English Wikipedia contributor since 2004: http://en.wikipedia.org/wiki/User:Jokestress

William
Re: [Wikitech-l] [Foundation-l] Boing Boing applauds stats.grok.se!
On Jan 8, 2010, at 7:02 PM, David Gerard wrote:
> http://www.boingboing.net/2010/01/07/wikibumps.html

Currently we're in talks with WM-DE, so they will provision some storage for long-term archives of raw data, and we will probably add image view statistics then. Good stuff, right?

Domas
Re: [Wikitech-l] downloading wikipedia database dumps
William Pietri wrote:
> On 01/07/2010 01:40 AM, Jamie Morken wrote:
>> I have a suggestion for wikipedia!! I think that the database dumps including the image files should be made available by a wikipedia bittorrent tracker so that people would be able to download the wikipedia backups including the images (which currently they can't do) and also so that wikipedia's bandwidth costs would be reduced. [...]
>
> Is the bandwidth used really a big problem? Bandwidth is pretty cheap these days, and given Wikipedia's total draw, I suspect the occasional dump download isn't much of a problem.

No, bandwidth is not really the problem here. I think the core issue is having bulk access to images. There have been a number of these requests in the past, and after talking back and forth it has usually been the case that a smaller subset of the data works just as well. A good example of this was the Deutsche Fotothek archive made late last year:

http://download.wikipedia.org/images/Deutsche_Fotothek.tar (11 GB)

This provided an easily retrievable, high-quality subset of our image data which researchers could use.

Now, if we were to snapshot image data and store it per project, the amount of duplicate image data would become significant. That's because we re-use a ton of image data between projects, and rightfully so. If instead we package all of Commons into a tarball, then we get roughly 6 TB of image data, which after numerous conversations has been a bit more than most people want to process.

So what does everyone think of going down the collections route? If we provide enough different and up-to-date ones, then we could easily give people a large but manageable amount of data to work with. If there is already a page for this then please feel free to point me to it; otherwise I'll create one.

--tomasz
Re: [Wikitech-l] downloading wikipedia database dumps
On Fri, Jan 8, 2010 at 8:24 AM, Gregory Maxwell gmaxw...@gmail.com wrote:
> s/terabyte/several terabytes/
> My copy is not up to date, but it's not smaller than 4.

The top (most recent) versions of Commons files total about 4.9 TB; files on enwiki but not Commons add another 200 GB or so.

-Robert Rohde
Re: [Wikitech-l] Bugzilla Vs other trackers.
On Thu, Jan 7, 2010 at 8:39 AM, Peter Gervai grin...@gmail.com wrote:
> .. Wouldn't be nice. First, it's an attitude thing: we want (and have to) promote open stuff. Second, it isn't nice to show something to the users they cannot use themselves. It's kind of against our basic principle of you can do what we do, you're free to do it, we just do it better :-)

It would be a good idea to pass the memo to the guys who design the notability rules.

http://ioquake3.org/2009/02/20/ioquake3-entry-deleted-from-wikipedia/

Since most (all?) open source projects are web-only and don't get in the press, they sit in some obscure area of the web where something can be wildly popular among those in the know, and invisible to those who edit and delete articles. I mean, I could write a bot to nominate *all* open source project articles on Wikipedia for speedy deletion, and few (maybe 6) would survive that.

http://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Ioquake3

> Keep no matter how loud people and guidelines scream for reliable sources, many, many people use it and work on it and that makes it notable. If the press is not able to reliably represent this reality it's not a fault of the project and reality is a higher standard than reliable press. What do you need press for an Open Source project? Just looking at the SVN log proves more than any article could ever do. -- ioquake3 maintainer for the FreeBSD project

--
ℱin del ℳensaje.
Re: [Wikitech-l] Bugzilla Vs other trackers.
On 08.01.2010, 22:42 Tei wrote:
> It will be a good idea to pass the memo to the guys that design the notability rules.
> http://ioquake3.org/2009/02/20/ioquake3-entry-deleted-from-wikipedia/
> Since most (all?) opensource proyects are webonly, and don't get in the press, are on some obscure area of the web where something can be wildly popular for these in-the-know, and invisible for these that edit and delete articles. I mean, I can write a bot to nominate *all* opensource projects articles on wikipedia for speedy deletion, and few ones (maybe 6) will survive that.

<offtopic severity="will not engage in further flamewar on-list">
FFS, how can one maintain an article without reliable sources? What would such an article look like? Enough article-count-stacking; emphasis on quality, even if that means systemic bias. Wikipedia is not a registry of open-source projects. And those projects that an average user might search for tend to have some sources -- guess why?

As for counter-examples of fancruft, there's one 100% recipe: remove all in-universe crap and slap {{db-empty}} on it if there's nothing left.
</offtopic>

--
Best regards,
Max Semenik ([[User:MaxSem]])
Re: [Wikitech-l] Bugzilla Vs other trackers.
On Fri, Jan 8, 2010 at 8:42 PM, Tei oscar.vi...@gmail.com wrote:
> On Thu, Jan 7, 2010 at 8:39 AM, Peter Gervai grin...@gmail.com wrote:
>> .. Wouldn't be nice. First, it's an attitude thing: we want (and have to) promote open stuff. Second, it isn't nice to show something to the users they cannot use themselves. It's kind of against our basic principle of you can do what we do, you're free to do it, we just do it better :-)
> It will be a good idea to pass the memo to the guys that design the notability rules.

Right. Notability guidelines do not apply to the Wikimedia servers, the MediaWiki software, or which kind of bug tracker we are going to use, so please take complaints about that somewhere else.

Bryan
Re: [Wikitech-l] Bugzilla Vs other trackers.
On Fri, Jan 8, 2010 at 2:42 PM, Tei oscar.vi...@gmail.com wrote:
> It will be a good idea to pass the memo to the guys that design the notability rules.
> http://ioquake3.org/2009/02/20/ioquake3-entry-deleted-from-wikipedia/

Notability is decided by each wiki individually. The policies of the English Wikipedia are irrelevant to this list, which is about Wikimedia server administration and MediaWiki development. The correct list for this sort of comment would be wikien-l, or possibly foundation-l. Devs/sysadmins can't override wiki policies on things like notability, so there's no point in telling wikitech-l. Thanks.
Re: [Wikitech-l] downloading wikipedia database dumps
I think having access to them on the Commons repository is much easier to handle. A subset should be good enough. Having 11 TB of images requires huge research capabilities in order to handle and work with all of them.

Maybe a special API, or advanced API functions, would allow people enough access and at the same time save the bandwidth and the hassle of handling this behemoth collection.

bilal
--
Verily, with hardship comes ease.

On Fri, Jan 8, 2010 at 1:57 PM, Tomasz Finc tf...@wikimedia.org wrote:
> No, bandwidth is not really the problem here. I think the core issue is having bulk access to images. There have been a number of these requests in the past, and after talking back and forth it has usually been the case that a smaller subset of the data works just as well. [...]
Re: [Wikitech-l] Boing Boing applauds stats.grok.se!
David Gerard wrote:
> http://www.boingboing.net/2010/01/07/wikibumps.html

On sv.wikipedia there is a gadget that creates a stats tab on each page. That's very useful. Why don't more languages of Wikipedia have that gadget installed?

--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
Re: [Wikitech-l] downloading wikipedia database dumps
On Fri, Jan 8, 2010 at 3:28 PM, Bilal Abdul Kader bila...@gmail.com wrote:
> I think having access to them on the Commons repository is much easier to handle. A subset should be good enough. Having 11 TB of images requires huge research capabilities in order to handle and work with all of them. Maybe a special API, or advanced API functions, would allow people enough access and at the same time save the bandwidth and the hassle of handling this behemoth collection.

Well, if there were an rsyncd you could just fetch the ones you wanted arbitrarily.
Re: [Wikitech-l] downloading wikipedia database dumps
> Well, if there were an rsyncd you could just fetch the ones you wanted arbitrarily.

rsyncd is fail for large-file mass delivery, and it is fail when exposed to the masses.

Domas
Re: [Wikitech-l] downloading wikipedia database dumps
Can someone articulate what the use case is? Is there someone out there who could use a 5 TB image archive, but is disappointed that it doesn't exist? That seems rather implausible.

If not, then I assume everyone is really after only some subset of the files. If that's the case, we should try to figure out what kinds of subsets people want and the best way to handle them.

-Robert Rohde
Re: [Wikitech-l] Boing Boing applauds stats.grok.se!
On Fri, Jan 8, 2010 at 12:38 PM, Lars Aronsson l...@aronsson.se wrote:
> David Gerard wrote:
>> http://www.boingboing.net/2010/01/07/wikibumps.html
>
> On sv.wikipedia there is a gadget that creates a stats tab on each page. That's very useful. Why don't more languages of Wikipedia have that gadget installed?

Local admins control the installation of gadgets. On enwiki the process is at: http://en.wikipedia.org/wiki/Wikipedia:Gadget

-Robert Rohde
Re: [Wikitech-l] Unified gadgets (was: stats.grok.se)
Robert Rohde wrote:
> Local admins control the installation of gadgets. On enwiki the process is at: http://en.wikipedia.org/wiki/Wikipedia:Gadget

Exactly! This is poor design. I have an account (through SUL) on the Ukrainian Wikipedia because I sometimes add interwiki links there. I want the same gadgets there, but I don't speak Ukrainian and I can't go around bothering local admins in every language with this. Gadgets should follow the user, just like the account name and password do. There must be a better way than the current one.

--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
Re: [Wikitech-l] Unified gadgets (was: stats.grok.se)
On Fri, Jan 8, 2010 at 4:14 PM, Lars Aronsson l...@aronsson.se wrote:
> Exactly! This is poor design. I have an account (through SUL) on the Ukrainian Wikipedia because I sometimes add interwiki links there. I want the same gadgets there, but I don't speak Ukrainian and I can't go around bothering local admins in every language with this. Gadgets should follow the user, just like the account name and password do. There must be a better way than the current one.

We should also make it possible to have global gadgets controlled on Meta-Wiki. This would be especially useful for hiding the fundraising banner. ;-)

--
Casey Brown
Cbrown1023
Re: [Wikitech-l] CSS/javascript injection for AJAX requests
The styles and JS are already available in the parser output, in ->mHeadItems. It should be trivial to expose them through the API via action=parse. So I've put this on Bugzilla; see https://bugzilla.wikimedia.org/show_bug.cgi?id=22061

P.Copp

On Fri, Jan 8, 2010 at 5:42 PM, Carl (CBM) cbm.wikipe...@gmail.com wrote:
> I noticed today that live preview does not pick up the dynamically-generated CSS from the SyntaxHighlight_GeSHi extension. The same problem occurs in LiquidThreads: when you add a comment with a GeSHi call in it, the CSS will not be picked up when the comment is initially saved. [...]
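Once that's exposed, an AJAX preview could fetch the head items in the same request as the parsed HTML, along the lines of the following (the "headitems" property name is a guess at what the patch will end up calling it):

    api.php?action=parse&page=Main_Page&prop=text|headitems&format=json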
Re: [Wikitech-l] Bugzilla Vs other trackers.
Tei wrote:
> It will be a good idea to pass the memo to the guys that design the notability rules.
> http://ioquake3.org/2009/02/20/ioquake3-entry-deleted-from-wikipedia/
> Since most (all?) opensource proyects are webonly, and don't get in the press, are on some obscure area of the web where something can be wildly popular for these in-the-know, and invisible for these that edit and delete articles. I mean, I can write a bot to nominate *all* opensource projects articles on wikipedia for speedy deletion, and few ones (maybe 6) will survive that.

*Many* open source projects are relevant; to cite a few: Apache, PHP, Python, Perl, Ruby, PostgreSQL, Subversion, Mercurial, git, Bazaar... Those are more than 6... :) They are widely known technologies, and there are books written about them. By contrast, this is the first time I have heard about ioquake3. It may be relevant, it may not. Being on the web and free is not enough to warrant notability, even if the script kiddies making their Linux distro don't like that. :)

> http://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Ioquake3
> Keep no matter how loud people and guidlines scream for reliable sources, many, many people use it and work on it and that makes it notable. If the press is not able to reliably represent this reality it's not a fault of the project and reality is a higher standard than reliable press. What do you need press for an Open Source project? Just looking at the SVN log proves more than any article could ever do. -- ioquake3 maintainer for the FreeBSD project

If they are relevant, why does it matter whether Wikipedia acknowledges it? Suppose Wikipedia didn't have an article about FreeBSD; would that make it a worse OS?
Re: [Wikitech-l] downloading wikipedia database dumps
Gregory Maxwell wrote:
> Er. I've maintained a non-WMF disaster recovery archive for a long time, though it's no longer completely current since the rsync went away and web fetching is lossy.

And the box ran out of disk space. We could try until it fills again, though. A sysadmin fixing images with wrong hashes would also be nice: https://bugzilla.wikimedia.org/show_bug.cgi?id=17057#c3

> It saved our rear a number of times, saving thousands of images from irreparable loss. Moreover it allowed things like image hashing before we had that in the database, and it would allow perceptual lossy hash matching if I ever got around to implementing tools to access the output.

IMHO the problem is not accessing it, but hashing those terabytes of images.

> There really are use cases. Moreover, making complete copies of the public data available as dumps to the public is a WMF board-supported initiative.
Re: [Wikitech-l] 100% open source stack (was Re: Bugzilla Vs other trackers.)
Platonides wrote:
> What were the reasons for replacing lighttpd with Sun Java System Web Server?

Probably the same reason that the toolserver uses Confluence instead of MediaWiki.

-- Tim Starling
Re: [Wikitech-l] 100% open source stack (was Re: Bugzilla Vs other trackers.)
On Sat, Jan 9, 2010 at 12:10 PM, Tim Starling tstarl...@wikimedia.org wrote:
> Platonides wrote:
>> What were the reasons for replacing lighttpd with Sun Java System Web Server?
> Probably the same reason that the toolserver uses Confluence instead of MediaWiki.

It only contains one page, which points to the MediaWiki wiki:
https://confluence.toolserver.org/pages/listpages-dirview.action?key=main

Are there plans to make greater use of the Confluence wiki?
https://wiki.toolserver.org/view/Domains#confluence.toolserver.org

--
John Vandenberg
Re: [Wikitech-l] downloading wikipedia database dumps
On Fri, Jan 8, 2010 at 10:56 AM, Aryeh Gregor simetrical+wikil...@gmail.com wrote:
> The sensible bandwidth-saving way to do it would be to set up an rsync daemon on the image servers, and let people use that.

The bandwidth-saving way to do things would be to just allow mirrors to use hotlinking. Requiring a middleman to temporarily store images (many, and possibly even most, of which will never even be downloaded by end users) just wastes bandwidth.
Re: [Wikitech-l] downloading wikipedia database dumps
On Fri, Jan 8, 2010 at 9:06 PM, Gregory Maxwell gmaxw...@gmail.com wrote:
> Yea, well, you can't easily eliminate all the internal points of failure. "Someone with root loses control of their access and someone nasty wipes everything" is really hard to protect against with online systems.

Isn't that what the system immutable flag is for? It's easy, as long as you're willing to put up with a bit of whining from the person with root access.
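Concretely (the filename here is invented), that's the BSD chflags system-immutable flag or its Linux cousin:

    # FreeBSD: even root can't write to or unlink the file while schg is set,
    # and at securelevel >= 1 the flag can't be cleared without dropping to
    # single-user mode
    chflags schg /backup/enwiki-images.tar

    # Linux (ext2/3/4): a similar effect via the immutable attribute
    chattr +i /backup/enwiki-images.tar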
Re: [Wikitech-l] 100% open source stack (was Re: Bugzilla Vs other trackers.)
John Vandenberg wrote:
> On Sat, Jan 9, 2010 at 12:10 PM, Tim Starling tstarl...@wikimedia.org wrote:
>> Probably the same reason that the toolserver uses Confluence instead of MediaWiki.
>
> It only contains one page, which points to the MediaWiki wiki:
> https://confluence.toolserver.org/pages/listpages-dirview.action?key=main

I count 65 pages:
https://confluence.toolserver.org/pages/listpages-dirview.action?key=tech

Maybe you were confused by the unfamiliar UI.

> Are there plans to make greater use of the Confluence wiki?

Certainly not.

The reason for using SJWS on ms* was the same reason the toolserver uses Confluence: River installed them both. River's contribution is very much appreciated, but he does have his own way of doing things.

-- Tim Starling