Re: [Wikitech-l] hosting wikipedia
On Wed, Jan 28, 2009 at 8:28 AM, Tei wrote:
> On Wed, Jan 28, 2009 at 1:41 AM, Aryeh Gregor wrote:
>> On Tue, Jan 27, 2009 at 7:37 PM, George Herbert wrote:
>>> Right, but a live mirror is a very different thing than a search box link.
>>
>> Well, as far as I can tell, we have no idea whether the original
>> poster meant either of those, or perhaps something else altogether.
>> Obviously nobody minds a search box link, that's just a *link*. You
>> can't stop people from linking to you.
>
> This code doesn't even need to use
> http://en.wiktionary.org/wiki/Special:Search
>
> function $(name){
>     return document.getElementById(name);
> }
>
> function searchWiktionary(){
>     var word = $("word").value;
>     // encodeURIComponent() is safer than the deprecated escape() here
>     $("form1").setAttribute("action", "http://en.wiktionary.org/wiki/" + encodeURIComponent(word));
>     $("form1").submit();
> }

PS: I know the OP was talking about OpenSearch. This snippet of code is something different.

--
ℱin del ℳensaje.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] hosting wikipedia
On Wed, Jan 28, 2009 at 1:41 AM, Aryeh Gregor wrote:
> On Tue, Jan 27, 2009 at 7:37 PM, George Herbert wrote:
>> Right, but a live mirror is a very different thing than a search box link.
>
> Well, as far as I can tell, we have no idea whether the original
> poster meant either of those, or perhaps something else altogether.
> Obviously nobody minds a search box link, that's just a *link*. You
> can't stop people from linking to you.

This code doesn't even need to use
http://en.wiktionary.org/wiki/Special:Search

<!-- the form markup was stripped by the list archive; it presumably
     looked something like this -->
<form id="form1" method="get" action="">
  <input type="text" id="word">
  <input type="button" value="Search" onclick="searchWiktionary()">
</form>

function $(name){
    return document.getElementById(name);
}

function searchWiktionary(){
    var word = $("word").value;
    // encodeURIComponent() is safer than the deprecated escape() here
    $("form1").setAttribute("action", "http://en.wiktionary.org/wiki/" + encodeURIComponent(word));
    $("form1").submit();
}

--
ℱin del ℳensaje.
Re: [Wikitech-l] [Toolserver-l] Crawling deWP
On Wed, Jan 28, 2009 at 1:13 AM, Daniel Kinzler wrote:
> Marco Schuster schrieb:
>>> Fetch them from the toolserver (there's a tool by duesentrieb for that).
>>> It will catch almost all of them from the toolserver cluster, and make a
>>> request to wikipedia only if needed.
>> I highly doubt this is "legal" use for the toolserver, and I pretty
>> much guess that 800k revisions to fetch would be a huge resource load.
>>
>> Thanks, Marco
>>
>> PS: CC-ing toolserver list.
>
> It's a legal use, the only problem is that the tool I wrote for it is quite
> slow. You shouldn't hit it at full speed. So it might actually be better to
> query the main server cluster, they can distribute the load more nicely.

What is the best speed, actually? 2 requests per second? Or can I go up to 4?

> One day I'll rewrite WikiProxy and everything will be better :)

:)

> But by then, I do hope we have revision flags in the dumps, because that
> would be The Right Thing to use.

Still, using the dumps would require me to get the full history dump, because
I only want flagged revisions and not current revisions without the flag.

Marco
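Whatever rate ends up being acceptable, the crawler can enforce it client-side. A minimal sketch — the 2 req/s default, the URL pattern, and the User-Agent string are illustrative assumptions, not documented limits:

```python
import time
import urllib.request

class Throttle:
    """Enforce a minimum delay between successive requests."""
    def __init__(self, per_second=2.0):
        self.interval = 1.0 / per_second
        self.last = float("-inf")  # first wait() returns immediately

    def wait(self):
        remaining = self.interval - (time.monotonic() - self.last)
        if remaining > 0:
            time.sleep(remaining)
        self.last = time.monotonic()

def fetch_revision(rev_id, throttle):
    """Fetch one revision by oldid, respecting the throttle."""
    throttle.wait()
    url = "https://de.wikipedia.org/w/index.php?oldid=%d" % rev_id
    req = urllib.request.Request(
        url,
        # a descriptive UA with contact info is the usual courtesy for bots
        headers={"User-Agent": "flagged-revs-dump/0.1 (you@example.org)"})
    return urllib.request.urlopen(req).read()
```

Fetching 800k revisions at 2 req/s works out to roughly 4.6 days of wall-clock time, which is why the rate question matters.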
Re: [Wikitech-l] MediaWiki Slow, what to look for?
http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/jobs-loop/run-jobs.c?revision=22101&view=markup&sortby=date

As mentioned, it is just a sample script. For sites with just one master/slave
cluster, any simple script that keeps looping to run maintenance/runJobs.php
will do.

-Aaron

--------------------------------------------------
From: "Marco Schuster"
Sent: Tuesday, January 27, 2009 6:56 PM
To: "Wikimedia developers"
Subject: Re: [Wikitech-l] MediaWiki Slow, what to look for?

> On Tue, Jan 27, 2009 at 6:56 PM, Jason Schulz wrote:
>> Also, see
>> http://www.mediawiki.org/wiki/User:Aaron_Schulz/How_to_make_MediaWiki_fast
>
> The shell script you mention in step 2 has some stuff in it that makes
> it unusable outside Wikimedia:
> 1) lots of hard-coded paths
> 2) what is "/usr/local/bin/run-jobs"?
>
> I'd put
>
>     0 0 * * * /usr/bin/php /var/www/wiki/maintenance/runJobs.php > /var/log/runJobs.log 2>&1
>
> as the crontab entry in your guide, as it's a bit more compatible with
> non-Wikimedia environments ;)
>
> Marco
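The "simple script that keeps looping" can be sketched in a few lines. The PHP path, the maintenance-script path, and the `--maxjobs` batch size below are assumptions for a typical install, not Wikimedia's actual setup:

```python
import subprocess
import time

def run_jobs_loop(cycles,
                  php="/usr/bin/php",
                  script="/var/www/wiki/maintenance/runJobs.php",
                  pause=5,
                  runner=subprocess.call,
                  sleep=time.sleep):
    """Invoke runJobs.php repeatedly, pausing between batches.

    `runner` and `sleep` are injectable so the loop can be exercised
    without a real wiki; a production wrapper would pass cycles as an
    effectively infinite count or loop forever.
    """
    statuses = []
    for _ in range(cycles):
        # process up to one batch of queued jobs, then back off briefly
        statuses.append(runner([php, script, "--maxjobs", "1000"]))
        sleep(pause)
    return statuses
```

Compared with the once-a-day crontab entry, a loop like this keeps the job queue short continuously, at the cost of a long-running process that needs supervision.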
Re: [Wikitech-l] MediaWiki Slow, what to look for?
Dawson wrote:
> Modified config file as follows:
>
> $wgUseDatabaseMessage = false;
> $wgUseFileCache = true;
> $wgMainCacheType = "CACHE_ACCEL";

This should be

    $wgMainCacheType = CACHE_ACCEL;   (constant)

not

    $wgMainCacheType = "CACHE_ACCEL"; (string)
Re: [Wikitech-l] hosting wikipedia
On Tue, Jan 27, 2009 at 7:37 PM, George Herbert wrote:
> Right, but a live mirror is a very different thing than a search box link.

Well, as far as I can tell, we have no idea whether the original
poster meant either of those, or perhaps something else altogether.
Obviously nobody minds a search box link, that's just a *link*. You
can't stop people from linking to you.
Re: [Wikitech-l] hosting wikipedia
On Tue, Jan 27, 2009 at 3:54 PM, Aryeh Gregor wrote:
> Anyway, the reason live mirrors are prohibited is not for load
> reasons. I believe it's because if a site does nothing but stick up
> some ads and add no value, Wikimedia is going to demand a cut of the
> profit for using its trademarks and so on. Some sites pay Wikimedia
> for live mirroring. So the others, in principle, get blocked.

Right, but a live mirror is a very different thing than a search box link.

--
-george william herbert
george.herb...@gmail.com
Re: [Wikitech-l] [Toolserver-l] Crawling deWP
Marco Schuster schrieb:
>> Fetch them from the toolserver (there's a tool by duesentrieb for that).
>> It will catch almost all of them from the toolserver cluster, and make a
>> request to wikipedia only if needed.
> I highly doubt this is "legal" use for the toolserver, and I pretty
> much guess that 800k revisions to fetch would be a huge resource load.
>
> Thanks, Marco
>
> PS: CC-ing toolserver list.

It's a legal use; the only problem is that the tool I wrote for it is quite
slow. You shouldn't hit it at full speed. So it might actually be better to
query the main server cluster, they can distribute the load more nicely.

One day I'll rewrite WikiProxy and everything will be better :)

But by then, I do hope we have revision flags in the dumps, because that
would be The Right Thing to use.

--
daniel
Re: [Wikitech-l] Crawling deWP
On Wed, Jan 28, 2009 at 12:53 AM, Platonides wrote:
> Marco Schuster wrote:
>> Hi all,
>>
>> I want to crawl around 800,000 flagged revisions from the German
>> Wikipedia, in order to make a dump containing only flagged revisions.
>> For this, I obviously need to spider Wikipedia.
>> What are the limits (rate!) here, what UA should I use and what
>> caveats do I have to take care of?
>>
>> Thanks,
>> Marco
>>
>> PS: I already have a revisions list, created with the Toolserver. I
>> used the following query: "select fp_stable,fp_page_id from
>> flaggedpages where fp_reviewed=1;". Is it correct this one gives me a
>> list of all articles with flagged revs, fp_stable being the revid of
>> the most current flagged rev for this article?
>
> Fetch them from the toolserver (there's a tool by duesentrieb for that).
> It will catch almost all of them from the toolserver cluster, and make a
> request to wikipedia only if needed.

I highly doubt this is "legal" use for the toolserver, and I pretty
much guess that 800k revisions to fetch would be a huge resource load.

Thanks, Marco

PS: CC-ing toolserver list.
Re: [Wikitech-l] Crawling deWP
On Wed, Jan 28, 2009 at 12:49 AM, Rolf Lampa wrote:
> Marco Schuster skrev:
>> I want to crawl around 800,000 flagged revisions from the German
>> Wikipedia, in order to make a dump containing only flagged revisions.
> [...]
>> flaggedpages where fp_reviewed=1;". Is it correct this one gives me a
>> list of all articles with flagged revs,
>
> Don't the XML dumps contain the flag for flagged revs?

The XML dumps are no use to me: way too much overhead (especially since
they are old, and I want to use single files, which are easier to process
than one huge XML file). And they don't contain flagged-revision flags :(

Marco
Re: [Wikitech-l] MediaWiki Slow, what to look for?
On Tue, Jan 27, 2009 at 6:56 PM, Jason Schulz wrote:
> Also, see
> http://www.mediawiki.org/wiki/User:Aaron_Schulz/How_to_make_MediaWiki_fast

The shell script you mention in step 2 has some stuff in it that makes
it unusable outside Wikimedia:
1) lots of hard-coded paths
2) what is "/usr/local/bin/run-jobs"?

I'd put

    0 0 * * * /usr/bin/php /var/www/wiki/maintenance/runJobs.php > /var/log/runJobs.log 2>&1

as the crontab entry in your guide, as it's a bit more compatible with
non-Wikimedia environments ;)

Marco
Re: [Wikitech-l] hosting wikipedia
On Tue, Jan 27, 2009 at 6:43 PM, George Herbert wrote:
> Google switching to use our search would crush us, obviously.

Doubtful. It wouldn't be terribly pleasant, but I doubt it would take
down the site so easily. Alexa says google.com gets about ten times the
traffic of wikipedia.org. If google.com redirected to wikipedia.org, I
don't know if that would crash the site by itself.

> As would AOL.

Wikipedia is far bigger than AOL. That would only be a 20% or 30% spike
in traffic. I'm pretty sure we could handle that.

Anyway, the reason live mirrors are prohibited is not for load reasons.
I believe it's because if a site does nothing but stick up some ads and
add no value, Wikimedia is going to demand a cut of the profit for using
its trademarks and so on. Some sites pay Wikimedia for live mirroring.
So the others, in principle, get blocked.
Re: [Wikitech-l] Crawling deWP
Marco Schuster wrote:
> Hi all,
>
> I want to crawl around 800,000 flagged revisions from the German
> Wikipedia, in order to make a dump containing only flagged revisions.
> For this, I obviously need to spider Wikipedia.
> What are the limits (rate!) here, what UA should I use and what
> caveats do I have to take care of?
>
> Thanks,
> Marco
>
> PS: I already have a revisions list, created with the Toolserver. I
> used the following query: "select fp_stable,fp_page_id from
> flaggedpages where fp_reviewed=1;". Is it correct this one gives me a
> list of all articles with flagged revs, fp_stable being the revid of
> the most current flagged rev for this article?

Fetch them from the toolserver (there's a tool by duesentrieb for that).
It will catch almost all of them from the toolserver cluster, and make a
request to wikipedia only if needed.
Re: [Wikitech-l] Crawling deWP
Rolf Lampa schrieb:
> Marco Schuster skrev:
>> I want to crawl around 800,000 flagged revisions from the German
>> Wikipedia, in order to make a dump containing only flagged revisions.
> [...]
>> flaggedpages where fp_reviewed=1;". Is it correct this one gives me a
>> list of all articles with flagged revs,
>
> Don't the XML dumps contain the flag for flagged revs?

They don't. And that's very sad.

--
daniel
Re: [Wikitech-l] Crawling deWP
Marco Schuster skrev:
> I want to crawl around 800,000 flagged revisions from the German
> Wikipedia, in order to make a dump containing only flagged revisions.
[...]
> flaggedpages where fp_reviewed=1;". Is it correct this one gives me a
> list of all articles with flagged revs,

Don't the XML dumps contain the flag for flagged revs?

// Rolf Lampa
Re: [Wikitech-l] Enwiki dump crawling since 10/15/2008
On 1/27/09 2:55 PM, Robert Rohde wrote:
> On Tue, Jan 27, 2009 at 2:42 PM, Brion Vibber wrote:
>> On 1/27/09 2:35 PM, Thomas Dalton wrote:
>>> The way I see it, what we need is to get a really powerful server
>>
>> Nope, it's a software architecture issue. We'll restart it with the new
>> arch when it's ready to go.
>
> I don't know what your timetable is, but what about doing something to
> address the other aspects of the dump (logs, stubs, etc.) that are in
> limbo while full history chugs along. All the other enwiki files are
> now 3 months old and that is already enough to inconvenience some
> people.
>
> The simplest solution is just to kill the current dump job if you have
> faith that a new architecture can be put in place in less than a year.

We'll probably do that.

--
brion
Re: [Wikitech-l] hosting wikipedia
On Tue, Jan 27, 2009 at 11:29 AM, Steve Summit wrote:
> Jeff Ferland wrote:
>> You'll need a quite impressive machine to host even just the current
>> revisions of the wiki. Expect to expend 10s to even hundreds of
>> gigabytes on the database alone for Wikipedia using only the current
>> versions.
>
> No, no, no. You're looking at it all wrong. That's the sucker's
> way of doing it.
>
> If you're smart, you put up a simple page with a text box labeled
> "Wikipedia search", and whenever someone types a query into
> the box and submits it, you ship the query over to the Wikimedia
> servers, and then slurp back the response, and display it back
> to the original submitter. That way only Wikimedia has to worry
> about all those pesky gigabyte-level database hosting requirements,
> while you get all the glory.
>
> This appears to be what the questioner is asking about.

Let's AGF a bit...

Even if someone whose site has no particular Wikipedia focus links to one
of our searches from their page, all the resultant search result links
lead back into Wikipedia. If people have a question about something, and
want to look it up, does it really matter if they go to Wikipedia's front
page and click "search" versus doing so in another context?

We're providing an information resource - other sites can and often do
link to our articles (quite appropriately). Why not link to our search?
The search link should in fairness tell people what they're getting, sure,
but that's more of a website-to-end-user disclosure problem than a problem
for us.

Google switching to use our search would crush us, obviously. As would
AOL. But J. Random site? Seems like an ok thing, to me.

--
-george william herbert
george.herb...@gmail.com
[Wikitech-l] Crawling deWP
Hi all,

I want to crawl around 800,000 flagged revisions from the German
Wikipedia, in order to make a dump containing only flagged revisions.
For this, I obviously need to spider Wikipedia.
What are the limits (rate!) here, what UA should I use, and what
caveats do I have to take care of?

Thanks,
Marco

PS: I already have a revisions list, created with the Toolserver. I
used the following query: "select fp_stable,fp_page_id from
flaggedpages where fp_reviewed=1;". Is it correct that this gives me a
list of all articles with flagged revs, fp_stable being the revid of
the most current flagged rev for each article?
Re: [Wikitech-l] Enwiki dump crawling since 10/15/2008
On Tue, Jan 27, 2009 at 2:42 PM, Brion Vibber wrote:
> On 1/27/09 2:35 PM, Thomas Dalton wrote:
>> The way I see it, what we need is to get a really powerful server
>
> Nope, it's a software architecture issue. We'll restart it with the new
> arch when it's ready to go.

I don't know what your timetable is, but what about doing something to
address the other aspects of the dump (logs, stubs, etc.) that are in
limbo while full history chugs along? All the other enwiki files are
now 3 months old and that is already enough to inconvenience some
people.

The simplest solution is just to kill the current dump job if you have
faith that a new architecture can be put in place in less than a year.

-Robert Rohde
Re: [Wikitech-l] Enwiki dump crawling since 10/15/2008
On 1/27/09 2:35 PM, Thomas Dalton wrote:
> The way I see it, what we need is to get a really powerful server

Nope, it's a software architecture issue. We'll restart it with the new
arch when it's ready to go.

--
brion
Re: [Wikitech-l] Enwiki dump crawling since 10/15/2008
> Whether we want to let the current process continue to try and finish
> or not, I would seriously suggest someone look into redumping the rest
> of the enwiki files (i.e. logs, current pages, etc.). I am also among
> the people that care about having reasonably fresh dumps and it really
> is a problem that the other dumps (e.g. stubs-meta-history) are frozen
> while we wait to see if the full history dump can run to completion.

Even if we do let it finish, I'm not sure a dump of what Wikipedia was
like 13 months ago is much use... The way I see it, what we need is to
get a really powerful server to do the dump just once at a reasonable
speed; then we'll have a previous dump to build on, so future ones would
be more reasonable.
Re: [Wikitech-l] Enwiki dump crawling since 10/15/2008
The problem, as I understand it (and Brion may come by to correct me), is
essentially that the current dump process is designed in a way that can't
be sustained given the size of enwiki. It really needs to be
re-engineered, which means that developer time is needed to create a new
approach to dumping. The main target for improvement is almost certainly
parallelizing the process, so that there wouldn't be a single monolithic
dump process, but rather a lot of little processes working in parallel.
That would also ensure that if a single process gets stuck and dies, the
entire dump doesn't need to start over.

By way of observation, dewiki's full history dumps in 26 hours with 96%
prefetched (i.e. loaded from previous dumps). That suggests that even
starting from scratch (prefetch = 0%) it should dump in ~25 days under
the current process. enwiki is perhaps 3-6 times larger than dewiki
depending on how you do the accounting, which implies dumping the whole
thing from scratch would take ~5 months if the process scaled linearly.
Of course it doesn't scale linearly, and we end up with a prediction for
completion that is currently 10 months away (which amounts to a 13 month
total execution). And of course, if there is any serious error in the
next ten months the entire process could die with no result.

Whether we want to let the current process continue to try and finish or
not, I would seriously suggest someone look into redumping the rest of
the enwiki files (i.e. logs, current pages, etc.). I am also among the
people that care about having reasonably fresh dumps and it really is a
problem that the other dumps (e.g. stubs-meta-history) are frozen while
we wait to see if the full history dump can run to completion.

-Robert Rohde

On Tue, Jan 27, 2009 at 11:24 AM, Christian Storm wrote:
>>> On 1/4/09 6:20 AM, yegg at alum.mit.edu wrote:
>>> The current enwiki database dump
>>> (http://download.wikimedia.org/enwiki/20081008/) has been crawling
>>> along since 10/15/2008.
>>
>> The current dump system is not sustainable on very large wikis and
>> is being replaced. You'll hear about it when we have the new one in
>> place. :)
>> -- brion
>
> Following up on this thread:
> http://lists.wikimedia.org/pipermail/wikitech-l/2009-January/040841.html
>
> Brion,
>
> Can you offer any general timeline estimates (weeks, months, 1/2
> year)? Are there any alternatives to retrieving the article data
> beyond directly crawling the site? I know this is verboten, but we are
> in dire need of retrieving this data and don't know of any
> alternatives. The current estimate of end of year is too long for us
> to wait. Unfortunately, wikipedia is a favored source for students to
> plagiarize from, which makes out-of-date content a real issue.
>
> Is there any way to help this process along? We can donate disk
> drives, developer time, ...? There is another possibility that we
> could offer, but I would need to talk with someone at the wikimedia
> foundation offline. Is there anyone I could contact?
>
> Thanks for any information and/or direction you can give.
>
> Christian
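The parallelization Robert describes can be sketched roughly: partition the page-ID space into independent chunks so that a crashed worker only forces a retry of its own range, not a restart of the whole dump. The names and chunking scheme here are illustrative, not the actual dump architecture:

```python
from multiprocessing import Pool

def chunk_ranges(max_page_id, chunk_size):
    """Split [1, max_page_id] into inclusive (start, end) ranges."""
    return [(start, min(start + chunk_size - 1, max_page_id))
            for start in range(1, max_page_id + 1, chunk_size)]

def dump_range(bounds):
    """Stand-in worker: a real one would export revisions for this range."""
    start, end = bounds
    return (start, end, "ok")

def parallel_dump(max_page_id, chunk_size, workers=4):
    """Dump all ranges in parallel; a failed chunk can be re-run alone."""
    with Pool(workers) as pool:
        return pool.map(dump_range, chunk_ranges(max_page_id, chunk_size))
```

A per-chunk result list also makes it cheap to checkpoint progress: finished ranges can be recorded and skipped on restart instead of redoing thirteen months of work.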
Re: [Wikitech-l] Make upload headings changeable
Chad wrote:
> Should be done with a wiki's content language as of r46372.
>
> -Chad

Thanks! That's already a big improvement, but why content language? As I
pointed out in response to your question, it needs to be user language on
Meta, Incubator, Wikispecies, Beta Wikiversity, old Wikisource, and all
the multilingual wikis of third-party users. It's not actually necessary
on non-multilingual wikis, but it does no harm there either. So why
content language? This could be solved with a setting in
LocalSettings.php, "isMultilingual", but that's another affair, and as
long as that does not exist, we should use user language.

Marcus Buck
Re: [Wikitech-l] Make upload headings changeable
On Mon, Jan 26, 2009 at 12:44 PM, Ilmari Karonen wrote:
> Chad wrote:
>> I was going to provide a specific parameter for it. That entire key sucks
>> though anyway, I should probably ditch the md5()'d URL in favor of using
>> the actual name. Fwiw: I've got a patch working, but I'm not quite ready
>> to commit it yet. While we're at it, are we sure we want to use $wgLang
>> and not $wgContLang? Image description pages are "content", not a part
>> of the interface. That being said, I would think it would be best to
>> fetch the information using the wiki's content language.
>
> Well, if you actually visit the description page on Commons, you'll see
> the templates in your interface language -- that's kind of the _point_
> of the autotranslated templates.
>
> Then again, Commons is kind of a special case, since, being a
> multilingual project, it doesn't _have_ a real content language; in a
> technical sense its content language is English, but that's only because
> MediaWiki requires one language to be specified as a content language
> even if the actual content is multilingual. So I can see arguments
> either way.
>
> What language is the "shareduploadwiki-desc" message shown in, anyway?
> Seems to be $wgLang, which would seem to suggest that the actual
> description should be shown in the interface language too, for
> consistency.
>
> --
> Ilmari Karonen

Should be done with a wiki's content language as of r46372.

-Chad
Re: [Wikitech-l] Enwiki dump crawling since 10/15/2008
I have a decent server that is dedicated to a Wikipedia project that
depends on the fresh dumps. Can this be used in any way to speed up the
process of generating the dumps?

bilal

On Tue, Jan 27, 2009 at 2:24 PM, Christian Storm wrote:
>>> On 1/4/09 6:20 AM, yegg at alum.mit.edu wrote:
>>> The current enwiki database dump
>>> (http://download.wikimedia.org/enwiki/20081008/) has been crawling
>>> along since 10/15/2008.
>>
>> The current dump system is not sustainable on very large wikis and
>> is being replaced. You'll hear about it when we have the new one in
>> place. :)
>> -- brion
>
> Following up on this thread:
> http://lists.wikimedia.org/pipermail/wikitech-l/2009-January/040841.html
>
> Brion,
>
> Can you offer any general timeline estimates (weeks, months, 1/2
> year)? Are there any alternatives to retrieving the article data
> beyond directly crawling the site? I know this is verboten, but we are
> in dire need of retrieving this data and don't know of any
> alternatives. The current estimate of end of year is too long for us
> to wait. Unfortunately, wikipedia is a favored source for students to
> plagiarize from, which makes out-of-date content a real issue.
>
> Is there any way to help this process along? We can donate disk
> drives, developer time, ...? There is another possibility that we
> could offer, but I would need to talk with someone at the wikimedia
> foundation offline. Is there anyone I could contact?
>
> Thanks for any information and/or direction you can give.
>
> Christian
Re: [Wikitech-l] hosting wikipedia
Jeff Ferland wrote:
> You'll need a quite impressive machine to host even just the current
> revisions of the wiki. Expect to expend 10s to even hundreds of
> gigabytes on the database alone for Wikipedia using only the current
> versions.

No, no, no. You're looking at it all wrong. That's the sucker's
way of doing it.

If you're smart, you put up a simple page with a text box labeled
"Wikipedia search", and whenever someone types a query into
the box and submits it, you ship the query over to the Wikimedia
servers, and then slurp back the response, and display it back
to the original submitter. That way only Wikimedia has to worry
about all those pesky gigabyte-level database hosting requirements,
while you get all the glory.

This appears to be what the questioner is asking about.
[Wikitech-l] Enwiki dump crawling since 10/15/2008
>> On 1/4/09 6:20 AM, yegg at alum.mit.edu wrote:
>> The current enwiki database dump
>> (http://download.wikimedia.org/enwiki/20081008/) has been crawling
>> along since 10/15/2008.
>
> The current dump system is not sustainable on very large wikis and
> is being replaced. You'll hear about it when we have the new one in
> place. :)
> -- brion

Following up on this thread:
http://lists.wikimedia.org/pipermail/wikitech-l/2009-January/040841.html

Brion,

Can you offer any general timeline estimates (weeks, months, 1/2 year)?
Are there any alternatives to retrieving the article data beyond directly
crawling the site? I know this is verboten, but we are in dire need of
retrieving this data and don't know of any alternatives. The current
estimate of end of year is too long for us to wait. Unfortunately,
wikipedia is a favored source for students to plagiarize from, which
makes out-of-date content a real issue.

Is there any way to help this process along? We can donate disk drives,
developer time, ...? There is another possibility that we could offer,
but I would need to talk with someone at the wikimedia foundation
offline. Is there anyone I could contact?

Thanks for any information and/or direction you can give.

Christian
Re: [Wikitech-l] hosting wikipedia
I'll try to weigh in with a bit of useful information, but it probably
won't help that much.

You'll need a quite impressive machine to host even just the current
revisions of the wiki. Expect to expend 10s to even hundreds of gigabytes
on the database alone for Wikipedia using only the current versions.
There are instructions for how to load the data that can be found by
googling "wikipedia dump".

Several others have inquired for more information about your goal, and
I'm going to echo that. The mechanics of hosting this kind of data
(volume, really) are highly related to the associated task. This data
used for academic research would be handled differently than for a live
website, for example.

Nobody likes to be told they can't do something, or to get a bunch of
useless responses to a request for help. Very sincerely, though, unless
you find enough information from the dump instruction pages to point you
in the right direction and are able to ask more specific questions, you
are in over your head. Your solution at that point would be to hire
somebody.

Sent from my phone,
Jeff

On Jan 27, 2009, at 12:34 PM, Stephen Dunn wrote:
> Hi Folks:
>
> I am a newbie so I apologize if I am asking basic questions. How
> would I go about hosting wiktionary allowing search queries via the
> web using opensearch. I am having trouble finding info on how to set
> this up. Any assistance is greatly appreciated.
Re: [Wikitech-l] hosting wikipedia
Maybe this is what this guy needs: a small HTML form whose action points at http://en.wiktionary.org/wiki/Special:Search (the form markup was stripped from the archive here).

Test page: http://zerror.com/unorganized/wika/test.htm

It doesn't seem that Wiktionary blocks external searches at the moment (via the Referer header), but the policy, or the parameters needed, may change in the future.

On Tue, Jan 27, 2009 at 7:18 PM, Stephen Dunn wrote:
> refer to the reference.com website and do a search
>
>> yes, website. so a web page has a search box that passes the input to
>> wiktionary and results are provided on a results page. an example may be
>> reference.com
>
> How would this differ from the search box on en.wiktionary.org? What
> are you actually trying to achieve?

--
ℱin del ℳensaje.
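The form-based approach Tei suggests can be sketched in a few lines of JavaScript. This is an illustrative reconstruction, not the exact page he posted: the element id `word` is an assumption, and `encodeURIComponent` is used instead of the older `escape`, which mangles non-ASCII titles.

```javascript
// Build a search URL against the live en.wiktionary.org.
// Special:Search accepts the query in the "search" parameter;
// "go=Go" asks MediaWiki to jump straight to an exact title match.
function wiktionaryUrl(word) {
  return "http://en.wiktionary.org/wiki/Special:Search?search=" +
    encodeURIComponent(word) + "&go=Go";
}

// Browser wiring (assumes an <input id="word"> exists on the page):
function searchWiktionary() {
  var word = document.getElementById("word").value;
  window.location.href = wiktionaryUrl(word);
}
```

Hooked up to a button's onclick, this just sends the visitor to Wiktionary's own results page, so nothing is hosted locally and no dump is needed.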
Re: [Wikitech-l] hosting wikipedia
Refer to the reference.com website and do a search.

- Original Message
From: Thomas Dalton
To: Wikimedia developers
Sent: Tuesday, January 27, 2009 1:07:36 PM
Subject: Re: [Wikitech-l] hosting wikipedia

2009/1/27 Stephen Dunn :
> yes, website. so a web page has a search box that passes the input to
> wiktionary and results are provided on a results page. an example may be
> reference.com

How would this differ from the search box on en.wiktionary.org? What are you actually trying to achieve?
Re: [Wikitech-l] hosting wikipedia
2009/1/27 Stephen Dunn :
> yes, website. so a web page has a search box that passes the input to
> wiktionary and results are provided on a results page. an example may be
> reference.com

How would this differ from the search box on en.wiktionary.org? What are you actually trying to achieve?
Re: [Wikitech-l] hosting wikipedia
Yes, website. So a web page has a search box that passes the input to Wiktionary, and results are provided on a results page. An example may be reference.com

- Original Message
From: Thomas Dalton
To: Wikimedia developers
Sent: Tuesday, January 27, 2009 12:50:18 PM
Subject: Re: [Wikitech-l] hosting wikipedia

2009/1/27 Stephen Dunn :
> I am working on a project to host wiktionary on one web page and wikipedia on
> another. So both, sorry..

You mean web *site*, surely? They are both far too big to fit on a single page. I think you need to work out precisely what it is you're trying to do before we can help you.
Re: [Wikitech-l] MediaWiki Slow, what to look for?
To use the file cache, you need to set $wgShowIPinHeader = false;

Also, see http://www.mediawiki.org/wiki/User:Aaron_Schulz/How_to_make_MediaWiki_fast

-Aaron

--
From: "Dawson"
Sent: Tuesday, January 27, 2009 9:52 AM
To: "Wikimedia developers"
Subject: Re: [Wikitech-l] MediaWiki Slow, what to look for?

> Modified config file as follows:
>
> $wgUseDatabaseMessages = false;
> $wgUseFileCache = true;
> $wgMainCacheType = CACHE_ACCEL;
>
> I also installed XCache and eAccelerator. The improvement in speed is huge.
>
> 2009/1/27 Aryeh Gregor
>
>> On Tue, Jan 27, 2009 at 5:31 AM, Dawson wrote:
>> > Hello, I have a couple of MediaWiki installations on two different slices at
>> > Slicehost, both of which run websites on the same slice with no speed
>> > problems; however, the MediaWiki installations themselves run like dogs!
>> > http://wiki.medicalstudentblog.co.uk/ Any ideas what to look for or ways to
>> > optimise them? I still can't get over that they need a 100 MB ini_set in
>> > the settings just to load, due to the messages or something.
>>
>> If you haven't already, you should set up an opcode cache like APC or
>> XCache, and a variable cache like APC or XCache (if using one
>> application server) or memcached (if using multiple application
>> servers). Those are essential for decent performance. If you want
>> really snappy views, at least for logged-out users, you should use
>> Squid too, although that's probably overkill for a small site. It
>> also might be useful to install wikidiff2 and use that for diffs.
>>
>> Of course, none of this works if you don't have root access. (Well,
>> maybe you could get memcached working with only shell . . .) In that
>> case, I'm not sure what advice to give.
>>
>> MediaWiki is a big, slow package, though. For large sites, it has
>> scalability features that are almost certainly unparalleled in any
>> other wiki software, but it's probably not optimized as much for quick
>> loading on small-scale, cheap hardware. It's mainly meant for
>> Wikipedia. If you want to try digging into what's taking so long, you
>> can try enabling profiling:
>>
>> http://www.mediawiki.org/wiki/Profiling#Profiling
>>
>> If you find something that helps a lot, it would be helpful to mention
>> it.
Re: [Wikitech-l] hosting wikipedia
2009/1/27 Stephen Dunn :
> I am working on a project to host wiktionary on one web page and wikipedia on
> another. So both, sorry..

You mean web *site*, surely? They are both far too big to fit on a single page. I think you need to work out precisely what it is you're trying to do before we can help you.
Re: [Wikitech-l] hosting wikipedia
I am working on a project to host wiktionary on one web page and wikipedia on another. So both, sorry..

- Original Message
From: Thomas Dalton
To: Wikimedia developers
Sent: Tuesday, January 27, 2009 12:43:49 PM
Subject: Re: [Wikitech-l] hosting wikipedia

2009/1/27 Stephen Dunn :
> Hi Folks:
>
> I am a newbie, so I apologize if I am asking basic questions. How would I go
> about hosting Wiktionary, allowing search queries via the web using
> OpenSearch? I am having trouble finding info on how to set this up. Any
> assistance is greatly appreciated.

Why do you want to host Wiktionary? It's already hosted at en.wiktionary.org. And do you mean Wiktionary (as you said in the body of your email) or Wikipedia (as you said in the subject line)? Or do you actually mean your own wiki, unrelated to either of those?
Re: [Wikitech-l] hosting wikipedia
2009/1/27 Stephen Dunn :
> Hi Folks:
>
> I am a newbie, so I apologize if I am asking basic questions. How would I go
> about hosting Wiktionary, allowing search queries via the web using
> OpenSearch? I am having trouble finding info on how to set this up. Any
> assistance is greatly appreciated.

Why do you want to host Wiktionary? It's already hosted at en.wiktionary.org. And do you mean Wiktionary (as you said in the body of your email) or Wikipedia (as you said in the subject line)? Or do you actually mean your own wiki, unrelated to either of those?
[Wikitech-l] hosting wikipedia
Hi Folks:

I am a newbie, so I apologize if I am asking basic questions. How would I go about hosting Wiktionary, allowing search queries via the web using OpenSearch? I am having trouble finding info on how to set this up. Any assistance is greatly appreciated.
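The OpenSearch part of this question never really gets answered in the thread above, so for what it's worth: MediaWiki already exposes OpenSearch suggestions through api.php, and pointing a browser search box at the live site only takes a small description document. The sketch below is an illustration under the OpenSearch 1.1 spec, not something from the thread; the ShortName and where you host the file are up to you.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Wiktionary (en)</ShortName>
  <Description>Search the English Wiktionary</Description>
  <!-- HTML results: hand the query to Special:Search on the live site -->
  <Url type="text/html"
       template="http://en.wiktionary.org/wiki/Special:Search?search={searchTerms}"/>
  <!-- Type-ahead suggestions via MediaWiki's built-in OpenSearch API -->
  <Url type="application/x-suggestions+json"
       template="http://en.wiktionary.org/w/api.php?action=opensearch&amp;search={searchTerms}"/>
</OpenSearchDescription>
```

Served with the application/opensearchdescription+xml content type and referenced from a page's `<link rel="search" ...>` tag, this lets a browser query the live site directly, with no local hosting and no dump import involved.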
Re: [Wikitech-l] MediaWiki Slow, what to look for?
Modified config file as follows:

$wgUseDatabaseMessages = false;
$wgUseFileCache = true;
$wgMainCacheType = CACHE_ACCEL;

I also installed XCache and eAccelerator. The improvement in speed is huge.

2009/1/27 Aryeh Gregor
>
> On Tue, Jan 27, 2009 at 5:31 AM, Dawson wrote:
> > Hello, I have a couple of MediaWiki installations on two different slices at
> > Slicehost, both of which run websites on the same slice with no speed
> > problems; however, the MediaWiki installations themselves run like dogs!
> > http://wiki.medicalstudentblog.co.uk/ Any ideas what to look for or ways to
> > optimise them? I still can't get over that they need a 100 MB ini_set in
> > the settings just to load, due to the messages or something.
>
> If you haven't already, you should set up an opcode cache like APC or
> XCache, and a variable cache like APC or XCache (if using one
> application server) or memcached (if using multiple application
> servers). Those are essential for decent performance. If you want
> really snappy views, at least for logged-out users, you should use
> Squid too, although that's probably overkill for a small site. It
> also might be useful to install wikidiff2 and use that for diffs.
>
> Of course, none of this works if you don't have root access. (Well,
> maybe you could get memcached working with only shell . . .) In that
> case, I'm not sure what advice to give.
>
> MediaWiki is a big, slow package, though. For large sites, it has
> scalability features that are almost certainly unparalleled in any
> other wiki software, but it's probably not optimized as much for quick
> loading on small-scale, cheap hardware. It's mainly meant for
> Wikipedia. If you want to try digging into what's taking so long, you
> can try enabling profiling:
>
> http://www.mediawiki.org/wiki/Profiling#Profiling
>
> If you find something that helps a lot, it would be helpful to mention it.
Re: [Wikitech-l] MediaWiki Slow, what to look for?
On Tue, Jan 27, 2009 at 5:31 AM, Dawson wrote:
> Hello, I have a couple of mediawiki installations on two different slices at
> Slicehost, both of which run websites on the same slice with no speed
> problems, however, the mediawiki themselves run like dogs!
> http://wiki.medicalstudentblog.co.uk/ Any ideas what to look for or ways to
> optimise them? I still can't get over they need a 100mb ini_set in settings
> to just load due to the messages or something.

If you haven't already, you should set up an opcode cache like APC or XCache, and a variable cache like APC or XCache (if using one application server) or memcached (if using multiple application servers). Those are essential for decent performance. If you want really snappy views, at least for logged-out users, you should use Squid too, although that's probably overkill for a small site. It also might be useful to install wikidiff2 and use that for diffs.

Of course, none of this works if you don't have root access. (Well, maybe you could get memcached working with only shell . . .) In that case, I'm not sure what advice to give.

MediaWiki is a big, slow package, though. For large sites, it has scalability features that are almost certainly unparalleled in any other wiki software, but it's probably not optimized as much for quick loading on small-scale, cheap hardware. It's mainly meant for Wikipedia. If you want to try digging into what's taking so long, you can try enabling profiling:

http://www.mediawiki.org/wiki/Profiling#Profiling

If you find something that helps a lot, it would be helpful to mention it.
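For reference, Aryeh's advice maps onto a handful of LocalSettings.php lines. This is a sketch for a MediaWiki 1.x install of that era, not a drop-in config: the memcached address is a placeholder, and the file-cache directory is one common choice.

```php
<?php
// Object cache: APC/XCache via the accelerator backend on a single app server...
$wgMainCacheType = CACHE_ACCEL;

// ...or memcached when several app servers should share one cache
// (placeholder address — point it at your own memcached instance):
# $wgMainCacheType    = CACHE_MEMCACHED;
# $wgMemCachedServers = array( '127.0.0.1:11211' );

// Serve cached static HTML to anonymous visitors:
$wgUseFileCache       = true;
$wgFileCacheDirectory = "$IP/cache";
$wgShowIPinHeader     = false;   // the file cache requires this off

// Use the faster C++ diff engine if the wikidiff2 extension is installed:
# $wgExternalDiffEngine = 'wikidiff2';
```

Note that CACHE_ACCEL is a PHP constant; quoting it as the string "CACHE_ACCEL" (as in the message above) silently falls back to no caching.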
[Wikitech-l] MediaWiki Slow, what to look for?
Hello, I have a couple of MediaWiki installations on two different slices at Slicehost, both of which run websites on the same slice with no speed problems; however, the MediaWiki installations themselves run like dogs! http://wiki.medicalstudentblog.co.uk/ Any ideas what to look for, or ways to optimise them? I still can't get over that they need a 100 MB ini_set in the settings just to load, due to the messages or something.

Thank you,
Dawson