Re: [Wikitech-l] hosting wikipedia

2009-01-27 Thread Tei
On Wed, Jan 28, 2009 at 8:28 AM, Tei wrote: > On Wed, Jan 28, 2009 at 1:41 AM, Aryeh Gregor > wrote: >> On Tue, Jan 27, 2009 at 7:37 PM, George Herbert >> wrote: >>> Right, but a live mirror is a very different thing than a search box link. >> >> Well, as far as I can tell, we have no idea wheth

Re: [Wikitech-l] hosting wikipedia

2009-01-27 Thread Tei
On Wed, Jan 28, 2009 at 1:41 AM, Aryeh Gregor wrote: > On Tue, Jan 27, 2009 at 7:37 PM, George Herbert > wrote: >> Right, but a live mirror is a very different thing than a search box link. > > Well, as far as I can tell, we have no idea whether the original > poster meant either of those, or per

Re: [Wikitech-l] [Toolserver-l] Crawling deWP

2009-01-27 Thread Marco Schuster
On Wed, Jan 28, 2009 at 1:13 AM, Daniel Kinzler wrote: > Marco Schuster wrote: >>> Fetch them from the toolserver (there's a tool by duesentrieb for that). >>> It will catch almost all of them from the toolserver cluster, and make a >>> request to w

Re: [Wikitech-l] MediaWiki Slow, what to look for?

2009-01-27 Thread Jason Schulz
http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/jobs-loop/run-jobs.c?revision=22101&view=markup&sortby=date As mentioned, it is just a sample script. For sites with just one master/slave cluster, any simple script that keeps looping to run maintenance/runJobs.php will do. -Aaron
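For a single-cluster site, a minimal stand-in could look like this (a sketch only, not the run-jobs.c tool; the install path and the --maxjobs/sleep values are placeholders):

    <?php
    // keep-jobs-running.php -- minimal job-runner loop (sketch).
    // Assumes MediaWiki is installed in /var/www/wiki (placeholder path).
    $install = '/var/www/wiki';
    while ( true ) {
        // process up to 1000 queued jobs, then pause before checking again
        passthru( "php $install/maintenance/runJobs.php --maxjobs 1000" );
        sleep( 10 );
    }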

Re: [Wikitech-l] MediaWiki Slow, what to look for?

2009-01-27 Thread Platonides
Dawson wrote: > Modified config file as follows: > > $wgUseDatabaseMessage = false; > $wgUseFileCache = true; > $wgMainCacheType = "CACHE_ACCEL"; This should be $wgMainCacheType = CACHE_ACCEL; (constant) not $wgMainCacheType = "CACHE_ACCEL"; (string)
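In LocalSettings.php the difference looks like this:

    $wgMainCacheType = CACHE_ACCEL;   // correct: CACHE_ACCEL is a constant defined by MediaWiki
    $wgMainCacheType = "CACHE_ACCEL"; // wrong: the quoted string does not select the accelerator cache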

Re: [Wikitech-l] hosting wikipedia

2009-01-27 Thread Aryeh Gregor
On Tue, Jan 27, 2009 at 7:37 PM, George Herbert wrote: > Right, but a live mirror is a very different thing than a search box link. Well, as far as I can tell, we have no idea whether the original poster meant either of those, or perhaps something else altogether. Obviously nobody minds a search

Re: [Wikitech-l] hosting wikipedia

2009-01-27 Thread George Herbert
On Tue, Jan 27, 2009 at 3:54 PM, Aryeh Gregor > wrote: > Anyway, the reason live mirrors are prohibited is not for load > reasons. I believe it's because if a site does nothing but stick up > some ads and add no value, Wikimedia is going to demand a cut of the > profit for using its trademarks a

Re: [Wikitech-l] [Toolserver-l] Crawling deWP

2009-01-27 Thread Daniel Kinzler
Marco Schuster wrote: >> Fetch them from the toolserver (there's a tool by duesentrieb for that). >> It will catch almost all of them from the toolserver cluster, and make a >> request to wikipedia only if needed. > I highly doubt this is "legal" use for the toolserver, and I pretty > much guess

Re: [Wikitech-l] Crawling deWP

2009-01-27 Thread Marco Schuster
On Wed, Jan 28, 2009 at 12:53 AM, Platonides wrote: > Marco Schuster wrote: >> Hi all, >> >> I want to crawl around 800.000 flagged revisions from the German >> Wikipedia, in order to make a dump containing only flagged revisions. >> For this, I obvio

Re: [Wikitech-l] Crawling deWP

2009-01-27 Thread Marco Schuster
On Wed, Jan 28, 2009 at 12:49 AM, Rolf Lampa wrote: > Marco Schuster wrote: >> I want to crawl around 800.000 flagged revisions from the German >> Wikipedia, in order to make a dump containing only flagged revisions. > [...] >> flaggedpages where fp_r

Re: [Wikitech-l] MediaWiki Slow, what to look for?

2009-01-27 Thread Marco Schuster
On Tue, Jan 27, 2009 at 6:56 PM, Jason Schulz wrote: > Also, see > http://www.mediawiki.org/wiki/User:Aaron_Schulz/How_to_make_MediaWiki_fast The shell script you mention in step 2 has some stuff in it that makes it unusable outside Wikimedia: 1) lots

Re: [Wikitech-l] hosting wikipedia

2009-01-27 Thread Aryeh Gregor
On Tue, Jan 27, 2009 at 6:43 PM, George Herbert wrote: > Google switching to use our search would crush us, obviously. Doubtful. It wouldn't be terribly pleasant, but I doubt it would take down the site so easily. Alexa says google.com gets about ten times the traffic of wikipedia.org. If goog

Re: [Wikitech-l] Crawling deWP

2009-01-27 Thread Platonides
Marco Schuster wrote: > Hi all, > > I want to crawl around 800.000 flagged revisions from the German > Wikipedia, in order to make a dump containing only flagged revisions. > For this, I obviously need to spider Wikipedia. > What are the limits (rate!) here, what UA should I use and what > caveats

Re: [Wikitech-l] Crawling deWP

2009-01-27 Thread Daniel Kinzler
Rolf Lampa wrote: > Marco Schuster wrote: >> I want to crawl around 800.000 flagged revisions from the German >> Wikipedia, in order to make a dump containing only flagged revisions. > [...] >> flaggedpages where fp_reviewed=1;". Is it correct this one gives me a > list of all articles with flag

Re: [Wikitech-l] Crawling deWP

2009-01-27 Thread Rolf Lampa
Marco Schuster wrote: > I want to crawl around 800.000 flagged revisions from the German > Wikipedia, in order to make a dump containing only flagged revisions. [...] > flaggedpages where fp_reviewed=1;". Is it correct this one gives me a > list of all articles with flagged revs, Doesn't the xml
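For reference, a rough sketch of pulling that list from a toolserver database replica (the hostname, credentials and the fp_stable column are assumptions taken from the FlaggedRevs schema; verify them before relying on this):

    <?php
    // Sketch: list page ids and their stable (flagged) revision ids from the
    // German Wikipedia replica. Hostname, credentials and columns are assumptions.
    $db  = new mysqli( 'dewiki-p.db.toolserver.org', 'your_user', 'your_pass', 'dewiki_p' );
    $res = $db->query( 'SELECT fp_page_id, fp_stable FROM flaggedpages WHERE fp_reviewed = 1' );
    while ( $row = $res->fetch_assoc() ) {
        echo $row['fp_page_id'], "\t", $row['fp_stable'], "\n";
    }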

Re: [Wikitech-l] Enwiki dump crawling since 10/15/2008

2009-01-27 Thread Brion Vibber
On 1/27/09 2:55 PM, Robert Rohde wrote: > On Tue, Jan 27, 2009 at 2:42 PM, Brion Vibber wrote: >> On 1/27/09 2:35 PM, Thomas Dalton wrote: >>> The way I see it, what we need is to get a really powerful server >> Nope, it's a software architecture issue. We'll restart it with the new >> arch when i

Re: [Wikitech-l] hosting wikipedia

2009-01-27 Thread George Herbert
On Tue, Jan 27, 2009 at 11:29 AM, Steve Summit wrote: > Jeff Ferland wrote: > > You'll need a quite impressive machine to host even just the current > > revisions of the wiki. Expect to expend 10s to even hundreds of > > gigabytes on the database alone for Wikipedia using only the current > > ver

[Wikitech-l] Crawling deWP

2009-01-27 Thread Marco Schuster
Hi all, I want to crawl around 800.000 flagged revisions from the German Wikipedia, in order to make a dump containing only flagged revisions. For this, I obviously need to spider Wikipedia. What are the limits (rate!) here, what UA should I use and w
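Whatever limits apply, a polite crawler should at least send a descriptive User-Agent and throttle itself. A rough sketch (the revision ids, delay and contact address are placeholders; action=raw returns the wikitext of the given revision):

    <?php
    // Sketch: throttled fetch of individual revisions from de.wikipedia.org.
    $revisionIds = array( 55151467, 55160001 ); // placeholder flagged revision ids
    foreach ( $revisionIds as $revId ) {
        $ch = curl_init( "http://de.wikipedia.org/w/index.php?oldid=$revId&action=raw" );
        curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
        // identify the bot and give a way to contact you
        curl_setopt( $ch, CURLOPT_USERAGENT, 'FlaggedRevsDumpBot/0.1 (someone@example.org)' );
        file_put_contents( "rev-$revId.wiki", curl_exec( $ch ) );
        curl_close( $ch );
        sleep( 1 ); // throttle: at most one request per second
    }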

Re: [Wikitech-l] Enwiki dump crawling since 10/15/2008

2009-01-27 Thread Robert Rohde
On Tue, Jan 27, 2009 at 2:42 PM, Brion Vibber wrote: > On 1/27/09 2:35 PM, Thomas Dalton wrote: >> The way I see it, what we need is to get a really powerful server > > Nope, it's a software architecture issue. We'll restart it with the new > arch when it's ready to go. I don't know what your tim

Re: [Wikitech-l] Enwiki dump crawling since 10/15/2008

2009-01-27 Thread Brion Vibber
On 1/27/09 2:35 PM, Thomas Dalton wrote: > The way I see it, what we need is to get a really powerful server Nope, it's a software architecture issue. We'll restart it with the new arch when it's ready to go. -- brion

Re: [Wikitech-l] Enwiki dump crawling since 10/15/2008

2009-01-27 Thread Thomas Dalton
> Whether we want to let the current process continue to try and finish > or not, I would seriously suggest someone look into redumping the rest > of the enwiki files (i.e. logs, current pages, etc.). I am also among > the people that care about having reasonably fresh dumps and it really > is a p

Re: [Wikitech-l] Enwiki dump crawling since 10/15/2008

2009-01-27 Thread Robert Rohde
The problem, as I understand it (and Brion may come by to correct me) is essentially that the current dump process is designed in a way that can't be sustained given the size of enwiki. It really needs to be re-engineered, which means that developer time is needed to create a new approach to dumpi

Re: [Wikitech-l] Make upload headings changeable

2009-01-27 Thread Marcus Buck
Chad wrote: > Should be done with a wiki's content language as of r46372. > > -Chad Thanks! That's already a big improvement, but why content language? As I pointed out in response to your question, it needs to be user language on Meta, Incubator, Wikispecies, Beta Wikiversity, old Wiki

Re: [Wikitech-l] Make upload headings changeable

2009-01-27 Thread Chad
On Mon, Jan 26, 2009 at 12:44 PM, Ilmari Karonen wrote: > Chad wrote: > > I was going to provide a specific parameter for it. That entire key sucks > > though anyway, I should probably ditch the md5()'d URL in favor of using > > the actual name. Fwiw: I've got a patch working, but I'm not quite r

Re: [Wikitech-l] Enwiki dump crawling since 10/15/2008

2009-01-27 Thread Bilal Abdul Kader
I have a decent server that is dedicated to a Wikipedia project that depends on the fresh dumps. Can this be used in any way to speed up the process of generating the dumps? bilal On Tue, Jan 27, 2009 at 2:24 PM, Christian Storm wrote: > >> On 1/4/09 6:20 AM, yegg at alum.mit.edu wrote: > >> The c

Re: [Wikitech-l] hosting wikipedia

2009-01-27 Thread Steve Summit
Jeff Ferland wrote: > You'll need a quite impressive machine to host even just the current > revisions of the wiki. Expect to expend 10s to even hundreds of > gigabytes on the database alone for Wikipedia using only the current > versions. No, no, no. You're looking at it all wrong. That's

[Wikitech-l] Enwiki dump crawling since 10/15/2008

2009-01-27 Thread Christian Storm
>> On 1/4/09 6:20 AM, yegg at alum.mit.edu wrote: >> The current enwiki database dump >> (http://download.wikimedia.org/enwiki/20081008/ >> ) has been crawling along since 10/15/2008. > The current dump system is not sustainable on very large wikis and > is being replaced. You'll hear about it

Re: [Wikitech-l] hosting wikipedia

2009-01-27 Thread Jeff Ferland
I'll try to weigh in with a bit of useful information, but it probably won't help that much. You'll need a quite impressive machine to host even just the current revisions of the wiki. Expect to expend 10s to even hundreds of gigabytes on the database alone for Wikipedia using only the curre

Re: [Wikitech-l] hosting wikipedia

2009-01-27 Thread Tei
Maybe this is what this guy needs: http://en.wiktionary.org/wiki/Special:Search test: http://zerror.com/unorganized/wika/test.htm It doesn't seem Wiktionary blocks external searches at the moment (via the Referer header), but the policy or the required parameters may change in the future. On Tue, Jan 27,

Re: [Wikitech-l] hosting wikipedia

2009-01-27 Thread Stephen Dunn
Refer to the reference.com website and do a search. - Original Message From: Thomas Dalton To: Wikimedia developers Sent: Tuesday, January 27, 2009 1:07:36 PM Subject: Re: [Wikitech-l] hosting wikipedia 2009/1/27 Stephen Dunn : > yes, website. so a web page has a search box that passes t

Re: [Wikitech-l] hosting wikipedia

2009-01-27 Thread Thomas Dalton
2009/1/27 Stephen Dunn : > yes, website. so a web page has a search box that passes the input to > wiktionary and results are provided on a results page. an example may be > reference.com How would this differ from the search box on en.wiktionary.org? What are you actually trying to achieve?

Re: [Wikitech-l] hosting wikipedia

2009-01-27 Thread Stephen Dunn
yes, website. so a web page has a search box that passes the input to wiktionary and results are provided on a results page. an example may be reference.com - Original Message From: Thomas Dalton To: Wikimedia developers Sent: Tuesday, January 27, 2009 12:50:18 PM Subject: Re: [Wik

Re: [Wikitech-l] MediaWiki Slow, what to look for?

2009-01-27 Thread Jason Schulz
To use the file cache, you need to set $wgShowIPinHeader = false; Also, see http://www.mediawiki.org/wiki/User:Aaron_Schulz/How_to_make_MediaWiki_fast -Aaron
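Put together, the relevant LocalSettings.php lines would be roughly as follows (a sketch; the cache directory is an assumption and must be writable by the web server):

    $wgShowIPinHeader     = false;        // the file cache is only used when this is off
    $wgUseFileCache       = true;
    $wgFileCacheDirectory = "$IP/cache";  // assumed path; any web-server-writable directory works
    $wgMainCacheType      = CACHE_ACCEL;  // constant, not the string "CACHE_ACCEL"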

Re: [Wikitech-l] hosting wikipedia

2009-01-27 Thread Thomas Dalton
2009/1/27 Stephen Dunn : > I am working on a project to host wiktionary on one web page and wikipedia on > another. So both, sorry.. You mean web *site*, surely? They are both far too big to fit on a single page. I think you need to work out precisely what it is you're trying to do before we can

Re: [Wikitech-l] hosting wikipedia

2009-01-27 Thread Stephen Dunn
I am working on a project to host wiktionary on one web page and wikipedia on another. So both, sorry. - Original Message From: Thomas Dalton To: Wikimedia developers Sent: Tuesday, January 27, 2009 12:43:49 PM Subject: Re: [Wikitech-l] hosting wikipedia 2009/1/27 Stephen Dunn : >

Re: [Wikitech-l] hosting wikipedia

2009-01-27 Thread Thomas Dalton
2009/1/27 Stephen Dunn : > Hi Folks: > > I am a newbie so I apologize if I am asking basic questions. How would I go > about hosting wiktionary allowing search queries via the web using > OpenSearch? I am having trouble finding info on how to set this up. Any > assistance is greatly appreciated.

[Wikitech-l] hosting wikipedia

2009-01-27 Thread Stephen Dunn
Hi Folks: I am a newbie so I apologize if I am asking basic questions. How would I go about hosting wiktionary allowing search queries via the web using OpenSearch? I am having trouble finding info on how to set this up. Any assistance is greatly appreciated.
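For what it's worth, one simple way to pass a query through from your own page is MediaWiki's OpenSearch API; a minimal sketch (the search term is a placeholder):

    <?php
    // Sketch: ask en.wiktionary.org's OpenSearch endpoint for title suggestions.
    $term = 'serendipity'; // placeholder search term
    $url  = 'http://en.wiktionary.org/w/api.php?action=opensearch&format=json&search='
          . urlencode( $term );
    $result = json_decode( file_get_contents( $url ), true );
    // $result[0] echoes the search term, $result[1] holds the matching titles
    print_r( $result[1] );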

Re: [Wikitech-l] MediaWiki Slow, what to look for?

2009-01-27 Thread Dawson
Modified config file as follows: $wgUseDatabaseMessage = false; $wgUseFileCache = true; $wgMainCacheType = "CACHE_ACCEL"; I also installed xcache and eaccelerator. The improvement in speed is huge. 2009/1/27 Aryeh Gregor > > On Tue, Jan 27, 2009 at 5:31 AM, Dawson wrote: > > Hello, I have a c

Re: [Wikitech-l] MediaWiki Slow, what to look for?

2009-01-27 Thread Aryeh Gregor
On Tue, Jan 27, 2009 at 5:31 AM, Dawson wrote: > Hello, I have a couple of MediaWiki installations on two different slices at > Slicehost, both of which run websites on the same slice with no speed > problems; however, the MediaWiki installations themselves run like dogs! > http://wiki.medicalstudentblog.co.uk/

[Wikitech-l] MediaWiki Slow, what to look for?

2009-01-27 Thread Dawson
Hello, I have a couple of MediaWiki installations on two different slices at Slicehost, both of which run websites on the same slice with no speed problems; however, the MediaWiki installations themselves run like dogs! http://wiki.medicalstudentblog.co.uk/ Any ideas what to look for or ways to optimise them? I