Re: [Wikitech-l] Cannot run the maintenance script
On 25/03/13 23:19, Rahul Maliakkal wrote:
> I installed the SMW extension in my local wiki yesterday, and now when I
> visit a page in my local wiki I get this message:
>
>   A database query syntax error has occurred. This may indicate a bug in
>   the software. The last attempted database query was: (SQL query hidden)
>   from within function ShortUrlUtils::encodeTitle. Database returned
>   error 1146: Table 'my_wiki.w1_shorturls' doesn't exist (127.0.0.1)
>
> along with the page being displayed untidily. So I tried to fix the
> problem; as suggested, I tried to run php update.php. Then I got the
> following error message:
>
>   A copy of your installation's LocalSettings.php must exist and be
>   readable in the source directory. Use --conf to specify it.
>
> My LocalSettings.php is in the same place as the default index.php.
> Earlier I made some logo changes, and they were successfully reflected
> in my wiki, so localhost does have access to LocalSettings.php. I am
> working on Ubuntu and have MediaWiki 1.20 installed. Please help, it's
> urgent! Thanks in advance.

That's very odd. Perhaps you are running the script as a different user
which doesn't have read access? Is your file printed if, from the folder
where you run php update.php, you run cat ../LocalSettings.php?
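(A concrete way to run those checks, as a sketch; the /var/www install
path below is hypothetical:)

  cd /var/www/mediawiki/maintenance   # hypothetical install location
  id                                  # which user is running the script?
  ls -l ../LocalSettings.php          # is the file readable by that user?
  cat ../LocalSettings.php            # does it actually print?
  php update.php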
Re: [Wikitech-l] Cannot run the maintenance script
On 26 March 2013 at 14:16, Platonides (platoni...@gmail.com) wrote:
> On 25/03/13 23:19, Rahul Maliakkal wrote:
>> [...] So I tried to fix the problem; as suggested, I tried to run
>> php update.php. Then I got the following error message:
>>
>>   A copy of your installation's LocalSettings.php must exist and be
>>   readable in the source directory. Use --conf to specify it.

It would be nice if maintenance scripts displayed the requested path to
LocalSettings.php in the case of such an error.

>> [...]
> That's very odd. Perhaps you are running the script as a different user
> which doesn't have read access? Is your file printed if, from the folder
> where you run php update.php, you run cat ../LocalSettings.php?

Also, one may have the MW_INSTALL_PATH environment variable set, pointing
to a different directory. I had such weirdness at one hosting provider
that shared two different versions of MediaWiki in the wrong way.

Dmitriy
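(Following Dmitriy's suggestion, a quick way to rule out a stale
MW_INSTALL_PATH; the LocalSettings.php path below is hypothetical, and
--conf is the flag the error message itself points to:)

  echo "$MW_INSTALL_PATH"    # empty output means the variable is not set
  unset MW_INSTALL_PATH      # clear a stale override before retrying
  php update.php --conf /var/www/mediawiki/LocalSettings.php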
Re: [Wikitech-l] Moving a GitHub Pull Request to Gerrit Changeset manually
This should work:

WIKIMEDIA_REPOS=/path/where/you/have/your/clones
REPO=$1    # e.g. qa/browsertests
PULL=$2    # e.g. https://github.com/brainwane/qa-browsertests.git
TEMP=`mktemp --tmpdir -d pull-request.XXX`

# Clone the pull request, reusing objects from the local Wikimedia clone
git clone --reference=$WIKIMEDIA_REPOS/$REPO $PULL $TEMP
cd $TEMP

# Add a .gitreview file if the repository doesn't ship one
if [ ! -f .gitreview ]; then
cat > .gitreview <<EOF
[gerrit]
host=gerrit.wikimedia.org
port=29418
project=$REPO.git
defaultbranch=master
defaultrebase=0
EOF
fi

# Push to Gerrit without rebasing, then clean up
git-review -R
cd ..
rm -rf $TEMP
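(Hypothetical usage, assuming the snippet above is saved as
pull-to-gerrit.sh and git-review is installed:)

  chmod +x pull-to-gerrit.sh
  ./pull-to-gerrit.sh qa/browsertests https://github.com/brainwane/qa-browsertests.git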
Re: [Wikitech-l] Who is responsible for communicating changes in MediaWiki to WMF sites?
On 25/03/13 23:35, Greg Grossmeier wrote:
> Thanks for the link, but the reason I brought it up is because in my
> first week here I saw a removal of a function without an explicit
> @deprecated warning. :-)
>
> Greg

Is it possible that it was a recently-introduced function that hadn't been
published in any release yet?
Re: [Wikitech-l] Bugzilla Weekly Report
On 2013-03-26 12:28 (+0100), Željko Filipin wrote:
> For those who think a picture is worth a thousand words, I have attached
> a few charts generated by Google Docs.

These would be useful, in a general way, to have autogenerated each
week/month/quarter.

Some event-specific ones would be neat, too; I remember Ubuntu Bug Days
having a time-bounded chart for the event day showing the impact it had on
bug numbers (surprisingly, it usually did, even with their massive bug
database in Launchpad).

Greg

--
| Greg Grossmeier    GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg        A18D 1138 8E47 FAC8 1C7D |
Re: [Wikitech-l] Who is responsible for communicating changes in MediaWiki to WMF sites?
On 2013-03-26 15:14 (+0100), Platonides wrote:
> Is it possible that it was a recently-introduced function that hadn't
> been published in any release yet?

The commit message was something like: "Removing XYZ function that hasn't
been used in a long, long time."

So no ;)

Greg

--
| Greg Grossmeier    GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg        A18D 1138 8E47 FAC8 1C7D |
Re: [Wikitech-l] Who is responsible for communicating changes in MediaWiki to WMF sites?
On 2013-03-26 01:20 (+0100), Federico Leva (Nemo) wrote:
> Can I summarise that the answer to the question "Who is responsible for
> communicating changes in MediaWiki to WMF sites?" is "the WMF release
> manager (or, at any rate, the WMF)", and that we can stop blocking
> development with WMF communication dependencies?

Sure, that makes sense, Nemo. I just need some guidance on what you want:
the goal of the communication. It sounds like you have a specific
situation in mind; could you share it with me/the list? Privately, if
sharing or debating it publicly again wouldn't be useful. Thanks.

Greg

--
| Greg Grossmeier    GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg        A18D 1138 8E47 FAC8 1C7D |
Re: [Wikitech-l] Bugzilla Weekly Report
On 03/26/2013 09:27 AM, Greg Grossmeier wrote:
> On 2013-03-26 12:28 (+0100), Željko Filipin wrote:
>> For those who think a picture is worth a thousand words, I have
>> attached a few charts generated by Google Docs.
> These would be useful, in a general way, to have autogenerated each
> week/month/quarter.

Yes, that is the idea behind
http://www.mediawiki.org/wiki/Community_metrics (still manual).

I just need to find some time to define which metrics we want to
automate, and then find some way / someone to implement them. Your
feedback about which metrics you need, and why, is welcome on the talk
page.

> Some event-specific ones would be neat, too; [...]

We are discussing how to measure the success of bug days at
https://www.mediawiki.org/wiki/Talk:QA/Strategy and your feedback is also
welcome!

--
Quim Gil
Technical Contributor Coordinator @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
Re: [Wikitech-l] Bugzilla Weekly Report
On Tue, Mar 26, 2013 at 5:27 PM, Greg Grossmeier g...@wikimedia.org wrote:
> These would be useful, in a general way, to have autogenerated each
> week/month/quarter.

If we can get the (weekly) report as a CSV file, it should be trivial to
implement. The plain-text format (as it is now) would be slightly more
work.

Željko
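(A sketch of how trivial the CSV route could be; the column layout here is
purely hypothetical, since no report format has been settled:)

  # Hypothetical layout: product,component,open_bugs
  awk -F, 'NR > 1 { open[$1] += $3 } END { for (p in open) print p, open[p] }' weekly-report.csv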
Re: [Wikitech-l] [RFC] performance standards for new mediawiki features
On Thu, Mar 21, 2013 at 10:55 PM, Yuri Astrakhan
yastrak...@wikimedia.org wrote:
> The API is fairly complex to measure and set performance targets for. If
> a bot requests 5000 pages in one call, together with all their links and
> categories, it might take a very long time (seconds, if not tens of
> seconds). Comparing that to another API request that gets an HTML
> section of a page, which takes a fraction of a second (especially when
> coming from cache), is not very useful.

This is true, and I think we'd want to look at a metric like 99th
percentile latency. There's room for corner cases taking much longer, but
they really have to be corner cases. Standards also have to be flexible,
with different acceptable ranges for different uses. Yet if 30% of
requests for an API method to fetch pages took tens of seconds, we'd
likely have to disable it entirely until its use or the number of pages
per request could be limited.

On Fri, Mar 22, 2013 at 1:32 AM, Peter Gehres li...@pgehres.com wrote:
> From where would you propose measuring these data points? Obviously
> network latency will have a great impact on some of the metrics, and a
> consistent location would help to define the pass/fail of each test. I
> do think another benchmark Ops could offer would be a set of
> latency-to-datacenter values, but I know that is a much harder task.
> Thanks for putting this together.

On Thu, Mar 21, 2013 at 6:40 PM, Asher Feldman afeld...@wikimedia.org wrote:
> I'd like to push for a codified set of minimum performance standards
> that new MediaWiki features must meet before they can be deployed to
> larger Wikimedia sites such as English Wikipedia, or be considered
> complete.
>
> These would look like (numbers pulled out of a hat, not actual
> suggestions):
>
> - p999 (long tail) full page request latency of 2000ms
> - p99 page request latency of 800ms
> - p90 page request latency of 150ms
> - p99 banner request latency of 80ms
> - p90 banner request latency of 40ms
> - p99 db query latency of 250ms
> - p90 db query latency of 50ms
> - 1000 write requests/sec (if applicable; write operations must be free
>   from concurrency issues)
> - guidelines about degrading gracefully
> - specific limits on total resource consumption across the stack per
>   request
> - etc.
>
> Right now, varying amounts of effort are made to highlight potential
> performance bottlenecks in code review, and engineers are encouraged to
> profile and optimize their own code. But beyond "is the site still up
> for everyone" / "are users complaining on the village pump" / "am I
> ranting in IRC", we've offered no guidelines as to what sort of request
> latency is reasonable or acceptable. If a new feature (like AFTv5, or
> Flow) turns out not to meet perf standards after deployment, that would
> be a high-priority bug, and the feature may be disabled depending on the
> impact, or if not addressed in a reasonable time frame.
>
> Obviously standards like this can't be applied to certain existing parts
> of MediaWiki, but systems other than the parser or preprocessor that
> don't meet new standards should at least be prioritized for improvement.
>
> Thoughts?
>
> Asher
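(As an aside: given a flat file of per-request latencies, the proposed
percentiles can be spot-checked with standard tools; a minimal sketch,
assuming one latency in milliseconds per line of latencies.log:)

  sort -n latencies.log | awk '{ a[NR] = $1 }
      END { print "p90: " a[int(NR*0.90)]; print "p99: " a[int(NR*0.99)]; print "p999: " a[int(NR*0.999)] }'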
Re: [Wikitech-l] [RFC] performance standards for new mediawiki features
Asher, I don't know the actual perf statistics just yet. With the API this
has to be a balance: I would rather have fewer but slower (batched) calls
than tons of very fast ones, as the latter consume much more bandwidth and
resources (consider getting all items one item at a time: very quick per
call, but very inefficient).

On Tue, Mar 26, 2013 at 2:57 PM, Asher Feldman afeld...@wikimedia.org wrote:
> This is true, and I think we'd want to look at a metric like 99th
> percentile latency. There's room for corner cases taking much longer,
> but they really have to be corner cases. Standards also have to be
> flexible, with different acceptable ranges for different uses. Yet if
> 30% of requests for an API method to fetch pages took tens of seconds,
> we'd likely have to disable it entirely until its use or the number of
> pages per request could be limited.
> [...]
Re: [Wikitech-l] [RFC] performance standards for new mediawiki features
These are all good points, and we certainly do need better tooling for
individual developers. There is a lot a developer can do on just a laptop
in terms of profiling code that, if done consistently, could go a long
way, even without it looking anything like production. Things like
understanding whether algorithms or queries are O(n) or O(2^n), etc., and
thinking about the potential size of the relevant production data set,
might be more useful at that stage than raw numbers. When it comes to
gathering numbers in such an environment, it would be helpful if either
the MediaWiki profiler could gain an easy visualization interface
appropriate for such environments, or we standardized around something
like xdebug.

The beta cluster has some potential as a performance test bed, if only it
could gain a guarantee that the compute nodes it runs on aren't
oversubscribed, or that the beta virts were otherwise consistently
resourced. By running a set of performance benchmarks against beta and
production, we may be able to gain insight into how new features are
likely to perform.

Beyond due diligence while architecting and implementing a feature, I'm
actually a proponent of testing in production, albeit in limited ways.
Not as with test.wikipedia.org, which ran on the production cluster, but
by deploying a feature to 5% of enwiki users, or 10% of pages, or 20% of
editors. Once something is deployed like that, we do indeed have tooling
available to gather hard performance metrics of the sort I proposed,
though it can always be improved upon.

It became apparent that ArticleFeedbackV5 had severe scaling issues after
being enabled on 10% of the articles on enwiki. For that example, I think
it could have been caught in an architecture review, or in local testing
by the developers, that issuing 17 database write statements per
submission of an anonymous text box that would go at the bottom of every
Wikipedia article was a bad idea. But it's really great that it was
incrementally deployed and we could halt its progress before the
resulting issues got too serious. That rollout methodology should be
considered a great success. If it can become the norm, perhaps it won't
be difficult to get to the point where we can have actionable performance
standards for new features, via a process that actually encourages
getting features into production instead of being a complicated
roadblock.

On Fri, Mar 22, 2013 at 1:20 PM, Arthur Richards
aricha...@wikimedia.org wrote:
> Right now, I think many of us profile locally or in VMs, which can be
> useful for relative metrics or quickly identifying bottlenecks, but
> doesn't really get us the kind of information you're talking about from
> any sort of real-world setting, or in any way that would be consistent
> from engineer to engineer, or even necessarily from day to day. From
> network topology to article counts/sizes/etc. and everything in between,
> there's a lot we can't really replicate or accurately profile against.
> Are there plans to put together and support infrastructure for this? It
> seems to me that this proposal is contingent upon a consistent
> environment accessible by engineers for performance testing.
> [...]
[Wikitech-l] OPW intern looking for feedback!
Hi everyone,

My name is Teresa (or terrrydactyl, if you've seen me on IRC) and I've
been interning at Wikimedia for the last few months through the Outreach
Program for Women[1]. My project, Git2Pages[2], is an extension to pull
snippets of code/text from a git repository. I've been working hard on
learning PHP and the MediaWiki framework/development cycle. My internship
is ending soon and I wanted to reach out to the community and ask for
feedback.

Here's what the program currently does:

- The user supplies a (git) url, filename, branch, startline and endline
  using the #snippet tag.
- Git2Pages.body.php validates the information and then passes the inputs
  to my library, GitRepository.php.
- GitRepository does a sparse checkout of that information; that is, it
  clones the repository but only keeps the specified file (this was
  implemented to save space).
- Repositories are cloned into a folder named after an md5 hash of the
  url + branch, to make sure the program isn't cloning a ton of copies of
  the same repository.
- If the repository already exists, the file is added to the
  sparse-checkout file and the program updates the working tree.
- Once the repo is cloned, the program yanks the lines that the user
  requested and returns the text encased in a pre tag.

This is my baseline program. It works (for me, at least). I have a few
ideas of what to work on next, but I would really like to know if I'm
going in the right direction. Is this something you would use? How does my
code look; is the implementation up to the MediaWiki coding standards?
You can find the progression of the code on gerrit[3].

Here are some ideas of what I might want to implement while still on the
internship:

- Instead of a pre tag, encase the snippet in a syntaxhighlight lang tag
  if it's code; maybe add a flag for the user to supply the language.
- Keep a database of all the repositories that a wiki has (though I'm not
  sure how to handle deletions).

Here are some problems I might face:

- If I update the working tree each time a file from the same repository
  is added, the line numbers may no longer match the old file.
- Should I be periodically updating the repositories, or perhaps keep
  multiple snapshots of the same repository?
- Cloning an entire repository and keeping only one file does not seem
  ideal, but I've yet to find a better solution; the more repositories
  being used concurrently, the bigger an issue this might be.
- I'm also worried about the security implications of my program.
  Security isn't my area of expertise, and I would definitely appreciate
  some input from people with a security background.

Thanks for taking the time to read this, and thanks in advance for any
feedback, bug reports, etc.

Have a great day,
Teresa
http://www.mediawiki.org/wiki/User:Chot

[1] https://www.mediawiki.org/wiki/Outreach_Program_for_Women
[2] http://www.mediawiki.org/wiki/Extension:Git2Pages
[3] https://gerrit.wikimedia.org/r/#/q/project:mediawiki/extensions/Git2Pages,n,z
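(For readers unfamiliar with the technique, the sparse-checkout flow
described above looks roughly like this in plain git commands; the URL,
branch, and path are hypothetical:)

  git clone --no-checkout https://example.org/repo.git workdir
  cd workdir
  git config core.sparseCheckout true
  echo "path/to/file.txt" >> .git/info/sparse-checkout
  git checkout master    # populates only the listed file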
Re: [Wikitech-l] OPW intern looking for feedback!
On Mar 27, 2013 1:31 AM, Teresa Cho tcho...@gmail.com wrote:
> My name is Teresa (or terrrydactyl, if you've seen me on IRC) and I've
> been interning at Wikimedia for the last few months through the Outreach
> Program for Women[1]. [...]

Cool stuff!

> - Repositories are cloned into a folder named after an md5 hash of the
>   url + branch, to make sure the program isn't cloning a ton of copies
>   of the same repository.

Why hash it, and not just keep the url + branch encoded in some charset
that is a valid path, avoiding rare yet hairy collisions?

> - If the repository already exists, the file is added to the
>   sparse-checkout file and the program updates the working tree.

Will there be a re-checkout for a duplicate request? Will the cache of
files ever be cleaned?

> [...]
Re: [Wikitech-l] OPW intern looking for feedback!
> Why hash it, and not just keep the url + branch encoded in some charset
> that is a valid path, avoiding rare yet hairy collisions?

Hmm, good point. Originally, my program cloned via a plain git clone,
which put the checkout in a folder named after the repository. So if we
had two repositories named MyRepo, it would cause problems. We (my mentor
Sebastien, aka Dereckson, and I) settled on the md5 hash as a way to
combat that, but I suppose we didn't think of creating a folder using
just the name of the url. I think this was back when we might use local
repositories; my program can use urls like /some/path/to/repo, in which
case you may get different local urls. Thinking about it, it doesn't
really make sense anymore. I think this is just one of those scenarios
where we thought of a solution that worked, and the program evolved in a
way where it wasn't necessary anymore. Changing the hash back to a
path-safe encoding should be an easy fix.

Speaking of local repositories, another idea was to disable the option to
have a local repository as the git url, as a safety precaution. Should I
do away with the local-repo feature entirely, or should I put it behind a
flag that is off by default but lets the user choose to turn it on?

> Will there be a re-checkout for a duplicate request? Will the cache of
> files ever be cleaned?

I did a little test, and it looks like if you duplicate the request, the
original content doesn't change. So even if the file has been changed
remotely, the local copy doesn't change. Additionally, when you add a
file to the sparse-checkout, the `git read-tree -mu HEAD` command only
seems to pull the added file, but doesn't seem to change the contents of
a file that's already in the repo. So right now the only way to get new
content is to delete the repository and rerun it. This is a good thing to
work on.

I'm not sure what you mean by the cache of files being cleaned. Can you
elaborate on the scenario, and I can give you a better answer?

Thanks for your feedback! :)

--
Teresa
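(One possible way to refresh already checked-out files without deleting
the repository; this is standard git behavior, not something tested
against Git2Pages, and the branch name is hypothetical:)

  git fetch origin master        # get the latest remote state
  git reset --hard FETCH_HEAD    # update tracked files, honoring sparse-checkout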
Re: [Wikitech-l] [RFC] performance standards for new mediawiki features
On Tue, Mar 26, 2013 at 3:58 PM, Asher Feldman afeld...@wikimedia.org wrote:
> These are all good points, and we certainly do need better tooling for
> individual developers. [...]
>
> The beta cluster has some potential as a performance test bed, if only
> it could gain a guarantee that the compute nodes it runs on aren't
> oversubscribed, or that the beta virts were otherwise consistently
> resourced. By running a set of performance benchmarks against beta and
> production, we may be able to gain insight into how new features are
> likely to perform.

This is possible in newer versions of OpenStack, using scheduler hinting.
That said, the instances would still be sharing a host with each other,
which can cause some inconsistencies.

We'd likely not want to use beta itself, but something that has limited
access for performance-testing purposes only, as we wouldn't want other
unrelated testing load to skew results. Additionally, we'd want to make
sure to avoid things like /data/project or /home (both of which beta is
using), even once we've moved to a more stable shared storage solution,
as load from other projects could very heavily skew results.

We need to upgrade to the Folsom release or greater for a few other
features anyway, and enabling scheduler hinting is pretty simple. I'd say
let's consider adding something like this to the Labs infrastructure
roadmap once the upgrade happens and we've tested out the hinting
feature.

- Ryan
Re: [Wikitech-l] [RFC] performance standards for new mediawiki features
On Tue, Mar 26, 2013 at 8:15 PM, Ryan Lane rlan...@gmail.com wrote:
> This is possible in newer versions of OpenStack, using scheduler
> hinting. That said, the instances would still be sharing a host with
> each other, which can cause some inconsistencies. [...]

I am concerned in this discussion with insufficient testbed load
generation and avoidance of confounding variables in the performance
analysis...

--
-george william herbert
george.herb...@gmail.com
Re: [Wikitech-l] Pronunciation recording tool wanted
I'm not sure whether it'd be helpful for this project, but
https://github.com/akrennmair/speech-to-server looks interesting.
Somebody ported LAME (the MP3 encoder) to JavaScript. The demo I linked
to records in the browser and streams the audio to a server over a
WebSocket.

--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com

On Sun, Mar 17, 2013 at 9:47 AM, Sumana Harihareswara
suma...@wikimedia.org wrote:
> On 03/13/2013 12:15 AM, Antoine Musso wrote:
>> On 13/03/13 04:07, K. Peachey wrote:
>>> That wouldn't be a bad project for GSoC, as it isn't too large, which
>>> means we could actually see some results. And if it was too small, the
>>> student could probably do a couple of smaller projects (this being
>>> one), then focus on them one after the other.
>> The smaller big project: get its code deployed on the cluster and
>> enabled for all wikis!
>
> Quick reminder: if you think something would be a good project for a
> student, put it on
> https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects .
>
> I suggest we scope these proposals at about 6 weeks of coding work to
> ensure we dedicate enough time (out of the 3-month GSoC period) to
> bugfixing and code review. Past proposals often allotted either no time
> or about 2 weeks for merging with trunk, pre-deploy code review, and
> integration. That's not enough.
>
> Basically, if you think a project might take about 2 weeks for you to
> code, go ahead and put it on that list. Students run into lots of
> problems, and your 2-week project is someone else's whole summer.
>
> --
> Sumana Harihareswara
> Engineering Community Manager
> Wikimedia Foundation