Re: [Wikitech-l] Cannot run the maintenance script

2013-03-26 Thread Platonides
On 25/03/13 23:19, Rahul Maliakkal wrote:
 I installed SMW extension in my local wiki yesterday and now when i visit a
 page in my local wiki i get this message A database query syntax error has
 occurred. This may indicate a bug in the software. The last attempted
 database query was:
 
 (SQL query hidden)
 
 from within function ShortUrlUtils::encodeTitle. Database returned
 error 1146:
 Table 'my_wiki.w1_shorturls' doesn't exist (127.0.0.1)
 
 Along with the page being displayed untidily.
 
 So i tried to fix the problem; as suggested by people, i tried to run php
 update.php
 Then i got the following error message
 
 A copy of your installation's LocalSettings.php
 must exist and be readable in the source directory.
 Use --conf to specify it.
 
 I have my LocalSettings.php in the same place where my default index.php is
 located, earlier i had made some logo changes to my wiki and they were
 successfully reflected in my wiki, so the localhost has access to the
 LocalSettings.php
 
 I am working on Ubuntu and have mediawiki 1.20 installed
 
 Please Help!! It's urgent.
 
 Thanks In Advance

That's very odd. Perhaps you are running the script as a different user,
one which doesn't have read access? Is your file printed if, from the folder
where you run php update.php, you run: cat ../LocalSettings.php ?
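
A minimal way to check, assuming the wiki lives in /var/www/wiki (adjust the
path to your setup):

cd /var/www/wiki/maintenance
whoami                          # the user actually running update.php
ls -l ../LocalSettings.php      # owner and permission bits of the config file
cat ../LocalSettings.php > /dev/null && echo readable || echo not readable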


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Cannot run the maintenance script

2013-03-26 Thread Dmitriy Sintsov




On 26 March 2013 at 14:16:09, Platonides (platoni...@gmail.com) wrote:

On 25/03/13 23:19, Rahul Maliakkal wrote:
 I installed SMW extension in my local wiki yesterday and now when i visit a
 page in my local wiki i get this message A database query syntax error has
 occurred. This may indicate a bug in the software. The last attempted
 database query was:
 
 (SQL query hidden)
 
 from within function ShortUrlUtils::encodeTitle. Database returned

 error 1146:
 Table 'my_wiki.w1_shorturls' doesn't exist (127.0.0.1)
 
 Along with the page being displayed untidily.
 
 So i tried to fix the problem ,as suggested by people i tried to run php

 update.php
 Then i got the following error message
 
 A copy of your installation's LocalSettings.php

 must exist and be readable in the source directory.
 Use --conf to specify it.
 

It would be nice if maintenance scripts displayed the requested path to 
LocalSettings.php in case of such an error.


 I have my LocalSettings.php in the same place where my default index.php is
 located,earlier i had made some logo changes to my wiki and they were
 succesfully reflected in my wiki,so the localhost has access to the
 LocalSettings.php
 
 I am working on Ubuntu and have mediawiki 1.20 installed
 
 Please Help!!Its Urgent
 
 Thanks In Advance

That's very odd. Perhaps you are running the script as a different user,
one which doesn't have read access? Is your file printed if, from the folder
where you run php update.php, you run: cat ../LocalSettings.php ?


Also, one may have the MW_INSTALL_PATH environment variable set, pointing to 
a different directory. I ran into such weirdness on one hosting setup that 
shared two different versions of MediaWiki in the wrong way.
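
A quick sanity check for that case (the paths below are only examples):

echo "$MW_INSTALL_PATH"     # if non-empty, the maintenance scripts look there
unset MW_INSTALL_PATH       # or point update.php at the right config explicitly:
php maintenance/update.php --conf /var/www/wiki/LocalSettings.php
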
Dmitriy

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Moving a GitHub Pull Request to Gerrit Changeset manually

2013-03-26 Thread Platonides
This should work:

WIKIMEDIA_REPOS=/path/where/you/have/your/clones
REPO=$1 # qa/browsertests
PULL=$2 # https://github.com/brainwane/qa-browsertests.git

TEMP=`mktemp --tmpdir -d pull-request.XXX`
git clone --reference=$WIKIMEDIA_REPOS/$REPO  $PULL $TEMP
cd $TEMP

if [ ! -f .gitreview ]; then
cat > .gitreview <<EOF
[gerrit]
host=gerrit.wikimedia.org
port=29418
project=$REPO.git
defaultbranch=master
defaultrebase=0
EOF
fi

git-review -R

rm -rf $TEMP
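
For example, a hypothetical invocation (assuming the above is saved as
pull-to-gerrit.sh, made executable, and git-review is set up):

./pull-to-gerrit.sh qa/browsertests https://github.com/brainwane/qa-browsertests.git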


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Who is responsible for communicating changes in MediaWiki to WMF sites?

2013-03-26 Thread Platonides
On 25/03/13 23:35, Greg Grossmeier wrote:
 Thanks for the link, but the reason I brought it up is because my first
 week here I saw a removal of a function without an explicit @deprecated
 warning.
 
 :-)
 
 Greg

Is it possible that it was a recently-introduced function that hadn't
been included in any release yet?


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Bugzilla Weekly Report

2013-03-26 Thread Greg Grossmeier
quote name=Željko Filipin date=2013-03-26 time=12:28:20 +0100
 
 For those that think a picture is worth a thousand words I have attached a
 few charts generated by google docs.

These would be useful, in a general way, to have autogenerated each
week/month/quarter.

Some event-specific ones would be neat, too; I remember Ubuntu Bug Days
having a time-bounded chart for the event day showing the impact it had
on bug numbers (it surprisingly usually did, even with their massive bug
databases in LP).

Greg

-- 
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg           A18D 1138 8E47 FAC8 1C7D |

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Who is responsible for communicating changes in MediaWiki to WMF sites?

2013-03-26 Thread Greg Grossmeier
quote name=Platonides date=2013-03-26 time=15:14:09 +0100
 Is it possible that it was a recently-introduced function that hadn't
 been published on any release yet?

The commit message was something like:
"Removing XYZ function that hasn't been used in a long long time."

So no ;)

Greg

-- 
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg           A18D 1138 8E47 FAC8 1C7D |

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Who is responsible for communicating changes in MediaWiki to WMF sites?

2013-03-26 Thread Greg Grossmeier
quote name=Federico Leva (Nemo) date=2013-03-26 time=01:20:34 +0100
 can I
 summarise that the answer to the question "Who is responsible for
 communicating changes in MediaWiki to WMF sites?" is "the WMF
 release manager (or anyway the WMF)" and that we can stop blocking
 development with WMF communication dependencies?

Sure, that makes sense, Nemo; I just need some guidance on what you
want, i.e. the goal of the communication.

It sounds like you have a specific situation in mind that already happened;
could you share it with me or the list? Privately, if sharing or debating
it publicly again wouldn't be useful. Thanks.

Greg

-- 
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg           A18D 1138 8E47 FAC8 1C7D |

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Bugzilla Weekly Report

2013-03-26 Thread Quim Gil

On 03/26/2013 09:27 AM, Greg Grossmeier wrote:

quote name=Željko Filipin date=2013-03-26 time=12:28:20 +0100


For those that think a picture is worth a thousand words I have attached a
few charts generated by google docs.


These would be useful, in a general way, to have autogenerated each
week/month/quarter.


Yes, that is the idea behind
http://www.mediawiki.org/wiki/Community_metrics (still manual)

I just need to find some time to define what metrics we want to 
automate and then find some way / someone to implement them.


Your feedback about what metrics you need and why is welcome on the 
talk page.




Some event-specific ones would be neat, too; I remember Ubuntu Bug Days
having a time-bounded chart for the event day showing the impact it had
on bug numbers (it surprisingly usually did, even with their massive bug
databases in LP).


We are discussing how to measure the success of bug days at
https://www.mediawiki.org/wiki/Talk:QA/Strategy and your feedback is 
also welcome!


--
Quim Gil
Technical Contributor Coordinator @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Bugzilla Weekly Report

2013-03-26 Thread Željko Filipin
On Tue, Mar 26, 2013 at 5:27 PM, Greg Grossmeier g...@wikimedia.org wrote:

 These would be useful, in a general way, to have autogenerated each
 week/month/quarter.


If we can get the (weekly) report as a CSV file, it should be trivial to
implement. The plain text format (as it is now) would be slightly more work.
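
For instance, a rough sketch of turning such a CSV into a chart (the column
layout and file names here are made up for illustration):

# assumed layout of report.csv: date,open_bugs
gnuplot <<'EOF'
set datafile separator ","
set terminal png size 800,400
set output "open-bugs.png"
set xdata time
set timefmt "%Y-%m-%d"
set format x "%Y-%m-%d"
plot "report.csv" using 1:2 with lines title "open bugs"
EOF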

Željko
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [RFC] performance standards for new mediawiki features

2013-03-26 Thread Asher Feldman
On Thu, Mar 21, 2013 at 10:55 PM, Yuri Astrakhan
yastrak...@wikimedia.org wrote:

 API is fairly complex to meassure and performance target. If a bot requests
 5000 pages in one call, together with all links  categories, it might take
 a very long time (seconds if not tens of seconds). Comparing that to
 another api request that gets an HTML section of a page, which takes a
 fraction of a second (especially when comming from cache) is not very
 useful.


This is true, and I think we'd want to look at a metric like 99th
percentile latency.  There's room for corner cases taking much longer, but
they really have to be corner cases.  Standards also have to be flexible,
with different acceptable ranges for different uses.  Yet if 30% of
requests for an api method to fetch pages took tens of seconds, we'd likely
have to disable it entirely until its use or the number of pages per
request could be limited.
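
As a rough sketch of the kind of measurement I mean, a nearest-rank percentile
over a file with one request latency in milliseconds per line (the file name
and layout are made up):

sort -n latencies.txt > sorted.txt
total=$(wc -l < sorted.txt)
for p in 0.90 0.99 0.999; do
    # floor(total * p) is close enough to the nearest-rank percentile here
    rank=$(awk -v n="$total" -v p="$p" 'BEGIN { r = int(n * p); if (r < 1) r = 1; print r }')
    printf 'p%s: %s ms\n' "${p#0.}" "$(sed -n "${rank}p" sorted.txt)"
done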

On Fri, Mar 22, 2013 at 1:32 AM, Peter Gehres li...@pgehres.com wrote:

  From where would you propose measuring these data points?  Obviously
  network latency will have a great impact on some of the metrics and a
  consistent location would help to define the pass/fail of each test. I do
  think another benchmark Ops features would be a set of
  latency-to-datacenter values, but I know that is a much harder taks.
 Thanks
  for putting this together.
 
 
  On Thu, Mar 21, 2013 at 6:40 PM, Asher Feldman afeld...@wikimedia.org
  wrote:
 
   I'd like to push for a codified set of minimum performance standards
 that
   new mediawiki features must meet before they can be deployed to larger
   wikimedia sites such as English Wikipedia, or be considered complete.
  
   These would look like (numbers pulled out of a hat, not actual
   suggestions):
  
   - p999 (long tail) full page request latency of 2000ms
   - p99 page request latency of 800ms
   - p90 page request latency of 150ms
   - p99 banner request latency of 80ms
   - p90 banner request latency of 40ms
   - p99 db query latency of 250ms
   - p90 db query latency of 50ms
   - 1000 write requests/sec (if applicable; writes operations must be
 free
   from concurrency issues)
   - guidelines about degrading gracefully
   - specific limits on total resource consumption across the stack per
   request
   - etc..
  
   Right now, varying amounts of effort are made to highlight potential
   performance bottlenecks in code review, and engineers are encouraged to
   profile and optimize their own code.  But beyond is the site still up
  for
   everyone / are users complaining on the village pump / am I ranting in
   irc, we've offered no guidelines as to what sort of request latency is
   reasonable or acceptable.  If a new feature (like aftv5, or flow) turns
  out
   not to meet perf standards after deployment, that would be a high
  priority
   bug and the feature may be disabled depending on the impact, or if not
   addressed in a reasonable time frame.  Obviously standards like this
  can't
   be applied to certain existing parts of mediawiki, but systems other
 than
   the parser or preprocessor that don't meet new standards should at
 least
  be
   prioritized for improvement.
  
   Thoughts?
  
   Asher

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [RFC] performance standards for new mediawiki features

2013-03-26 Thread Yuri Astrakhan
Asher, I don't know the actual perf statistics just yet. With the API this
has to be a balance - I would want more slower calls than tons of very
fast calls - as that consumes much more bandwidth and resources (consider
getting all items one item at a time - very quick, but very inefficient).


On Tue, Mar 26, 2013 at 2:57 PM, Asher Feldman afeld...@wikimedia.org wrote:

 On Thu, Mar 21, 2013 at 10:55 PM, Yuri Astrakhan
 yastrak...@wikimedia.orgwrote:

  API is fairly complex to meassure and performance target. If a bot
 requests
  5000 pages in one call, together with all links  categories, it might
 take
  a very long time (seconds if not tens of seconds). Comparing that to
  another api request that gets an HTML section of a page, which takes a
  fraction of a second (especially when comming from cache) is not very
  useful.
 

 This is true, and I think we'd want to look at a metric like 99th
 percentile latency.  There's room for corner cases taking much longer, but
 they really have to be corner cases.  Standards also have to be flexible,
 with different acceptable ranges for different uses.  Yet if 30% of
 requests for an api method to fetch pages took tens of seconds, we'd likely
 have to disable it entirely until its use or the number of pages per
 request could be limited.

 On Fri, Mar 22, 2013 at 1:32 AM, Peter Gehres li...@pgehres.com wrote:
 
   From where would you propose measuring these data points?  Obviously
   network latency will have a great impact on some of the metrics and a
   consistent location would help to define the pass/fail of each test. I
 do
   think another benchmark Ops features would be a set of
   latency-to-datacenter values, but I know that is a much harder taks.
  Thanks
   for putting this together.
  
  
   On Thu, Mar 21, 2013 at 6:40 PM, Asher Feldman afeld...@wikimedia.org
   wrote:
  
I'd like to push for a codified set of minimum performance standards
  that
new mediawiki features must meet before they can be deployed to
 larger
wikimedia sites such as English Wikipedia, or be considered complete.
   
These would look like (numbers pulled out of a hat, not actual
suggestions):
   
- p999 (long tail) full page request latency of 2000ms
- p99 page request latency of 800ms
- p90 page request latency of 150ms
- p99 banner request latency of 80ms
- p90 banner request latency of 40ms
- p99 db query latency of 250ms
- p90 db query latency of 50ms
- 1000 write requests/sec (if applicable; writes operations must be
  free
from concurrency issues)
- guidelines about degrading gracefully
- specific limits on total resource consumption across the stack per
request
- etc..
   
Right now, varying amounts of effort are made to highlight potential
performance bottlenecks in code review, and engineers are encouraged
 to
profile and optimize their own code.  But beyond is the site still
 up
   for
everyone / are users complaining on the village pump / am I ranting
 in
irc, we've offered no guidelines as to what sort of request latency
 is
reasonable or acceptable.  If a new feature (like aftv5, or flow)
 turns
   out
not to meet perf standards after deployment, that would be a high
   priority
bug and the feature may be disabled depending on the impact, or if
 not
addressed in a reasonable time frame.  Obviously standards like this
   can't
be applied to certain existing parts of mediawiki, but systems other
  than
the parser or preprocessor that don't meet new standards should at
  least
   be
prioritized for improvement.
   
Thoughts?
   
Asher

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [RFC] performance standards for new mediawiki features

2013-03-26 Thread Asher Feldman
These are all good points, and we certainly do need better tooling for
individual developers.

There are a lot of things a developer can do on just a laptop in terms of
profiling code, that if done consistently, could go a long way, even
without it looking anything like production.  Things like understanding if
algorithms or queries are O(n) or O(2^n), etc. and thinking about the
potential size of the relevant production data set might be more useful at
that stage than raw numbers.  When it comes to gathering numbers in such an
environment, it would be helpful if either the MediaWiki profiler could
gain an easy visualization interface appropriate for such environments, or
if we standardized around something like Xdebug.

The beta cluster has some potential as a performance test bed if only it
could gain a guarantee that the compute nodes it runs on aren't
oversubscribed or that the beta virts were otherwise consistently
resourced.  By running a set of performance benchmarks against beta and
production, we may be able to gain insight on how new features are likely
to perform.

Beyond due diligence while architecting and implementing a feature, I'm
actually a proponent of testing in production, albeit in limited ways.  Not
as with test.wikipedia.org which ran on the production cluster, but by
deploying a feature to 5% of enwiki users, or 10% of pages, or 20% of
editors.  Once something is deployed like that, we do indeed have tooling
available to gather hard performance metrics of the sort I proposed, though
they can always be improved upon.

It became apparent that ArticleFeedbackV5 had severe scaling issues after
being enabled on 10% of the articles on enwiki.  For that example, I think
it could have been caught in an architecture review or in local testing by
the developers that issuing 17 database write statements per submission of
an anonymous text box that would go at the bottom of every Wikipedia
article was a bad idea.  But it's really great that it was incrementally
deployed and we could halt its progress before the resulting issues got too
serious.

That rollout methodology should be considered a great success.  If it can
become the norm, perhaps it won't be difficult to get to the point where we
can have actionable performance standards for new features, via a process
that actually encourages getting features in production instead of being a
complicated roadblock.

On Fri, Mar 22, 2013 at 1:20 PM, Arthur Richards aricha...@wikimedia.org wrote:

 Right now, I think many of us profile locally or in VMs, which can be
 useful for relative metrics or quickly identifying bottlenecks, but doesn't
 really get us the kind of information you're talking about from any sort of
 real-world setting, or in any way that would be consistent from engineer to
 engineer, or even necessarily from day to day. From network topology to
 article counts/sizes/etc and everything in between, there's a lot we can't
 really replicate or accurately profile against. Are there plans to put
 together and support infrastructure for this? It seems to me that this
 proposal is contingent upon a consistent environment accessible by
 engineers for performance testing.


 On Thu, Mar 21, 2013 at 10:55 PM, Yuri Astrakhan
 yastrak...@wikimedia.orgwrote:

  API is fairly complex to meassure and performance target. If a bot
 requests
  5000 pages in one call, together with all links  categories, it might
 take
  a very long time (seconds if not tens of seconds). Comparing that to
  another api request that gets an HTML section of a page, which takes a
  fraction of a second (especially when comming from cache) is not very
  useful.
 
 
  On Fri, Mar 22, 2013 at 1:32 AM, Peter Gehres li...@pgehres.com wrote:
 
   From where would you propose measuring these data points?  Obviously
   network latency will have a great impact on some of the metrics and a
   consistent location would help to define the pass/fail of each test. I
 do
   think another benchmark Ops features would be a set of
   latency-to-datacenter values, but I know that is a much harder taks.
  Thanks
   for putting this together.
  
  
   On Thu, Mar 21, 2013 at 6:40 PM, Asher Feldman afeld...@wikimedia.org
   wrote:
  
I'd like to push for a codified set of minimum performance standards
  that
new mediawiki features must meet before they can be deployed to
 larger
wikimedia sites such as English Wikipedia, or be considered complete.
   
These would look like (numbers pulled out of a hat, not actual
suggestions):
   
- p999 (long tail) full page request latency of 2000ms
- p99 page request latency of 800ms
- p90 page request latency of 150ms
- p99 banner request latency of 80ms
- p90 banner request latency of 40ms
- p99 db query latency of 250ms
- p90 db query latency of 50ms
- 1000 write requests/sec (if applicable; writes operations must be
  free
from concurrency issues)
- guidelines about degrading gracefully

[Wikitech-l] OPW intern looking for feedback!

2013-03-26 Thread Teresa Cho
Hi everyone,

My name is Teresa (or terrrydactyl if you've seen me on IRC) and I've
been interning at Wikimedia for the last few months through the
Outreach Program for Women[1]. My project, Git2Pages[2], is an
extension to pull snippets of code/text from a git repository. I've
been working hard on learning PHP and the MediaWiki
framework/development cycle. My internship is ending soon and I wanted
to reach out to the community and ask for feedback.

Here's what the program currently does:
- User supplies a (git) url, filename, branch, startline, and endline using
the #snippet tag
- Git2Pages.body.php will validate the information and then pass the
inputs on to my library, GitRepository.php
- GitRepository will do a sparse checkout based on that information, that is,
it will clone the repository but only keep the specified file (this
was implemented to save space)
- The repositories will be cloned into a folder named after an MD5 hash of
the url + branch, to make sure that the program isn't cloning a ton of
copies of the same repository
- If the repository already exists, the file will be added to the
sparse-checkout file and the program will update the working tree
- Once the repo is cloned, the program will go and yank the lines that
the user requested, and it'll return the text encased in a <pre> tag
(a rough sketch of the underlying git steps follows below).
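
Roughly, the git side of that looks like the following sketch (the url, branch
and file are hypothetical; this is just the manual equivalent of what the
extension automates):

URL=https://github.com/brainwane/qa-browsertests.git
BRANCH=master
FILE=README.md

DIR=$(printf '%s' "$URL$BRANCH" | md5sum | cut -d' ' -f1)
git clone --no-checkout "$URL" "$DIR"
cd "$DIR"
git config core.sparseCheckout true
echo "$FILE" >> .git/info/sparse-checkout
git checkout "$BRANCH"    # only $FILE ends up in the working tree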

This is my baseline program. It works (for me at least). I have a few
ideas of what to work on next, but I would really like to know if I'm
going in the right direction. Is this something you would use? How
does my code look, and is the implementation up to the MediaWiki coding
standard? You can find the progression of the code on
Gerrit[3].

Here are some ideas of what I might want to implement while still on
the internship:
- Instead of a <pre> tag, encase it in a <syntaxhighlight lang=...> tag if
it's code; maybe add a flag for the user to supply the language
- Keep a database of all the repositories that a wiki has (though not
sure how to handle deletions)

Here are some problems I might face:
- If I update the working tree each time a file from the same
repository is added, then the line numbers may not match the old file
- Should I be periodically updating the repositories or perhaps keep
multiple snapshots of the same repository
- Cloning an entire repository and keeping only one file does not seem
ideal, but I've yet to find a better solution; the more repositories
being used concurrently, the bigger an issue this might be
- I'm also worried about the security implications of my program. Security
isn't my area of expertise, and I would definitely appreciate some
input from people with a security background

Thanks for taking the time to read this and thanks in advance for any
feedback, bug reports, etc.

Have a great day,
Teresa
http://www.mediawiki.org/wiki/User:Chot

[1] https://www.mediawiki.org/wiki/Outreach_Program_for_Women
[2] http://www.mediawiki.org/wiki/Extension:Git2Pages
[3] 
https://gerrit.wikimedia.org/r/#/q/project:mediawiki/extensions/Git2Pages,n,z

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] OPW intern looking for feedback!

2013-03-26 Thread Martijn Hoekstra
On Mar 27, 2013 1:31 AM, Teresa Cho tcho...@gmail.com wrote:

 Hi everyone,

 My name is Teresa (or terrrydactyl if you've seen me on IRC) and I've
 been interning at Wikimedia for the last few months through the
 Outreach Program for Women[1]. My project, Git2Pages[2], is an
 extension to pull snippets of code/text from a git repository. I've
 been working hard on learning PHP and the MediaWiki
 framework/development cycle. My internship is ending soon and I wanted
 to reach out to the community and ask for feedback.


Cool stuff!

 Here's what the program currently does:
 - User supplies (git) url, filename, branch, startline, endline using
 the #snippet tag
 - Git2Pages.body.php will validate the information and then pass on
 the inputs into my library, GitRepository.php
 - GitRepository will do a sparse checkout on the information, that is,
 it will clone the repository but only keep the specified file (this
 was implemented to save space)
 - The repositories will be cloned into a folder that is a md5 hash of
 the url + branch to make sure that the program isn't cloning a ton of
 copies of the same repository

Why hash it, and not just keep the url + branch encoded to some character set
that is a valid path, avoiding rare yet hairy collisions?

 - If the repository already exists, the file will be added to the
 sparse-checkout file and the program will update the working tree

Will there be a re-checkout for a duplicate request? Will the cache of
files ever be cleaned?

 - Once the repo is cloned, the program will go and yank the lines that
 the user requested and it'll return the text encased in a pre tag.

 This is my baseline program. It works (for me at least). I have a few
 ideas of what to work on next, but I would really like to know if I'm
 going in the right direction. Is this something you would use? How
 does my code look, is the implementation up to the MediaWiki coding
 standard?buttt You can find the progression of the code on
 gerrit[3].

 Here are some ideas of what I might want to implement while still on
 the internship:
 - Instead of a pre tag, encase it in a syntaxhighlight lang tag if
 it's code, maybe add a flag for user to supply the language
 - Keep a database of all the repositories that a wiki has (though not
 sure how to handle deletions)

 Here are some problems I might face:
 - If I update the working tree each time a file from the same
 repository is added, then the line numbers may not match the old file
 - Should I be periodically updating the repositories or perhaps keep
 multiple snapshots of the same repository
 - Cloning an entire repository and keeping only one file does not seem
 ideal, but I've yet to find a better solution, the more repositories
 being used concurrently the bigger an issue this might be
 - I'm also worried about security implications of my program. Security
 isn't my area of expertise, and I would definitely appreciate some
 input from people with a security background

 Thanks for taking the time to read this and thanks in advance for any
 feedback, bug reports, etc.

 Have a great day,
 Teresa
 http://www.mediawiki.org/wiki/User:Chot

 [1] https://www.mediawiki.org/wiki/Outreach_Program_for_Women
 [2] http://www.mediawiki.org/wiki/Extension:Git2Pages
 [3]
https://gerrit.wikimedia.org/r/#/q/project:mediawiki/extensions/Git2Pages,n,z

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] OPW intern looking for feedback!

2013-03-26 Thread Teresa Cho
 Why hash it, and not just keep the url + branch encoded to some charset
 that is a valid path, saving rare yet hairy collisions?

Hmm, good point. Originally, my program did a plain git clone
that would just clone into a folder named after the repository.
Therefore, if we had two repositories named MyRepo, it would cause
problems. We (my mentor Sebastien, aka Dereckson, and I) chose the MD5 hash
as a way to combat that, but I suppose we didn't think of creating a
folder using just the name of the url. I think this was back when we
might use local repositories; my program can use urls like
/some/path/to/repo, in which case you may get different local urls.
Thinking about it, it doesn't really make sense anymore. I think this is just
one of those scenarios where we thought of a solution that worked, and
the program evolved in a way where it wasn't necessary anymore.
Changing the hash back to a path-safe encoding should be an easy fix.
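
For what it's worth, a sketch of the two directory-naming options being
discussed (the values are made up):

url=https://github.com/brainwane/qa-browsertests.git
branch=master

# current approach: opaque, fixed-length directory name
dir_hash=$(printf '%s' "$url@$branch" | md5sum | cut -d' ' -f1)

# alternative: keep the name human-readable by escaping unsafe characters
dir_plain=$(printf '%s' "$url@$branch" | tr -c 'A-Za-z0-9._-' '_')

echo "$dir_hash" "$dir_plain"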

Speaking of local repositories, another idea was to disable the option
to have a local repository as the git url as a safety precaution or
something. Should I do away with the local repo feature entirely or
should I set a flag and have it be off by default, but allow the user
to choose to turn it on?

 Will there be a re checkout for a duplicate request? Will the cache of
 files ever be cleaned?

I did a little test and it looks like if you duplicate the request,
the original content doesn't change. So even if it has been remotely
changed, the local copy doesn't change. Additionally, when you add a
file to the sparse-checkout, the ` git read-tree -mu HEAD ` command
only seems to pull the added file but doesn't seem to change the
contents of a file that's already in the repo. So right now the only
way to get new content is to delete the repository and rerun it. This
is a good thing to work on. I'm not sure what you mean by the cache of
the files being cleaned. Can you elaborate on the scenario, and I can
give you a better answer.

Thanks for your feedback! :)

-- Teresa

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [RFC] performance standards for new mediawiki features

2013-03-26 Thread Ryan Lane
On Tue, Mar 26, 2013 at 3:58 PM, Asher Feldman afeld...@wikimedia.org wrote:

 There are all good points, and we certainly do need better tooling for
 individual developers.

 There are a lot of things a developer can do on just a laptop in terms of
 profiling code, that if done consistently, could go a long way, even
 without it looking anything like production.  Things like understanding if
 algorithms or queries are O(n) or O(2^n), etc. and thinking about the
 potential size of the relevant production data set might  be more useful at
 that stage than raw numbers.  When it comes to gathering numbers in such an
 environment, it would be helpful if either the mediawiki profiler could
 gain an easy visualization interface appropriate for such environments, or
 if we standardized around something like xdebug.

 The beta cluster has some potential as a performance test bed if only it
 could gain a guarantee that the compute nodes it runs on aren't
 oversubscribed or that the beta virts were otherwise consistently
 resourced.  By running a set of performance benchmarks against beta and
 production, we may be able to gain insight on how new features are likely
 to perform.


This is possible in newer versions of OpenStack, using scheduler hinting.

That said, the instances would still be sharing a host with each other,
which can cause some inconsistencies. We'd likely not want to use beta
itself, but something that has limited access for performance testing
purposes only, as we wouldn't want other unrelated testing load to skew
results. Additionally, we'd want to make sure to avoid things like
/data/project or /home (both of which beta is using), even once we've moved
to a more stable shared storage solution, as it could very heavily skew
results based on load from other projects.

We need to upgrade to the Folsom release or greater for a few other
features anyway, and enabling scheduler hinting is pretty simple. I'd say
let's consider adding something like this to the Labs infrastructure
roadmap once the upgrade happens and we've tested out the hinting feature.

- Ryan
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [RFC] performance standards for new mediawiki features

2013-03-26 Thread George Herbert
On Tue, Mar 26, 2013 at 8:15 PM, Ryan Lane rlan...@gmail.com wrote:
 On Tue, Mar 26, 2013 at 3:58 PM, Asher Feldman afeld...@wikimedia.orgwrote:

 There are all good points, and we certainly do need better tooling for
 individual developers.

 There are a lot of things a developer can do on just a laptop in terms of
 profiling code, that if done consistently, could go a long way, even
 without it looking anything like production.  Things like understanding if
 algorithms or queries are O(n) or O(2^n), etc. and thinking about the
 potential size of the relevant production data set might  be more useful at
 that stage than raw numbers.  When it comes to gathering numbers in such an
 environment, it would be helpful if either the mediawiki profiler could
 gain an easy visualization interface appropriate for such environments, or
 if we standardized around something like xdebug.

 The beta cluster has some potential as a performance test bed if only it
 could gain a guarantee that the compute nodes it runs on aren't
 oversubscribed or that the beta virts were otherwise consistently
 resourced.  By running a set of performance benchmarks against beta and
 production, we may be able to gain insight on how new features are likely
 to perform.


 This is possible in newer versions of OpenStack, using scheduler hinting.

 That said, the instances would still be sharing a host with each other,
 which can cause some inconsistencies. We'd likely not want to use beta
 itself, but something that has limited access for performance testing
 purposes only, as we wouldn't want other unrelated testing load to skew
 results. Additionally, we'd want to make sure to avoid things like
 /data/project or /home (both of which beta is using), even once we've moved
 to a more stable shared storage solution, as it could very heavily skew
 results based on load from other projects.

 We need to upgrade to the Folsom release or greater for a few other
 features anyway, and enabling scheduler hinting is pretty simple. I'd say
 let's consider adding something like this to the Labs infrastructure
 roadmap once the upgrade happens and we've tested out the hinting feature.

In this discussion, I am concerned about insufficient testbed load
generation and about avoiding confounding variables in the performance
analysis...



-- 
-george william herbert
george.herb...@gmail.com

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Pronunciation recording tool wanted

2013-03-26 Thread Tyler Romeo
I'm not sure whether it'd be helpful for this project, but
https://github.com/akrennmair/speech-to-server looks interesting. Somebody
ported LAME (the MP3 encoder) to JavaScript. The demo I linked to records
in the browser and streams the audio to a server over a WebSocket.

*-- *
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com


On Sun, Mar 17, 2013 at 9:47 AM, Sumana Harihareswara suma...@wikimedia.org
 wrote:

 On 03/13/2013 12:15 AM, Antoine Musso wrote:
  Le 13/03/13 04:07, K. Peachey wrote:
  That wouldn't be a bad project for GSoC as it isn't too large so it
 means
  we could actually see some results, And if it was too small, The student
  could probably do a couple of smaller projects (it being one) then
 focus on
  one after the other.
 
  The smaller big project: get its code deployed on the cluster and
  enabled for all wikis!

 Quick reminder:

 If you think something would be a good project for a student, put it on
 https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects .
 I suggest we scope these proposals at about 6 weeks of coding work to
 ensure we dedicate enough time (out of the 3-month GSoC period) to
 bugfixing and code review.  Past proposals often allotted either no time
 or about 2 weeks for merging with trunk, pre-deploy code review, and
 integration.  That's not enough.

 Basically, if you think a project might take about 2 weeks for you to
 code, go ahead and put it on that list.  Students run into lots of
 problems, and your 2-week project is someone else's whole summer.

 --
 Sumana Harihareswara
 Engineering Community Manager
 Wikimedia Foundation


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l