Wikimetrics has been having serious connectivity problems for a few days.
It turned out to be solvable by using some new hostnames
(labsdb1002.eqiad.wmnet).  I fixed it just now; please retry your reports
and let me know if anything is still wrong.

On Fri, Jan 23, 2015 at 10:46 AM, Dan Andreescu <[email protected]>
wrote:

> Hi everyone.  I will work on this as soon as I get into the office, in
> about an hour from now.  Yuvi suggested one thing that I wasn't aware of
> that might make this a simple fix.
>
>
> On Friday, January 23, 2015, Dan Higgins <[email protected]> wrote:
>
>> Hi Kevin,
>>
>> Sorry to be a pest but do you have any update on sorting out the
>> Wikimetrics issues? It seems to have gotten worse since we last spoke to
>> you, with around 1 in 10 reports going through.
>>
>> Thanks,
>>
>> Dan
>>
>> On Tue, Jan 20, 2015 at 7:17 PM, Kevin Leduc <[email protected]> wrote:
>>
>>> All the developers are in transit to SF today.  Dan said he'd be in the
>>> office this afternoon.  First dev I see I'll notify them of problems in
>>> wikimetrics.
>>>
>>> On Tue, Jan 20, 2015 at 11:10 AM, Amanda Bittaker <[email protected]>
>>> wrote:
>>>
>>>> Hello again gentlemen,
>>>>
>>>> I think Dan might have already pinged you, but just in case, I wanted
>>>> to let you know that we are getting these failures again.  It's kind
>>>> of crunch time for getting this data, so we're just banging our heads
>>>> against the wall and retrying the reports until they work (1 out of 4
>>>> times for me.)  Is there any way you all could work your magic again?
>>>>
>>>> Many thanks once again,
>>>> Amanda
>>>>
>>>>
>>>>
>>>> On Wed, Dec 10, 2014 at 4:30 PM, Kevin Leduc <[email protected]>
>>>> wrote:
>>>> > It's good to hear it's working again.  Don't hesitate to reach out to us
>>>> > here or at [email protected] if you notice this kind of
>>>> > trouble again.
>>>> >
>>>> > On Wed, Dec 10, 2014 at 3:37 PM, Amanda Bittaker <[email protected]>
>>>> > wrote:
>>>> >>
>>>> >> It's working perfectly now--a thousand thank yous, Dan and Marcel.
>>>> >>
>>>> >> On Wed, Dec 10, 2014 at 3:24 PM, Edward Galvez <[email protected]>
>>>> >> wrote:
>>>> >>>
>>>> >>> Thanks so much Dan and Marcel!
>>>> >>>
>>>> >>> -E
>>>> >>>
>>>> >>>
>>>> >>> On Wed, Dec 10, 2014 at 3:08 PM, Dan Andreescu <[email protected]>
>>>> >>> wrote:
>>>> >>>>
>>>> >>>> forgot Marcel - my fault.  Jaime & folks, in general Marcel rules and
>>>> >>>> he's probably going to help you out faster / better than I can.
>>>> >>>>
>>>> >>>> On Wed, Dec 10, 2014 at 5:57 PM, Dan Andreescu
>>>> >>>> <[email protected]> wrote:
>>>> >>>>>
>>>> >>>>> Ok, Amanda and anyone else who had problems.  Please try again.  I
>>>> >>>>> think I've cleared up some gunk and that might have helped things.
>>>> >>>>> We'll be looking at performance more closely soon.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> Steps taken, logging mostly for post-mortem purposes:
>>>> >>>>>
>>>> >>>>> * delete from report where recurrent_parent_id is null and
>>>> >>>>> recurrent = 0 and created < date('2014-12-01');
>>>> >>>>> ** This deleted records that are not visible in the system anymore.
>>>> >>>>> They are recoverable from the wikimetrics database backups but we
>>>> >>>>> don't need them in the database.  These probably slowed some things
>>>> >>>>> down; in total, the statement deleted 1623628 rows.
>>>> >>>>>
>>>> >>>>> * alter table report add column old_recurrent tinyint(1); update
>>>> >>>>> report set recurrent = 0, old_recurrent = 1 where user_id = 461 and
>>>> >>>>> recurrent = 1;
>>>> >>>>> ** This disables WikimetricsBot recurrent reports, but preserves the
>>>> >>>>> data so we can deal with them later.  When labs is done
>>>> >>>>> re-synchronizing, we will be re-running these reports.  They feed
>>>> >>>>> data to Vital Signs, in case someone's curious about what they are.
>>>> >>>>>
>>>> >>>>> * Stopped and rebooted the system.  The backup system seems to be
>>>> >>>>> hanging or taking a really long time.  I'd like to take a look at
>>>> >>>>> this in more depth, but my guess is the amount it's transferring has
>>>> >>>>> gone beyond what we expected.
>>>> >>>>>
>>>> >>>>> On Wed, Dec 10, 2014 at 5:23 PM, Dan Andreescu
>>>> >>>>> <[email protected]> wrote:
>>>> >>>>>>
>>>> >>>>>> We're sorry - the problems we were facing last week have probably
>>>> >>>>>> festered.  I'm going to turn off some things and reset the system.
>>>> >>>>>> I'll report back.
>>>> >>>>>>
>>>> >>>>>> On Wed, Dec 10, 2014 at 4:47 PM, Amanda Bittaker
>>>> >>>>>> <[email protected]> wrote:
>>>> >>>>>>>
>>>> >>>>>>> Oh yes, and Jaime did have me restart my browser and clear the
>>>> >>>>>>> cache, but it did not help.
>>>> >>>>>>>
>>>> >>>>>>> Thanks again,
>>>> >>>>>>> Amanda
>>>> >>>>>>>
>>>> >>>>>>> On Wed, Dec 10, 2014 at 1:45 PM, Amanda Bittaker
>>>> >>>>>>> <[email protected]> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>> Hello Kevin,
>>>> >>>>>>>>
>>>> >>>>>>>> Jaime asked me to email you about some trouble I've been having
>>>> >>>>>>>> with Wikimetrics.  The whole team has been experiencing a pretty
>>>> >>>>>>>> high rate of failures in both report creation and cohort uploads.
>>>> >>>>>>>> Almost nothing has gotten through for me today:  of the last 13
>>>> >>>>>>>> reports I've run, 3 were successful.  Of the failures, I would say
>>>> >>>>>>>> maybe only two or three "pended" at all before becoming failures.
>>>> >>>>>>>> I've been experiencing the same problem with cohort uploads.
>>>> >>>>>>>>
>>>> >>>>>>>> The reports have been: Newly Registered, Edits, and Rolling Active
>>>> >>>>>>>> Editor using expanded cohorts.  Please find attached an example of
>>>> >>>>>>>> one of the reports.  I tried uploading cohorts using text files of
>>>> >>>>>>>> user names and pasting user names from Notepad into the "Paste
>>>> >>>>>>>> Usernames" field.  I do expand the cohorts every time.
>>>> >>>>>>>>
>>>> >>>>>>>> Do you know why the failure rate is so high, especially this
>>>> >>>>>>>> morning, and is there a way to eliminate or mitigate this problem
>>>> >>>>>>>> in the future?
>>>> >>>>>>>>
>>>> >>>>>>>> Many thanks for the assistance, and please do let me know if you
>>>> >>>>>>>> need any more information from me on this.
>>>> >>>>>>>>
>>>> >>>>>>>> Best,
>>>> >>>>>>>> Amanda
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> Edward Galvez
>>>> >>> Program Evaluation Associate
>>>> >>> Wikimedia Foundation
>>>> >>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>
_______________________________________________
Wikimetrics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikimetrics
