Wikimetrics has been having serious connectivity problems for a few days. It turned out to be solvable by switching to some new hostnames (labsdb1002.eqiad.wmnet). I fixed it just now; please retry your reports and let me know if anything is still wrong.
On Fri, Jan 23, 2015 at 10:46 AM, Dan Andreescu <[email protected]> wrote:
> Hi everyone. I will work on this as soon as I get into the office, in
> about an hour from now. Yuvi suggested one thing that I wasn't aware of
> that might make this a simple fix.
>
> On Friday, January 23, 2015, Dan Higgins <[email protected]> wrote:
>
>> Hi Kevin,
>>
>> Sorry to be a pest but do you have any update on sorting out the
>> Wikimetrics issues? It seems to have gotten worse since we last spoke to
>> you with around 1 in 10 reports going through.
>>
>> Thanks,
>>
>> Dan
>>
>> On Tue, Jan 20, 2015 at 7:17 PM, Kevin Leduc <[email protected]> wrote:
>>
>>> All the developers are in transit to SF today. Dan said he'd be in the
>>> office this afternoon. First dev I see I'll notify them of problems in
>>> wikimetrics.
>>>
>>> On Tue, Jan 20, 2015 at 11:10 AM, Amanda Bittaker <[email protected]> wrote:
>>>
>>>> Hello again gentlemen,
>>>>
>>>> I think Dan might have already pinged you, but just in case, I wanted
>>>> to let you know that we are getting these failures again. It's kind
>>>> of crunch time for getting this data, so we're just banging our heads
>>>> against the wall and retrying the reports until they work (1 out of 4
>>>> times for me.) Is there any way you all could work your magic again?
>>>>
>>>> Many thanks once again,
>>>> Amanda
>>>>
>>>> On Wed, Dec 10, 2014 at 4:30 PM, Kevin Leduc <[email protected]> wrote:
>>>> > It's good to hear it's working again. Don't hesitate to reach out to us
>>>> > here or at [email protected] if you notice this kind of
>>>> > trouble again.
>>>> >
>>>> > On Wed, Dec 10, 2014 at 3:37 PM, Amanda Bittaker <[email protected]> wrote:
>>>> >>
>>>> >> It's working perfectly now--a thousand thank yous, Dan and Marcel.
>>>> >>
>>>> >> On Wed, Dec 10, 2014 at 3:24 PM, Edward Galvez <[email protected]> wrote:
>>>> >>>
>>>> >>> Thanks so much Dan and Marcel!
>>>> >>>
>>>> >>> -E
>>>> >>>
>>>> >>> On Wed, Dec 10, 2014 at 3:08 PM, Dan Andreescu <[email protected]> wrote:
>>>> >>>>
>>>> >>>> forgot Marcel - my fault. Jaime & folks, in general Marcel rules and
>>>> >>>> he's probably going to help you out faster / better than I can.
>>>> >>>>
>>>> >>>> On Wed, Dec 10, 2014 at 5:57 PM, Dan Andreescu <[email protected]> wrote:
>>>> >>>>>
>>>> >>>>> Ok, Amanda and anyone else who had problems. Please try again. I
>>>> >>>>> think I've cleared up some gunk and that might have helped things.
>>>> >>>>> We'll be looking at performance more closely soon.
>>>> >>>>>
>>>> >>>>> Steps taken, logging mostly for post-mortem purpose
>>>> >>>>>
>>>> >>>>> * delete from report where recurrent_parent_id is null and
>>>> >>>>>   recurrent = 0 and created < date('2014-12-01');
>>>> >>>>> ** This deleted records that are not visible in the system anymore.
>>>> >>>>> They are recoverable from the wikimetrics database backups but we
>>>> >>>>> don't need them in the database. These probably slowed some things
>>>> >>>>> down, in total the statement deleted 1623628 rows.
>>>> >>>>>
>>>> >>>>> * alter table report add column old_recurrent tinyint(1); update
>>>> >>>>>   report set recurrent = 0, old_recurrent = 1 where user_id = 461
>>>> >>>>>   and recurrent = 1;
>>>> >>>>> ** This disables WikimetricsBot recurrent reports, but preserves the
>>>> >>>>> data so we can deal with them later. When labs is done
>>>> >>>>> re-synchronizing, we will be re-running these reports. They feed
>>>> >>>>> data to Vital Signs, in case someone's curious about what they are.
>>>> >>>>>
>>>> >>>>> * Stopped and rebooted the system. The backup system seems to be
>>>> >>>>> hanging or taking a really long time. I'd like to take a look at
>>>> >>>>> this in more depth, but my guess is the amount it's transferring has
>>>> >>>>> gone beyond what we expected.
>>>> >>>>>
>>>> >>>>> On Wed, Dec 10, 2014 at 5:23 PM, Dan Andreescu <[email protected]> wrote:
>>>> >>>>>>
>>>> >>>>>> We're sorry - the problems we were facing last week have probably
>>>> >>>>>> festered. I'm going to turn off some things and reset the system.
>>>> >>>>>> I'll report back.
>>>> >>>>>>
>>>> >>>>>> On Wed, Dec 10, 2014 at 4:47 PM, Amanda Bittaker <[email protected]> wrote:
>>>> >>>>>>>
>>>> >>>>>>> Oh yes, and Jaime did have me restart my browser and clear the
>>>> >>>>>>> cache, but it did not help.
>>>> >>>>>>>
>>>> >>>>>>> Thanks again,
>>>> >>>>>>> Amanda
>>>> >>>>>>>
>>>> >>>>>>> On Wed, Dec 10, 2014 at 1:45 PM, Amanda Bittaker <[email protected]> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>> Hello Kevin,
>>>> >>>>>>>>
>>>> >>>>>>>> Jaime asked me to email you about some trouble I've been having
>>>> >>>>>>>> with Wikimetrics. The whole team has been experiencing a pretty
>>>> >>>>>>>> high rate of failures in both report creation and cohort uploads.
>>>> >>>>>>>> Almost nothing has gotten through for me today: of the last 13
>>>> >>>>>>>> reports I've run, 3 were successful. Of the failures, I would say
>>>> >>>>>>>> maybe only two or three "pended" at all before becoming failures.
>>>> >>>>>>>> I've been experiencing the same problem with cohort uploads.
>>>> >>>>>>>>
>>>> >>>>>>>> The reports have been: Newly Registered, Edits, and Rolling
>>>> >>>>>>>> Active Editor using expanded cohorts. Please find attached an
>>>> >>>>>>>> example of one of the reports. I tried uploading cohorts using
>>>> >>>>>>>> text files of user names and pasting user names from Notepad into
>>>> >>>>>>>> the "Paste Usernames" field. I do expand the cohorts every time.
>>>> >>>>>>>>
>>>> >>>>>>>> Do you know why the failure rate is so high, especially this
>>>> >>>>>>>> morning, and is there a way to eliminate or mitigate this problem
>>>> >>>>>>>> in the future?
>>>> >>>>>>>>
>>>> >>>>>>>> Many thanks for the assistance, and please do let me know if you
>>>> >>>>>>>> need any more information from me on this.
>>>> >>>>>>>>
>>>> >>>>>>>> Best,
>>>> >>>>>>>> Amanda
>>>> >>>
>>>> >>> --
>>>> >>> Edward Galvez
>>>> >>> Program Evaluation Associate
>>>> >>> Wikimedia Foundation
_______________________________________________
Wikimetrics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikimetrics
