GoranSMilovanovic added a subscriber: Milimetric.
GoranSMilovanovic added a comment.


  @Lea_WMDE OK, here is a direct test (PySpark code against the wmf.pageview_hourly
  <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly>
  table):
  
    # Daily pageviews per namespace, access method and agent type for Wikidata
    pw = sqlContext.sql('SELECT namespace_id, access_method, agent_type, \
                            SUM(view_count) AS pageviews \
                            FROM wmf.pageview_hourly \
                            WHERE year = ' + str(d.year) + ' AND month = ' + str(d.month) + \
                            ' AND day = ' + str(d.day) + \
                            ' AND project = "wikidata" \
                            AND (namespace_id = 0 OR namespace_id = 120 OR namespace_id = 146 OR namespace_id = 640) \
                            GROUP BY namespace_id, access_method, agent_type \
                            ORDER BY namespace_id, access_method, agent_type')
  
  where `d` is June 13, 2019:
  
    In [31]: d
    Out[31]: datetime.datetime(2019, 6, 13, 15, 29, 14, 874165)
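
  As a side note, the same query text can be assembled from a template rather than by string concatenation, which makes the date partition filter easier to read and to check. This is only a sketch: `QUERY_TEMPLATE` and `build_namespace_query` are hypothetical names that are not part of the original test, the `IN (...)` list is an equivalent rewrite of the chained `OR` conditions, and the values are interpolated without escaping, so it assumes trusted inputs only.

```python
from datetime import datetime

# Hypothetical template; the table and column names are those used in the query above.
QUERY_TEMPLATE = """
SELECT namespace_id, access_method, agent_type, SUM(view_count) AS pageviews
FROM wmf.pageview_hourly
WHERE year = {year} AND month = {month} AND day = {day}
  AND project = "{project}"
  AND namespace_id IN ({namespaces})
GROUP BY namespace_id, access_method, agent_type
ORDER BY namespace_id, access_method, agent_type
"""


def build_namespace_query(d, project="wikidata", namespaces=(0, 120, 146, 640)):
    """Return the SQL text for one day of pageviews, filtered by namespace."""
    return QUERY_TEMPLATE.format(
        year=d.year,
        month=d.month,
        day=d.day,
        project=project,
        # int() guards against anything other than numeric namespace ids
        namespaces=", ".join(str(int(n)) for n in namespaces),
    ).strip()


d = datetime(2019, 6, 13, 15, 29, 14, 874165)
query = build_namespace_query(d)
print(query)
# On the cluster the text would then be passed on unchanged:
# pw = sqlContext.sql(query)
```

  Printing the query before running it makes it easy to confirm that the year, month and day partitions are the intended ones.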
  
  The query results in the `pw` DataFrame:
  
    [Row(namespace_id=0, access_method='desktop', agent_type='spider', pageviews=3713136),
     Row(namespace_id=0, access_method='desktop', agent_type='user', pageviews=413537),
     Row(namespace_id=0, access_method='mobile web', agent_type='spider', pageviews=408138),
     Row(namespace_id=0, access_method='mobile web', agent_type='user', pageviews=115864),
     Row(namespace_id=120, access_method='desktop', agent_type='spider', pageviews=7084),
     Row(namespace_id=120, access_method='desktop', agent_type='user', pageviews=11586),
     Row(namespace_id=120, access_method='mobile web', agent_type='spider', pageviews=1418),
     Row(namespace_id=120, access_method='mobile web', agent_type='user', pageviews=3193),
     Row(namespace_id=146, access_method='desktop', agent_type='spider', pageviews=938),
     Row(namespace_id=146, access_method='desktop', agent_type='user', pageviews=179),
     Row(namespace_id=146, access_method='mobile web', agent_type='spider', pageviews=167),
     Row(namespace_id=146, access_method='mobile web', agent_type='user', pageviews=8),
     Row(namespace_id=640, access_method='desktop', agent_type='spider', pageviews=1086),
     Row(namespace_id=640, access_method='desktop', agent_type='user', pageviews=133),
     Row(namespace_id=640, access_method='mobile web', agent_type='spider', pageviews=3)]
  
  which matches exactly what we get for June 13, 2019 from our new Dashboard 
<http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/>.
  
  Moreover, let's look at the total number of pageviews with `agent_type = "user"`
  (i.e. spiders are excluded, as in Wikistats2) for June 13, 2019:
  
    # Total Wikidata pageviews for the day, all namespaces, agent_type "user" only
    pw = sqlContext.sql('SELECT SUM(view_count) AS pageviews \
                            FROM wmf.pageview_hourly \
                            WHERE year = ' + str(d.year) + ' AND month = ' + str(d.month) + \
                            ' AND day = ' + str(d.day) + \
                            ' AND project = "wikidata" \
                            AND agent_type = "user"')
  
  results in
  
    Row(pageviews=1420740)
  
  which is far below the `5,764,558` pageviews that Wikistats2 reports for
  June 13, 2019.
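
  To put the gap in numbers, here is a small sketch that uses only the figures quoted in this comment. The variable names are illustrative, and nothing here queries the cluster or Wikistats2; it only does the arithmetic on the values above.

```python
# `user` pageviews per (namespace_id, access_method), copied from the `pw` result above
user_rows = {
    (0, "desktop"): 413537,
    (0, "mobile web"): 115864,
    (120, "desktop"): 11586,
    (120, "mobile web"): 3193,
    (146, "desktop"): 179,
    (146, "mobile web"): 8,
    (640, "desktop"): 133,
}

total_user_all_namespaces = 1420740  # Row(pageviews=1420740) from the second query
wikistats2_reported = 5764558        # figure shown on Wikistats2 for June 13, 2019

# `user` pageviews that fall into the four namespaces shown on the dashboard
user_in_four_namespaces = sum(user_rows.values())

# Absolute and relative difference between Wikistats2 and the Hive total
gap = wikistats2_reported - total_user_all_namespaces
ratio = wikistats2_reported / total_user_all_namespaces

print("user pageviews in namespaces 0, 120, 146, 640:", user_in_four_namespaces)
print("Wikistats2 minus pageview_hourly total:", gap)
print("Wikistats2 / pageview_hourly total:", round(ratio, 2))
```

  The ratio is roughly four to one, so the difference is unlikely to be a rounding or time-zone effect alone.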
  
  @Milimetric I am looking at the Wikidata pageview data for June 13, 2019 at
  https://stats.wikimedia.org/v2/#/wikidata.org/reading/total-page-views/normal|bar|1-month|~total|daily
  and I cannot reproduce it. Could you let me know what might explain the
  difference? Thank you.

TASK DETAIL
  https://phabricator.wikimedia.org/T208567

To: GoranSMilovanovic
Cc: Milimetric, GoranSMilovanovic, Aklapper, WMDE-leszek, Lea_WMDE, 
darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, 
rosalieper, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331