GoranSMilovanovic added a comment.
@Lea_WMDE @RazShuty
It's definitely not Googlebot (Smartphone); I checked a one-hour sample from
`wmf.webrequest`:
# wmf.webrequest dataset: inspect user_agent and agent_type
# (triple-quoted so the query can span several lines)
df = sqlContext.sql("""
    SELECT year, month, day, hour, user_agent, agent_type, is_pageview
    FROM wmf.webrequest
    WHERE year = 2019 AND month = 3 AND day = 10 AND hour = 1
      AND normalized_host.project_family = "wikidata"
      AND is_pageview = True
""")
df.cache()
df.head(10)
results in:
[Row(year=2019, month=3, day=10, hour=1, user_agent='Mozilla/5.0 (Linux;
Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)', agent_type='spider', is_pageview=True),
Row(year=2019, month=3, day=10, hour=1, user_agent='Mozilla/5.0 (Linux;
Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)', agent_type='spider', is_pageview=True),
Row(year=2019, month=3, day=10, hour=1, user_agent='Mozilla/5.0 (Linux;
Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)', agent_type='spider', is_pageview=True),
Row(year=2019, month=3, day=10, hour=1, user_agent='Mozilla/5.0 (Linux;
Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)', agent_type='spider', is_pageview=True),
Row(year=2019, month=3, day=10, hour=1, user_agent='Mozilla/5.0 (Windows
NT 10; WOW64; rv:61.0) Gecko/20100101 Firefox/61.0', agent_type='user',
is_pageview=True),
Row(year=2019, month=3, day=10, hour=1, user_agent='Mozilla/5.0 (Linux;
Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)', agent_type='spider', is_pageview=True),
Row(year=2019, month=3, day=10, hour=1, user_agent='Mozilla/5.0 (Linux;
Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)', agent_type='spider', is_pageview=True),
Row(year=2019, month=3, day=10, hour=1, user_agent='Mozilla/5.0 (Linux;
Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)', agent_type='spider', is_pageview=True),
Row(year=2019, month=3, day=10, hour=1, user_agent='Mozilla/5.0 (Linux;
Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)', agent_type='spider', is_pageview=True),
Row(year=2019, month=3, day=10, hour=1, user_agent='Mozilla/5.0 (Linux;
Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)', agent_type='spider', is_pageview=True)]
So Googlebot (Smartphone) is indeed recognized as a spider (agent_type='spider').
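
To confirm this across all ten rows rather than by eye, here is a minimal sketch in plain Python (no Spark needed). It assumes only the rows shown above, reduced to (user_agent, agent_type) pairs; the names `sample` and `agent_types_for` are hypothetical and used for illustration only.

```python
from collections import Counter

# The two distinct user agents from the head(10) output above.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
    "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)
FIREFOX_UA = "Mozilla/5.0 (Windows NT 10; WOW64; rv:61.0) Gecko/20100101 Firefox/61.0"

# Same order as the output: four Googlebot rows, one Firefox row, five Googlebot rows.
sample = (
    [(GOOGLEBOT_UA, "spider")] * 4
    + [(FIREFOX_UA, "user")]
    + [(GOOGLEBOT_UA, "spider")] * 5
)


def agent_types_for(rows, needle):
    """Count agent_type values among rows whose user_agent contains `needle`."""
    return Counter(agent_type for ua, agent_type in rows if needle in ua)


googlebot_counts = agent_types_for(sample, "Googlebot")
print(googlebot_counts)
```

If any Googlebot row in the sample were tagged as 'user', it would show up as a second key in the counter.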
TASK DETAIL
https://phabricator.wikimedia.org/T220977
To: GoranSMilovanovic
Cc: RazShuty, Lea_WMDE, Aklapper, alaa_wmde, Nandana, Lahi, Gq86,
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Wikidata-bugs,
aude, Mbch331