Ottomata has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/362310 )

Change subject: Adding "tags" column to webrequest
......................................................................


Adding "tags" column to webrequest

This column will hold an array of strings we call tags.
It will be populated by a UDF that understands webrequest data and can
classify requests into types like "portal", "wikidata" and others.

Tags are used by a job that splits webrequest into smaller subsets.

Bug: T164021
Change-Id: Ie855d6b3a2d12921a4a89de3f84ec5ff5d1fe01a
---
M hive/webrequest/create_webrequest_table.hql
1 file changed, 3 insertions(+), 1 deletion(-)

Approvals:
  Ottomata: Verified; Looks good to me, approved


diff --git a/hive/webrequest/create_webrequest_table.hql b/hive/webrequest/create_webrequest_table.hql
index bb6c0b0..b9b8ca6 100644
--- a/hive/webrequest/create_webrequest_table.hql
+++ b/hive/webrequest/create_webrequest_table.hql
@@ -54,7 +54,9 @@
  `normalized_host` struct<project_class: string, project:string, qualifiers: array<string>, tld: String> COMMENT 'struct containing project_class (such as wikipedia or wikidata for instance), project (such as en or commons), qualifiers (a list of in-between values, such as m and/or zero) and tld (org most often)',
  `pageview_info` map<string, string> COMMENT 'map containing project, language_variant and page_title values only when is_pageview = TRUE.',
  `page_id` bigint COMMENT 'MediaWiki page_id for this page title. For redirects this could be the page_id of the redirect or the page_id of the target. This may not always be set, even if the page is actually a pageview.',
- `namespace_id` int COMMENT 'MediaWiki namespace_id for this page title. This may not always be set, even if the page is actually a pageview.'
+ `namespace_id` int COMMENT 'MediaWiki namespace_id for this page title. This may not always be set, even if the page is actually a pageview.',
+ `tags` array<string> COMMENT 'List containing tags qualifying the request, ex: ['portal', 'wikidata']. Will be used to split webrequest into smaller subsets.'
+
 )
 PARTITIONED BY (
  `webrequest_source` string COMMENT 'Source cluster',

-- 
To view, visit https://gerrit.wikimedia.org/r/362310
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie855d6b3a2d12921a4a89de3f84ec5ff5d1fe01a
Gerrit-PatchSet: 5
Gerrit-Project: analytics/refinery
Gerrit-Branch: master
Gerrit-Owner: Nuria <nu...@wikimedia.org>
Gerrit-Reviewer: Joal <j...@wikimedia.org>
Gerrit-Reviewer: Mforns <mfo...@wikimedia.org>
Gerrit-Reviewer: Nuria <nu...@wikimedia.org>
Gerrit-Reviewer: Ottomata <ao...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
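
For readers unfamiliar with the idea, the commit message above describes two steps: a UDF assigns each request a list of tags, and a later job uses those tags to split webrequest into subsets. The Python sketch below only illustrates that flow. It is not the refinery UDF or the splitting job; the rule table `TAG_RULES`, the helper names `get_tags` and `split_by_tag`, and the host-matching rules are all assumptions made up for illustration, since the change itself only adds the column.

```python
from collections import defaultdict

# Hypothetical rules: each maps a tag name to a predicate over one request.
# The real UDF's classification rules are not part of this change.
TAG_RULES = {
    "portal": lambda req: req["uri_host"] == "www.wikipedia.org",
    "wikidata": lambda req: req["uri_host"].endswith("wikidata.org"),
}


def get_tags(request):
    """Return the sorted list of tags whose rule matches the request."""
    return sorted(tag for tag, rule in TAG_RULES.items() if rule(request))


def split_by_tag(requests):
    """Group requests by tag; a request with several tags lands in several subsets."""
    subsets = defaultdict(list)
    for request in requests:
        for tag in get_tags(request):
            subsets[tag].append(request)
    return dict(subsets)


# Made-up sample rows using two column names for illustration only.
requests = [
    {"uri_host": "www.wikipedia.org", "uri_path": "/"},
    {"uri_host": "www.wikidata.org", "uri_path": "/wiki/Q42"},
    {"uri_host": "en.wikipedia.org", "uri_path": "/wiki/Main_Page"},
]
subsets = split_by_tag(requests)
```

Storing the tags as an array, as the new column does, lets one request carry several tags at once, so a request can belong to more than one subset, and untagged requests simply carry an empty list.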