Ottomata has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/362310 )

Change subject: Adding "tags" column to webrequest
......................................................................


Adding "tags" column to webrequest

This column will hold an array of strings that we call tags.
It will be populated by a UDF that understands webrequest data
and can classify requests into types such as "portal" and "wikidata".

Tags are used by a job that splits webrequest into
smaller subsets.
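
As a rough illustration of how a downstream job might select one such
subset, the sketch below filters on the new column with Hive's built-in
array_contains function. The table name wmf.webrequest, the
year/month/day/hour partition columns and the partition values are
assumptions made for the example; only the tags column and the
webrequest_source partition come from this change.

```sql
-- Hypothetical query: count the requests tagged 'portal' in one hourly partition.
-- Table name, time partition columns and values are assumed, not part of this change.
SELECT
    COUNT(*) AS portal_requests
FROM wmf.webrequest
WHERE webrequest_source = 'text'
  AND year = 2017 AND month = 7 AND day = 1 AND hour = 0
  -- tags is array<string>; array_contains is TRUE when any element equals 'portal'
  AND array_contains(tags, 'portal');
```

Rows written before the column existed would presumably have NULL tags,
in which case the predicate is not satisfied and they are left out.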

Bug: T164021
Change-Id: Ie855d6b3a2d12921a4a89de3f84ec5ff5d1fe01a
---
M hive/webrequest/create_webrequest_table.hql
1 file changed, 3 insertions(+), 1 deletion(-)

Approvals:
  Ottomata: Verified; Looks good to me, approved



diff --git a/hive/webrequest/create_webrequest_table.hql b/hive/webrequest/create_webrequest_table.hql
index bb6c0b0..b9b8ca6 100644
--- a/hive/webrequest/create_webrequest_table.hql
+++ b/hive/webrequest/create_webrequest_table.hql
@@ -54,7 +54,9 @@
     `normalized_host`   struct<project_class: string, project:string, qualifiers: array<string>, tld: String>  COMMENT 'struct containing project_class (such as wikipedia or wikidata for instance), project (such as en or commons), qualifiers (a list of in-between values, such as m and/or zero) and tld (org most often)',
     `pageview_info`     map<string, string>  COMMENT 'map containing project, language_variant and page_title values only when is_pageview = TRUE.',
     `page_id`           bigint  COMMENT 'MediaWiki page_id for this page title. For redirects this could be the page_id of the redirect or the page_id of the target. This may not always be set, even if the page is actually a pageview.',
-    `namespace_id`      int     COMMENT 'MediaWiki namespace_id for this page title. This may not always be set, even if the page is actually a pageview.'
+    `namespace_id`      int     COMMENT 'MediaWiki namespace_id for this page title. This may not always be set, even if the page is actually a pageview.',
+    `tags`              array<string> COMMENT 'List containing tags qualifying the request, ex: ['portal', 'wikidata']. Will be used to split webrequest into smaller subsets.'
+
 )
 PARTITIONED BY (
     `webrequest_source` string  COMMENT 'Source cluster',

-- 
To view, visit https://gerrit.wikimedia.org/r/362310
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie855d6b3a2d12921a4a89de3f84ec5ff5d1fe01a
Gerrit-PatchSet: 5
Gerrit-Project: analytics/refinery
Gerrit-Branch: master
Gerrit-Owner: Nuria <nu...@wikimedia.org>
Gerrit-Reviewer: Joal <j...@wikimedia.org>
Gerrit-Reviewer: Mforns <mfo...@wikimedia.org>
Gerrit-Reviewer: Nuria <nu...@wikimedia.org>
Gerrit-Reviewer: Ottomata <ao...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
