chelsyx added a comment.
First draft of the report: F4420629: report.pdf
I put a lot of material into the report. However, because I lack domain knowledge, I don't have a very clear idea of which questions are meaningful or useful to answer, so any suggestions are very welcome!
chelsyx added a comment.
Second Draft: F4452046: report.pdf
TASK DETAIL
https://phabricator.wikimedia.org/T143762
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: chelsyx
Cc: Addshore, Aklapper, mpopov, Smalyshev, debt, mschwarzer, Avner, Gehel, D3r1ck01, Jonas
chelsyx added a comment.
Updated Reviewers: F4537643: report.pdf
@debt and @Smalyshev, your suggestions are very welcome!!! :)
TASK DETAIL
https://phabricator.wikimedia.org/T143762
chelsyx added a comment.
Thank you @debt! :)
TASK DETAIL
https://phabricator.wikimedia.org/T143762
chelsyx added a comment.
Modified: F4553759: report.pdf
@debt Please let me know if anything else needs to be changed.
TASK DETAIL
https://phabricator.wikimedia.org/T143762
chelsyx added a comment.
Thanks @debt! Updated on Commons!
TASK DETAIL
https://phabricator.wikimedia.org/T143762
chelsyx added a comment.
3rd draft: F4487819: report.pdf
TASK DETAIL
https://phabricator.wikimedia.org/T143762
chelsyx added a comment.
Thanks everyone! I've uploaded the report to Commons: https://commons.wikimedia.org/wiki/File:Exploration_on_the_Use_of_WDQS_-_Breakdown_by_Geography,_User_Agent_and_Referer_Class.pdf
TASK DETAIL
https://phabricator.wikimedia.org/T143762
chelsyx added a comment.
@Smalyshev what do you mean by "error responses"?
Here is an example of my query:
SELECT CONCAT(year,'-',month,'-',day) AS dt,
PERCENTILE_APPROX(time_firstbyte, 0.5) AS median_time_firstbyte,
PERCENTILE(response_size, 0.5) AS median_response_size
FROM webreq
chelsyx added a comment.
Updated: On Oct 12, 2017, the number of files uploaded by bots is 9,390,721 (22.03%), and the number of files uploaded by users is 33,241,541 (77.97%). The following table breaks down the counts by media type:
Media Type | User Group | Number of Files | Proportion
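As a quick check, the percentages quoted above follow directly from the two counts. A minimal sketch in Python; the helper name `share` is hypothetical, and only the two counts come from the comment:

```python
def share(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to two decimals."""
    return round(100 * part / total, 2)

bots = 9_390_721    # files uploaded by bots (from the comment above)
users = 33_241_541  # files uploaded by users (from the comment above)
total = bots + users

print(share(bots, total), share(users, total))  # 22.03 77.97
```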
chelsyx added a comment.
The following two graphs break down the number by month:
F10169825: nfile_bot_month.png
F10169827: nfile_bot_month_prop.png
TASK DETAIL
https://phabricator.wikimedia.org/T177354
chelsyx updated the task description. (Show Details)
CHANGES TO TASK DESCRIPTION
...
* [x] individuals
* [x] mass-tools/institutions
* [x] number of contributions as of present time
* [x] compare to what it looked like 30 days ago
TASK DETAIL
https://phabricator.wikimedia.org/T177354
chelsyx added a comment.
Codebase and output: https://github.com/wikimedia-research/SDoC-Initial-Metrics/tree/master/T177354
TASK DETAIL
https://phabricator.wikimedia.org/T177354
chelsyx added a comment.
@mpopov yup, I will put my stuff in the repo.
TASK DETAIL
https://phabricator.wikimedia.org/T177354
chelsyx claimed this task.
TASK DETAIL
https://phabricator.wikimedia.org/T177354
chelsyx moved this task from Needs triage to Current work on the Discovery-Analysis board.
chelsyx edited projects, added Discovery-Analysis (Current work); removed Discovery-Analysis.
TASK DETAIL
https://phabricator.wikimedia.org/T177354
WORKBOARD
https://phabricator.wikimedia.org/project/board/1850
chelsyx added a comment.
@mpopov Looks like the file type categorization on commons is messier than we thought...
For example, File:Krazy_Kat_Bugolist_1916_silent.ogv is an ogv file, but its img_minor_mime is ogg, img_major_mime is application, and img_media_type is video. This is the same
chelsyx added a comment.
Hey @chelsyx - what time frame does this cover?
Jumping in to say this looks like it's from launch of Commons to now.
Thanks @mpopov ! Yes, this is the file counts on Oct 10.
Can we also get a count of how this has changed over the last week and compare that to the last
chelsyx moved this task from In progress to Needs review on the Discovery-Analysis (Current work) board.
chelsyx added a comment.
The number of files uploaded by bots is 9,390,408 (22.04%), and the number of files uploaded by users is 33,222,828 (77.96%). The following table breaks down the counts
chelsyx claimed this task.
chelsyx edited projects, added Discovery-Analysis (Current work); removed Discovery-Analysis.
TASK DETAIL
https://phabricator.wikimedia.org/T177353
chelsyx added a comment.
Hi @Nuria, the numbers I showed above are cumulative sums at the end of each month, while the numbers you mentioned are new uploads for each month. From my query, for Dec 2016, the number of newly uploaded files is 392,566 by bots and 392,786 by users. This is closed
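The difference between the two kinds of numbers can be sketched as follows. This is an illustrative Python example only; the monthly counts are made-up placeholders, not values from the query:

```python
from itertools import accumulate

# Hypothetical new uploads per month (placeholder values, not real data).
new_uploads = [100, 250, 175]

# Cumulative sum at the end of each month.
cumulative = list(accumulate(new_uploads))

# Recover the monthly new uploads by differencing the cumulative series.
recovered = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

print(cumulative)  # [100, 350, 525]
print(recovered)   # [100, 250, 175]
```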
chelsyx claimed this task.
chelsyx moved this task from Backlog to In progress on the Discovery-Analysis (Current work) board.
TASK DETAIL
https://phabricator.wikimedia.org/T177358
WORKBOARD
https://phabricator.wikimedia.org/project/board/1241/
chelsyx moved this task from In progress to Needs review on the Discovery-Analysis (Current work) board.
chelsyx added a comment.
We computed several search metrics with event logging data in November 2017, and compared them with English Wikipedia. They are searches on desktop only, since we have
chelsyx edited projects, added Discovery-Analysis (Current work); removed Discovery-Analysis.
TASK DETAIL
https://phabricator.wikimedia.org/T177353
chelsyx added a comment.
Categorization
As of December 12, 2017, excluding hidden categories and 'needing_category' categories, 1,629,592 (3.73%) files don't belong to any category and 22,492,880 (51.55%) files belong to only one category.
F11832678: nfile_by_categories.png
Breakdown
chelsyx added a comment.
We parsed the wikitext of all files in the Commons XML data dumps of November 20, 2017, and extracted the language templates in them (e.g. {{en}}, {{LangSwitch}}). Out of the total 43,268,565 files, 14,848,551 (34.32%) files don't have any language templates, 23,780,247 (54.96
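Extracting language templates from wikitext could look roughly like the sketch below. It is not the code used for the analysis: the regular expression, the `KNOWN_CODES` set, and the function name are assumptions for illustration, and templates such as {{LangSwitch}} would need separate handling.

```python
import re

# Assumption: a language template looks like {{xx}} or {{xx|...}} where xx is
# a language code. KNOWN_CODES is a tiny hypothetical whitelist that filters
# out unrelated short template names.
KNOWN_CODES = {"en", "de", "fr"}
TEMPLATE_NAME = re.compile(r"\{\{\s*([a-z]{2,3})\s*[|}]")

def language_codes(wikitext: str) -> set[str]:
    """Return the known language codes used as template names."""
    return {code for code in TEMPLATE_NAME.findall(wikitext) if code in KNOWN_CODES}

sample = "{{Information|description={{en|A cat}}{{de|Eine Katze}}|source={{own}}}}"
print(sorted(language_codes(sample)))  # ['de', 'en']
```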
chelsyx updated the task description. (Show Details)
CHANGES TO TASK DESCRIPTION
...
* [x] how many files/descriptions are in multiple languages?
...
** [x] How many files are in lang X?
** [x] How many have multiple languages in them?
** [x] How many Western industrialized languages?
...
chelsyx changed the status of subtask T177353: Metrics for SDoC: look at search hits based on which element the search is hitting from "Stalled" to "Open".
TASK DETAIL
https://phabricator.wikimedia.org/T174519
chelsyx changed the task status from "Stalled" to "Open".
chelsyx raised the priority of this task from "Low" to "Normal".
chelsyx added a comment.
We parsed the wikitext of all files in the Commons XML data dumps of November 20, 2017. Out of the total 43,268,
chelsyx added a comment.
Hello @thiemowmde ! The purpose of T177353 and its parent ticket T174519: [epic] SDoC: Determine baseline for metrics is to figure out a baseline for metrics on Commons in order to measure future successes for the #structured-data-commons (SDoC) project. The SDoC team
chelsyx added a subtask: T182849: Identify unhelpful file names on commons.
TASK DETAIL
https://phabricator.wikimedia.org/T177353
chelsyx triaged this task as "Low" priority.
TASK DETAIL
https://phabricator.wikimedia.org/T182849
chelsyx added a parent task: T177353: Metrics for SDoC: look at search hits based on which element the search is hitting.
TASK DETAIL
https://phabricator.wikimedia.org/T182849
chelsyx moved this task from In progress to Needs review on the Discovery-Analysis (Current work) board.
chelsyx added a comment.
All results and analysis codebase can be found here: https://github.com/wikimedia-research/SDoC-Initial-Metrics/tree/master/T177353
For unhelpful file names, I created
chelsyx claimed this task.
chelsyx moved this task from Backlog to In progress on the Discovery-Analysis (Current work) board.
TASK DETAIL
https://phabricator.wikimedia.org/T179450
WORKBOARD
https://phabricator.wikimedia.org/project/board/1241/
chelsyx created this task.
chelsyx added projects: Structured-Data-Commons, Discovery-Analysis.
Herald added a subscriber: Aklapper.
Herald added a project: Wikidata.
TASK DESCRIPTION
In T177353, we were asked to get a count of files with unhelpful names. To identify unhelpful file names, we can
chelsyx claimed this task.
chelsyx edited projects, added Discovery-Analysis (Current work); removed Discovery-Analysis.
TASK DETAIL
https://phabricator.wikimedia.org/T177534
chelsyx added a comment.
Status of tasks of this ticket:
Search hits based on which element the search is hitting: file name vs. description vs. category
This is not currently feasible. A possible solution is T177353#3716344, and we will need help from the search backend team.
"Unfindable"
chelsyx added a comment.
On November 7, the number of files in a "needing categories" category is 4,268,386 (10%). The following table breaks down the counts by media type:
img_media_type | need_cat | n_files | proportion
bitmap | no | 36176941 | 84.47%
bitmap | yes | 4207232 | 9.82%
drawingno1
chelsyx added a comment.
There are 142,994 files with annotations (ImageNote); follow this link for the most current count.
The revision history of annotations is available, along with other page revision history, for example: https://commons.wikimedia.org/w/index.php?title
chelsyx updated the task description. (Show Details)
CHANGES TO TASK DESCRIPTION
...
** After talking with @EBernhardson , we decided this is not feasible since we don't record this information now
TASK DETAIL
https://phabricator.wikimedia.org/T177353
chelsyx updated the task description. (Show Details)
CHANGES TO TASK DESCRIPTION
...
* [x] investigate file annotations and whether any tracking (logging) of them is available
...
TASK DETAIL
https://phabricator.wikimedia.org/T177353
chelsyx added a comment.
Good idea! Thanks @Nuria !
TASK DETAIL
https://phabricator.wikimedia.org/T177354
chelsyx added a subscriber: EBernhardson.
chelsyx updated the task description. (Show Details)
CHANGES TO TASK DESCRIPTION
...
* [x] file name vs. description vs. category
* [] "Unfindable" images metrics
* [] After talking with @EBernhardson , we decided this is not feasible since we do
chelsyx added a comment.
In T177353#3716995, @debt wrote:
Great idea, @EBernhardson, let's do it! @chelsyx can you get that sampling from the data we already have?
@debt Yes, I can get those queries from TestSearchSatisfaction2 table. We will need help from @EBernhardson to run them against
chelsyx added a comment.
In T177353#3714007, @debt wrote:
Oh, that looks like that will be quite interesting, @chelsyx, although it looks like it might be a bit of manual work involved.
Getting data from the move log is easy, but it will take some time to train and adjust the model. @debt
chelsyx claimed this task.
TASK DETAIL
https://phabricator.wikimedia.org/T182849
chelsyx added a comment.
For unhelpful file names, I want to extract the old and new file names from the move log entries whose change reason is "meaningless or ambiguous", and then train a model to classify these file names. As far as I know, short text classification like this is a bit tricky. @mpopov do
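Character n-grams are a common feature choice for short-text classification, since file names often lack clear word boundaries. The sketch below only illustrates that feature step and makes no claim about the model actually trained here; the function name and the example file name are hypothetical:

```python
from collections import Counter

def char_ngrams(name: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lowercased file name."""
    text = name.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

# Hypothetical example of an uninformative, camera-generated file name.
features = char_ngrams("IMG_0001.jpg")
print(features.most_common(3))
```

Counts like these could then be fed to a classifier of choice as a sparse feature vector.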
chelsyx moved this task from In progress to Needs review on the Discovery-Analysis (Current work) board.
chelsyx added a comment.
Done: https://meta.wikimedia.org/wiki/Research:Baseline_Metrics_for_Structured_Data_on_Wikimedia_Commons
chelsyx added a comment.
@Ramsey-WMF Is there any feedback about the baseline metrics from the team? Could we resolve this ticket and other child tickets?
TASK DETAIL
https://phabricator.wikimedia.org/T174519
chelsyx moved this task from Needs review to Done on the Discovery-Analysis (Current work) board.
chelsyx added a comment.
Thank you @Ramsey-WMF ! :D
TASK DETAIL
https://phabricator.wikimedia.org/T174519
WORKBOARD
https://phabricator.wikimedia.org/project/board/1241/
chelsyx closed this task as "Resolved".
chelsyx claimed this task.
TASK DETAIL
https://phabricator.wikimedia.org/T174519
chelsyx closed subtask T177353: Metrics for SDoC: look at search hits based on which element the search is hitting as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T174519
chelsyx closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T177353
chelsyx closed subtask T179450: Documentation of SDoC findings as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T174519
chelsyx closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T179450
chelsyx closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T177534
chelsyx closed subtask T177534: Search Metrics for SDoC: eventlogging as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T174519
chelsyx added a project: Analytics.
TASK DETAIL
https://phabricator.wikimedia.org/T204415
chelsyx added a comment.
Hi @Smalyshev , the dashboard is updating. But since August 10th, the SPARQL usage number is very small (even 0 for certain days) and the LDF usage number is 0. Did we change the URI of the endpoint?
Query:
sql
SELECT
year, month, day,
IF(uri_path = '/sparql
chelsyx added a subscriber: Nuria.
chelsyx added a comment.
Hi @Nuria we noticed that since August 10th, the SPARQL usage number is very small (see query in T204415#4590108), which is much less than what we saw in logstash: https://logstash.wikimedia.org/goto/74e376f55fcdc3b93e4a7232cfa5203a
Do you
chelsyx added a comment.
A first try using logistic regression: https://paws-public.wmflabs.org/paws-public/User:CXie_(WMF)/commons_file_names.ipynb
TASK DETAIL
https://phabricator.wikimedia.org/T182849
chelsyx closed subtask T203723: As a product analyst I would like to know how people are using the Wikidata Descriptions editing features as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T193691