[Wikidata-bugs] [Maniphest] [Commented On] T177353: Metrics for SDoC: look at search hits based on which element the search is hitting

2017-12-14 Thread chelsyx
chelsyx added a comment. Categorization: As of December 12, 2017, excluding hidden categories and 'needing_category' categories, 1,629,592 files (3.73%) don't belong to any category and 22,492,880 files (51.55%) belong to only 1 category. F11832678: nfile_by_categories.png Breakdown by

[Wikidata-bugs] [Maniphest] [Commented On] T177353: Metrics for SDoC: look at search hits based on which element the search is hitting

2017-11-07 Thread chelsyx
chelsyx added a comment. Status of the tasks in this ticket: Search hits based on which element the search is hitting (file name vs. description vs. category): This is not currently feasible. A possible solution is T177353#3716344, and we will need help from the search backend team. "Unfindable" images

[Wikidata-bugs] [Maniphest] [Commented On] T177353: Metrics for SDoC: look at search hits based on which element the search is hitting

2017-11-07 Thread chelsyx
chelsyx added a comment. On November 7, the number of files having a "needing categories" category is 4,268,386 (10%). The following table breaks down the counts by media type:
img_media_type  need_cat  n_files   proportion
bitmap          no        36176941  84.47%
bitmap          yes       4207232   9.82%
drawing         no        1167389   2.73%

[Wikidata-bugs] [Maniphest] [Commented On] T177353: Metrics for SDoC: look at search hits based on which element the search is hitting

2017-10-30 Thread chelsyx
chelsyx added a comment. In T177353#3714007, @debt wrote: Oh, that looks like that will be quite interesting, @chelsyx, although it looks like it might be a bit of manual work involved. Getting data from the move log is easy, but it will take some time to train and adjust the model. @debt

[Wikidata-bugs] [Maniphest] [Commented On] T177353: Metrics for SDoC: look at search hits based on which element the search is hitting

2017-10-30 Thread chelsyx
chelsyx added a comment. In T177353#3716995, @debt wrote: Great idea, @EBernhardson, let's do it! @chelsyx can you get that sampling from the data we already have? @debt Yes, I can get those queries from the TestSearchSatisfaction2 table. We will need help from @EBernhardson to run them against

[Wikidata-bugs] [Maniphest] [Commented On] T177353: Metrics for SDoC: look at search hits based on which element the search is hitting

2017-10-27 Thread debt
debt added a comment. Great idea, @EBernhardson, let's do it! @chelsyx can you get that sampling from the data we already have? TASK DETAIL: https://phabricator.wikimedia.org/T177353 EMAIL PREFERENCES: https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: chelsyx, debt Cc: EBernhardson,

[Wikidata-bugs] [Maniphest] [Commented On] T177353: Metrics for SDoC: look at search hits based on which element the search is hitting

2017-10-27 Thread EBernhardson
EBernhardson added a comment. While we don't log it, we could certainly take a sampling of say 20k queries, run them against our test cluster, and poke at the results to see which parts triggered the hit.
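The sampling step proposed above could look roughly like the sketch below. It assumes the logged queries are available as an iterable of strings; the function name `sample_queries`, the fixed seed, and the placeholder query strings are illustrative and do not come from the thread. Reservoir sampling is used here only because it works without knowing the number of queries in advance; running the sample against the test cluster is not shown.

```python
import random
from typing import Iterable, List


def sample_queries(queries: Iterable[str], k: int = 20_000, seed: int = 0) -> List[str]:
    """Return a uniform random sample of up to k queries from a stream.

    Uses reservoir sampling (Algorithm R), so the stream is read once and
    its length does not need to be known beforehand.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the sample reproducible
    reservoir: List[str] = []
    for i, query in enumerate(queries):
        if i < k:
            # Fill the reservoir with the first k queries.
            reservoir.append(query)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = query
    return reservoir


if __name__ == "__main__":
    # Hypothetical stand-in for queries pulled from a search log.
    logged = (f"example query {n}" for n in range(100_000))
    sample = sample_queries(logged, k=20_000)
    print(len(sample))
```

If fewer than `k` queries are available, the function simply returns all of them in their original order.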

[Wikidata-bugs] [Maniphest] [Commented On] T177353: Metrics for SDoC: look at search hits based on which element the search is hitting

2017-10-27 Thread Ramsey-WMF
Ramsey-WMF added a comment. In T177353#3711572, @chelsyx wrote: There are 142,994 files with annotations (ImageNote); follow this link for the most current count. The revision history of annotations is there, along with other page revision history, for example:

[Wikidata-bugs] [Maniphest] [Commented On] T177353: Metrics for SDoC: look at search hits based on which element the search is hitting

2017-10-26 Thread debt
debt added a comment. Oh, that looks like that will be quite interesting, @chelsyx, although it looks like it might be a bit of manual work involved.

[Wikidata-bugs] [Maniphest] [Commented On] T177353: Metrics for SDoC: look at search hits based on which element the search is hitting

2017-10-26 Thread chelsyx
chelsyx added a comment. For unhelpful file names, I want to extract the old and new file names from the move log whose change reason is meaningless or ambiguous, and then train a model to classify these file names. As far as I know, short text classification like this is a bit tricky. @mpopov do
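One reason short-text classification is tricky is that a file name contains very few whole words to learn from. A common workaround, offered here as an assumption and not as the approach chosen in this thread, is to represent each name by its character n-grams. The sketch below covers only that feature-extraction step; the helper names `normalize` and `char_ngrams` and the example file names are hypothetical.

```python
import re
from collections import Counter


def normalize(name: str) -> str:
    """Lowercase the name, drop a trailing extension, and map every digit to '0'.

    The extension pattern is deliberately crude: any 2-4 letter or digit
    suffix after the last dot is treated as an extension.
    """
    stem = re.sub(r"\.[A-Za-z0-9]{2,4}$", "", name)
    return re.sub(r"\d", "0", stem.lower())


def char_ngrams(name: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the normalized file name."""
    # '^' and '$' mark the start and end so prefixes like 'img' stay distinct.
    text = f"^{normalize(name)}$"
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    # Hypothetical examples of a camera-generated name and a descriptive name.
    for example in ("IMG_0001.JPG", "Eiffel Tower at night.jpg"):
        print(example, dict(char_ngrams(example)))
```

Collapsing digits means camera-style names such as `IMG_0001.JPG` and `IMG_0427.JPG` produce identical features, which is the kind of regularity a classifier trained on move-log pairs could pick up. The resulting counts can be fed to any standard text classifier.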

[Wikidata-bugs] [Maniphest] [Commented On] T177353: Metrics for SDoC: look at search hits based on which element the search is hitting

2017-10-25 Thread chelsyx
chelsyx added a comment. There are 142,994 files with annotations (ImageNote); follow this link for the most current count. The revision history of annotations is there, along with other page revision history, for example: