[Analytics] Re: Mediacounts fields

2022-11-04 Thread Neil Shah-Quinn
I believe Connie Chen and Isaac Johnson did some work on distinguishing "real images" from icons as part of the image suggestion analytics (T292316 <https://phabricator.wikimedia.org/T292316>). I don't know the details, but perhaps one of them could chime in. - Neil Shah-Q

[Analytics] Updated data in the wiki comparison tool

2022-02-07 Thread Neil Shah-Quinn
t-analyt...@wikimedia.org. [1] https://docs.google.com/spreadsheets/d/1a-UBqsYtJl6gpauJyanx0nyxuPqRvhzJRN817XpkuS8/ [2] https://www.mediawiki.org/wiki/Product_Analytics -- Neil Shah-Quinn senior data scientist, Product Analytics <https://www.mediawiki.org/wiki/Product_Analy

[Analytics] Help improve data documentation during the Wikimania hackathon!

2021-08-12 Thread Neil Shah-Quinn
to text chat) More information: https://phabricator.wikimedia.org/T288680 -- Neil Shah-Quinn senior data scientist, Product Analytics <https://www.mediawiki.org/wiki/Product_Analytics> Wikimedia Foundation <https://wikimediafoundation.org/>

Re: [Analytics] Wiki comparison 2020 data is available

2021-02-24 Thread Neil Shah-Quinn
earch/wiki-segmentation/tree/master/data-collection> currently relies on production data access <https://wikitech.wikimedia.org/wiki/Analytics/Data_access>, which is only available to staff of the Wikimedia Foundation or Wikimedia Deutschland or official research collaborators <https://www.me

Re: [Analytics] Seeking Information Regarding Pageview Traffic

2020-12-21 Thread Neil Shah-Quinn
Can you please share any definite (or > relative) information regarding the error at that time, if possible? Can > you give me any idea on why the bot view increases so much on a certain > year (and on some certain dates)? If possible, any example will be really > helpful. > > > Anka

Re: [Analytics] Seeking Information Regarding Pageview Traffic

2020-12-18 Thread Neil Shah-Quinn
hosh_Dastider> > ___ > Analytics mailing list > Analytics@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/analytics > -- Neil Shah-Quinn senior data scientist, Product Analytics <https://www.mediawiki.org/wiki

Re: [Analytics] "automated" marker added to pageview data

2020-05-13 Thread Neil Shah-Quinn
Nuria, Thank you for this update! I'm very excited about this new system. I did notice that there's not much explanation of the particular rules or strategies that are used to identify automated traffic, or a link to the implementing code. I can imagine this might be intentional, to make it

[Analytics] wmfdata users: update to version 1.0

2020-03-13 Thread Neil Shah-Quinn
t product-analyt...@wikimedia.org. On behalf of the Product Analytics team, Neil Shah-Quinn

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-19 Thread Neil Shah-Quinn
g/T245713> <https://phabricator.wikimedia.org/T245713> On Wed, 19 Feb 2020 at 13:35, Neil Shah-Quinn wrote: > Bump! > > Analytics team, I'm eager to have input from y'all about the best Spark > settings to use. > > On Fri, 14 Feb 2020 at 18:30, Neil Shah-Quinn > wrote:

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-19 Thread Neil Shah-Quinn
Bump! Analytics team, I'm eager to have input from y'all about the best Spark settings to use. On Fri, 14 Feb 2020 at 18:30, Neil Shah-Quinn wrote: > I ran into this problem again, and I found that neither session.stop nor > newSession got rid of the error. So it's still not cle

Re: [Analytics] [Research-Internal] Tutorials on disk space usage for notebook/stat boxes

2020-02-18 Thread Neil Shah-Quinn
Thank you very much, Luca! To make this nice documentation easier to discover, I moved it to Analytics/Systems/Clients along with the other information on the clients from Analytics/Data access. On Tue, 18 Feb 2020 at 17:11, Isaac

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-14 Thread Neil Shah-Quinn
rk.sql.html?highlight=sparksession#pyspark.sql.SparkSession.stop>?) > or maybe try to explicitly create a new session via newSession() > <https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=sparksession#pyspark.sql.SparkSession.newSession> > ? > > On Thu, Feb

Re: [Analytics] [Wiki-research-l] Announcement - Mediawiki History Dumps

2020-02-10 Thread Neil Shah-Quinn
I want to echo what Nate said. We've been using this for more than a year within the Wikimedia Foundation, and it has made analyses of editing behavior much, much easier and faster, not to mention a lot less annoying. This is the product of years of expert work by the Analytics team, and they

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-07 Thread Neil Shah-Quinn
=sparksession#pyspark.sql.SparkSession.stop>?) >> or maybe try to explicitly create a new session via newSession() >> <https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=sparksession#pyspark.sql.SparkSession.newSession> >> ? >> >> On Thu

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-06 Thread Neil Shah-Quinn
there were two Yarn jobs running related to your notebooks, I just killed > them, let's see if it solves the problem (you might need to restart > your notebook again). If not, let's open a task and investigate :) > > Luca > > Il giorno gio 6 feb 2020 alle ore 02:08 Neil Shah-Quinn <

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-05 Thread Neil Shah-Quinn
Whoa—I just got the same stopped SparkContext error on the query even after restarting the notebook, without an intermediate Java heap space error. That seems very strange to me. On Wed, 5 Feb 2020 at 16:09, Neil Shah-Quinn wrote: > Hey there! > > I was running SQL queries via PySpa

[Analytics] SparkContext stopped and cannot be restarted

2020-02-05 Thread Neil Shah-Quinn
Hey there! I was running SQL queries via PySpark (using the wmfdata package) on SWAP when one of my queries failed with "java.lang.OutOfMemoryError: Java heap space". After that, when I tried to call the spark.sql function again