[Wikitech-l] MediaWiki API page move/redirect data

2020-03-23 Thread James Gardner via Wikitech-l
Hi all,

We're still working on a project with the MediaWiki API, and we've run into
a different issue regarding page moves/redirects.

We're trying to pull revision and redirect data for the "Killing/Death of
Luo Changqing" page and its talk page. Unfortunately, the MediaWiki API did
not return this page when we filtered by our date range of 2009-2019. Before
the page move, either title (Death or Killing) worked, but now we can no
longer access the revisions made during that earlier period.

For pages that have been moved or redirected, how would you recommend we
pull the data that was previously available?
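
One possible approach, sketched here under assumptions rather than as a confirmed fix: ask the Action API for revisions by title with redirects=1, so that an old title resolves to the current page, whose history travels with it across a move. This only helps if the move left a redirect behind. The wiki URL, the helper names, and the User-Agent string are illustrative assumptions, and continuation is not handled.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed wiki


def revision_query_params(title, start, end, limit=50):
    """Build prop=revisions parameters, oldest first, following redirects."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "titles": title,
        "redirects": "1",        # resolve an old title to its current target
        "rvprop": "ids|timestamp|user",
        "rvdir": "newer",        # with rvdir=newer, rvstart is the older bound
        "rvstart": start,
        "rvend": end,
        "rvlimit": str(limit),
    }


def extract_revisions(data):
    """Return (resolved_title, revisions) from a formatversion=2 response."""
    page = data["query"]["pages"][0]
    return page["title"], page.get("revisions", [])


def fetch_revisions(title, start, end):
    """Hypothetical helper: a single request, no continuation handling."""
    query = urllib.parse.urlencode(revision_query_params(title, start, end))
    request = urllib.request.Request(
        API_URL + "?" + query,
        headers={"User-Agent": "pagemove-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_revisions(json.load(response))
```

For example, fetch_revisions("Death of Luo Changqing", "2009-01-01T00:00:00Z", "2019-12-31T23:59:59Z") would return the title the API resolved to, together with the revisions in that window.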

Thanks,

Jackie, James, Junyi, Kirby
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] MediaWiki API pageview issue

2020-02-24 Thread James Gardner via Wikitech-l
Thanks for the clarification of how redirects work, and what we should keep
in mind when trying to count pageviews. Do you know if there's a way to
find the date(s) when a page was redirected using the API? We know we can
get the 'old' page ids of redirected pages using the API, but we're not
sure if using the creation date of these page ids would be accurate. Also,
what's the difference between redirects and page moves, if there is one?
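
For context: a move renames a page and carries its history and page ID along, normally leaving a redirect at the old title, whereas a redirect is an ordinary page whose content points elsewhere. Move dates can be read from the move log. The sketch below assumes the Action API's list=logevents module with letype=move; the helper names are illustrative, and only the parameter building and response parsing are shown, not the HTTP request.

```python
def move_log_params(title, limit=50):
    """Build list=logevents parameters for moves recorded under a title.

    The log entry is filed under the title the page had before the move.
    """
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "list": "logevents",
        "letype": "move",
        "letitle": title,
        "leprop": "timestamp|title|details",
        "ledir": "newer",
        "lelimit": str(limit),
    }


def parse_moves(data):
    """Return (timestamp, old_title, new_title) tuples from a response.

    Keys are read defensively because hidden or old entries may lack details.
    """
    moves = []
    for event in data.get("query", {}).get("logevents", []):
        params = event.get("params", {})
        moves.append(
            (event.get("timestamp"), event.get("title"), params.get("target_title"))
        )
    return moves
```

Walking this log title by title (old title, then its target, and so on) would give the date at which each title started or stopped pointing at the page.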

We may stick to including redirects without trying to avoid overcounting, as
this appears to be a more complicated issue than we thought. We are working
to collect pageviews within a specific time frame, so relative dates aren't
quite what we're looking for.

Thanks again!

Jackie, James, Junyi, Kirby

On Mon, Feb 24, 2020 at 10:52 AM MusikAnimal  wrote:

> > We attempted to use the wmflabs.org tool, but it only shows data from a
> certain date
>
> I'm assuming you want relative dates, not exact dates? You can do this by
> using the range=latest-N URL parameter (where N is the number of days). See
> <https://tools.wmflabs.org/pageviews/url_structure/> and <
> https://tools.wmflabs.org/redirectviews/url_structure/> for Redirect
> Views. This mirrors the pvipdays parameter of the action API.
>
> I'm sorry there is no backend for these tools, so if you need automation
> you'll have to scrape it or re-implement its logic yourself.
>
> > In the end, we are trying to get an accurate count of views for a certain
> page no matter the source.
>
> Keep in mind that redirects can change, and historically may not have been
> the "same" page. For instance, say I create the article Foo, someone else
> creates Bar, and some months later Foo is redirected to Bar. To accurately
> get the views of just Bar, you'll need to somehow exclude the time when Foo
> was a different article. Page moves can also cause unexpected results (Foo
> is moved to Baz, Bar is moved to Foo, etc.). Finally, page IDs can change
> too, say if I delete Foo, then move Bar to Foo. There isn't a foolproof
> solution, it seems, but simply including redirects is usually enough to
> give you what you want.
>
> ~ MA

Re: [Wikitech-l] MediaWiki API pageview issue

2020-02-24 Thread James Gardner via Wikitech-l
Hi all,

Thanks for all the help and advice with this issue, especially for pointing
us to the Redirect Views tool on wmflabs. We'll try using that tool to
download the pageview data we need and manually filter by date to map
redirects to the page. We'll also look into the Wikimedia REST API to see
if it can help us as well.
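
A minimal sketch of what a REST API lookup for a fixed date range might look like, assuming the per-article pageviews endpoint of the Analytics Query Service. The helper names and the User-Agent string are illustrative assumptions; dates are passed as YYYYMMDD strings.

```python
import json
import urllib.parse
import urllib.request

AQS = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, article, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build the per-article URL; titles use underscores and are URL-encoded."""
    title = urllib.parse.quote(article.replace(" ", "_"), safe="")
    return f"{AQS}/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"


def total_views(payload):
    """Sum the 'views' field over all items in a response body."""
    return sum(item["views"] for item in payload.get("items", []))


def fetch_total(project, article, start, end):
    """Hypothetical helper: fetch one title's views for an exact date range."""
    request = urllib.request.Request(
        per_article_url(project, article, start, end),
        headers={"User-Agent": "pageview-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return total_views(json.load(response))
```

Because the start and end dates are explicit, the same call can be repeated for each redirect title over the period of interest.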

Thanks again,

Jackie, James, Junyi, Kirby


On Sun, Feb 23, 2020 at 10:58 PM Gergo Tisza  wrote:

> On Sun, Feb 23, 2020 at 4:17 PM James Gardner via Wikitech-l <
> wikitech-l@lists.wikimedia.org> wrote:
>
>> We attempted to use the wmflabs.org tool, but it only shows data from a
>> certain date. (Example link:
>>
>> https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org=all-access=user=2019-07-01=2020-01-25=2019%E2%80%9320_Hong_Kong_protests|China
>> )
>>
>
> There's a redirectview tool (see the "redirects" links at the bottom of
> the page you linked) but it can't be filtered by date so it probably can't
> help you.
>
>
>> Then we tried using the redirects of a page and the old page IDs to grab
>> the pageview data, but no data was returned. We also tried to grab data
>> for a page that we knew had a long history, but the "pvipcontinue"
>> parameter did not appear (
>> https://www.mediawiki.org/w/api.php?action=help=query%2Bpageviews
>> ).
>> (Example:
>>
>> https://www.mediawiki.org/wiki/Special:ApiSandbox#action=query=json=pageviews=MediaWiki=pageviews=60=
>> )
>>
>
> That API displays a limited set of metrics and is focused on caching and
> being backend-agnostic. There is no way to get old data; pvipcontinue is for
> fetching data about more pages. If you need something more specific, you
> should use the Analytics Query Service (which the other APIs rely on)
> directly: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews
>
> I think you'll have to piece the data together using the MediaWiki
> redirects API and AQS.
>
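
A rough sketch of the piecing-together suggested above, assuming the Action API's prop=redirects module for listing redirect titles. The view counts come from a caller-supplied function (for instance one backed by AQS), so views_for and the other helper names are illustrative assumptions, and the HTTP calls are left out.

```python
def redirects_params(title):
    """Build prop=redirects parameters listing redirects to a title."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "redirects",
        "titles": title,
        "rdprop": "pageid|title",
        "rdlimit": "max",
    }


def redirect_titles(data):
    """Collect redirect titles from a formatversion=2 response."""
    titles = []
    for page in data.get("query", {}).get("pages", []):
        for redirect in page.get("redirects", []):
            titles.append(redirect["title"])
    return titles


def combined_views(target, redirects, views_for):
    """Sum views for the target and each redirect.

    views_for is a caller-supplied callable mapping a title to its view
    count for the period of interest.
    """
    per_title = {title: views_for(title) for title in [target, *redirects]}
    return sum(per_title.values()), per_title
```

This simply adds the counts together, so it does not exclude periods in which a redirect title was still a separate article.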

[Wikitech-l] MediaWiki API pageview issue

2020-02-23 Thread James Gardner via Wikitech-l
Hi all,

We are a group of undergraduates working on a project using the MediaWiki
API. While working on this project, we ran into an issue involving
pageviews. When we pull pageview data for a particular page, views that
arrived through the page's redirects are not counted along with the page's
own views. For example, the Hong Kong protests page only shows direct views,
not views from its previous titles.

We attempted to use the wmflabs.org tool, but it only shows data from a
certain date. (Example link:
https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org=all-access=user=2019-07-01=2020-01-25=2019%E2%80%9320_Hong_Kong_protests|China

)

Then we tried using the redirects of a page and the old page IDs to grab
the pageview data, but no data was returned. We also tried to grab data for
a page that we knew had a long history, but the "pvipcontinue" parameter did
not appear (
https://www.mediawiki.org/w/api.php?action=help=query%2Bpageviews).
(Example:
https://www.mediawiki.org/wiki/Special:ApiSandbox#action=query=json=pageviews=MediaWiki=pageviews=60=
)

In the end, we are trying to get an accurate count of views for a certain
page no matter the source.

Any guidance or assistance is greatly appreciated.

Thanks,
Jackie, James, Junyi, Kirby