As far as I am concerned, the article is now finished.

D. K.

2022-11-08 21:32 GMT+01:00, Dušan Kreheľ <[email protected]>:
> [Fix]:
>
> A link to the source code has been added.
>
> @Dan Andreescu: The format is correct. A year is a typical basic
> statistical interval, and merging into an annual summary saves time.
> The file size problem disappears if the file is split by local wikis.
> And skwiki is only 49 MB for the year 2021, which does not demand much
> of the end user who processes the data for their own purposes.
>
> 2022-11-08 21:30 GMT+01:00, Dušan Kreheľ <[email protected]>:
>> A link to the source code has been added.
>>
>> @Dan Andreescu: The format is correct now. A year is a typical basic
>> statistical interval, and merging into an annual summary saves time.
>> The file size problem disappears if the file is split by wiki. And
>> skwiki is only 49 MB for the year 2021, which does not demand much
>> of the end user who processes the data for their own purposes.
>>
>> 2022-10-06 19:31 GMT+02:00, Dan Andreescu <[email protected]>:
>>> @Dušan Kreheľ: I think there's a misunderstanding. I read your
>>> re-written article. In it, you say that the current format is:
>>>
>>> domain_code page_title count_views total_response_size
>>>
>>> For an example, you give this:
>>>
>>> sk Kreheľ 2 0
>>>
>>> But, actually, that format is deprecated and the new format is
>>> pageviews complete, which looks like this:
>>>
>>> sk.wikipedia Kreheľ null desktop 13 B2D2G2J2O2T1V1X1
>>>
>>> The B2D2G2J2O2T1V1X1 is exactly the kind of encoding you're talking
>>> about, and no 0-values are present.
>>>
>>> You made the point that we are missing a yearly rollup in this new
>>> format. This would be quite a large file, but if there's a good use
>>> case for such a dump, a request in phabricator is a good way to
>>> proceed.
>>>
>>> On Sat, Oct 1, 2022 at 9:58 AM Dušan Kreheľ <[email protected]>
>>> wrote:
>>>
>>>> The big update of the article is done. Please take a look.
>>>>
>>>> Gergő Tisza: The current fresh hourly format can remain. Later it
>>>> can be converted to another format and thus be more suitable for
>>>> others.
>>>>
>>>> 2022-09-18 22:35 GMT+02:00, Dušan Kreheľ <[email protected]>:
>>>> > I have updated the document. I added the export of human
>>>> > pageviews for the year 2021. The statistics are in the article.
>>>> > A download link has been added.
>>>> >
>>>> > Dan Andreescu: I had no problem understanding you.
>>>> >
>>>> > 2022-09-05 21:48 GMT+02:00, Dan Andreescu <[email protected]>:
>>>> >> Hi Dušan,
>>>> >>
>>>> >> I added the details on pageviews_complete to the talk page on
>>>> >> your proposal
>>>> >> <https://en.wikipedia.org/w/index.php?title=User_talk:Du%C5%A1an_Krehe%C4%BE/Signpost_draft:New_pageview_dump_export_format_(concept)&oldid=1108690384>.
>>>> >> Please let me know if it's still confusing.
>>>> >>
>>>> >
>>>> _______________________________________________
>>>> Wikitech-l mailing list -- [email protected]
>>>> To unsubscribe send an email to [email protected]
>>>> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
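
For readers following the format discussion, here is a minimal Python sketch of how the pageviews complete example line quoted in this thread might be decoded. It rests on assumptions the thread does not spell out: that the six space-separated fields are wiki, title, page id, access method, daily total, and hourly string, and that each letter in the hourly string stands for an hour (A = 0 through X = 23) followed by that hour's count. The names PageviewRow, decode_hourly, and parse_row are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PageviewRow:
    # Field names are illustrative labels, not official column names.
    wiki: str
    title: str
    page_id: Optional[int]
    access: str
    daily_total: int
    hourly: Dict[int, int]


def decode_hourly(encoded: str) -> Dict[int, int]:
    """Decode a string such as 'B2D2' into {hour: views}.

    Assumption: letters A..X map to hours 0..23 and each letter is
    followed by a decimal count. Hours with zero views are omitted.
    """
    hours: Dict[int, int] = {}
    i = 0
    while i < len(encoded):
        letter = encoded[i]
        if not ("A" <= letter <= "X"):
            raise ValueError(f"unexpected hour marker {letter!r}")
        i += 1
        start = i
        while i < len(encoded) and encoded[i] in "0123456789":
            i += 1
        if start == i:
            raise ValueError(f"missing count after {letter!r}")
        hours[ord(letter) - ord("A")] = int(encoded[start:i])
    return hours


def parse_row(line: str) -> PageviewRow:
    """Split one line into its six assumed fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    wiki, title, page_id, access, total, encoded = fields
    return PageviewRow(
        wiki=wiki,
        title=title,
        page_id=None if page_id == "null" else int(page_id),
        access=access,
        daily_total=int(total),
        hourly=decode_hourly(encoded),
    )


row = parse_row("sk.wikipedia Kreheľ null desktop 13 B2D2G2J2O2T1V1X1")
print(row.hourly)
print(sum(row.hourly.values()) == row.daily_total)
```

Under these assumptions the hourly counts of the example line add up to its daily total of 13, which matches the point made in the thread that no 0-values need to be stored.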
