Re: [Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers
This bug has been fixed, see https://bugzilla.wikimedia.org/show_bug.cgi?id=45178

I will post a message on the Village Pump as well.

Best,
Diederik

On Sun, Feb 3, 2013 at 3:44 PM, Brad Jorsch bjor...@wikimedia.org wrote:
> On Fri, Jan 25, 2013 at 12:51 PM, Diederik van Liere dvanli...@wikimedia.org wrote:
>> No, the output format of http://dumps.wikimedia.org/other/pagecounts-raw/ will stay the same.
>
> It seems that page names are coming through with spaces now, where they didn't before. See https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Format_Change_of_Page_View_Stats

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers
On Fri, Jan 25, 2013 at 12:51 PM, Diederik van Liere dvanli...@wikimedia.org wrote:
> No, the output format of http://dumps.wikimedia.org/other/pagecounts-raw/ will stay the same.

It seems that page names are coming through with spaces now, where they didn't before. See https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Format_Change_of_Page_View_Stats
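The symptom reported above can be illustrated with a short Python sketch. It assumes each pagecounts-raw line holds four space-separated fields (project code, page title, view count, bytes transferred); that layout, the function name `parse_pagecount_line`, and the sample lines are assumptions made for illustration and do not come from this thread. A script that splits on every space sees an extra field as soon as a title contains a space, whereas splitting once from the left and twice from the right tolerates it.

```python
def parse_pagecount_line(line):
    """Parse one line, assuming 'project title count bytes' (assumed layout)."""
    # The project code is the first field; count and bytes are the last two.
    # Everything in between is treated as the page title, spaces included.
    project, rest = line.rstrip("\n").split(" ", 1)
    title, count, size = rest.rsplit(" ", 2)
    return project, title, int(count), int(size)


# Made-up sample lines, not real data.
clean = "en Main_Page 42 1024"
with_space = "en Main Page 42 1024"

# A naive split on every space yields a different field count per line.
naive_counts = (len(clean.split(" ")), len(with_space.split(" ")))

parsed_clean = parse_pagecount_line(clean)
parsed_with_space = parse_pagecount_line(with_space)
```

This only helps if the first field and the last two fields never contain spaces themselves, which is part of the assumption stated above.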
[Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers
Apologies for crossposting.

Heya,

The Analytics Team is planning to deploy tab as field delimiter to replace the current space as field delimiter on the varnish/squid/nginx servers. We would like to do this on February 1st.

The reason for this change is that we need to have a consistent number of fields in each webrequest log line. Right now, some fields contain spaces, which requires a lot of post-processing cleanup and slows down the generation of reports.

What is affected and maintained by Analytics:
* udp-filter: already has support for the tab character.
* webstatscollector: we compiled a new version of filter to add support for the tab character.
* wikistats: we will fix the scripts on an ongoing basis.
* udp2log: we have a patch ready for inserting sequence numbers separated by tab.

In particular, I would like to have feedback on three questions:
1) Are there important reasons not to use tab as field delimiter?
2) Are there important pieces of logging that expect a space instead of a tab, that need to be fixed, and that I did not mention in this email?
3) Is February 1st a good date to deploy this change? (Assuming that all preps are finished.)

Best,
Diederik
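The field-count problem described in this message can be sketched in a few lines of Python. The field names, the host name `cp1001`, and the sample record are hypothetical and are not the actual webrequest log layout; the point is only that a value containing spaces (here a user agent string) changes the number of fields when space is the delimiter, but not when tab is.

```python
# Hypothetical field layout, for illustration only.
FIELDS = ["hostname", "sequence", "timestamp", "url", "user_agent"]


def format_line(record, delimiter):
    """Join the record's fields into one log line."""
    return delimiter.join(record[name] for name in FIELDS)


def parse_line(line, delimiter):
    """Split a log line back into fields on the given delimiter."""
    return line.split(delimiter)


# Made-up record; the user agent value contains spaces.
record = {
    "hostname": "cp1001",
    "sequence": "12345",
    "timestamp": "2013-02-01T00:00:00",
    "url": "http://en.wikipedia.org/wiki/Main_Page",
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
}

space_fields = parse_line(format_line(record, " "), " ")
tab_fields = parse_line(format_line(record, "\t"), "\t")

print(len(space_fields), len(tab_fields))  # prints "8 5"
```

With the space delimiter the five logical fields come back as eight tokens, so a consumer has to guess which tokens belong together. With tab, the count stays at five as long as no field value contains a tab itself.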
Re: [Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers
Just to clarify, will this affect the stats at http://dumps.wikimedia.org/other/pagecounts-raw/ ? Changing the format of that will probably break third party scripts.

--
-bawolff

On Fri, Jan 25, 2013 at 1:41 PM, Diederik van Liere dvanli...@wikimedia.org wrote:
> Apologies for crossposting.
>
> Heya,
>
> The Analytics Team is planning to deploy tab as field delimiter to replace the current space as field delimiter on the varnish/squid/nginx servers. We would like to do this on February 1st.
>
> The reason for this change is that we need to have a consistent number of fields in each webrequest log line. Right now, some fields contain spaces, which requires a lot of post-processing cleanup and slows down the generation of reports.
>
> What is affected and maintained by Analytics:
> * udp-filter: already has support for the tab character.
> * webstatscollector: we compiled a new version of filter to add support for the tab character.
> * wikistats: we will fix the scripts on an ongoing basis.
> * udp2log: we have a patch ready for inserting sequence numbers separated by tab.
>
> In particular, I would like to have feedback on three questions:
> 1) Are there important reasons not to use tab as field delimiter?
> 2) Are there important pieces of logging that expect a space instead of a tab, that need to be fixed, and that I did not mention in this email?
> 3) Is February 1st a good date to deploy this change? (Assuming that all preps are finished.)
>
> Best,
> Diederik
Re: [Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers
No, the output format of http://dumps.wikimedia.org/other/pagecounts-raw/ will stay the same.

Best,
Diederik

On Fri, Jan 25, 2013 at 12:48 PM, bawolff bawolff...@gmail.com wrote:
> Just to clarify, will this affect the stats at http://dumps.wikimedia.org/other/pagecounts-raw/ ? Changing the format of that will probably break third party scripts.
>
> --
> -bawolff
>
> On Fri, Jan 25, 2013 at 1:41 PM, Diederik van Liere dvanli...@wikimedia.org wrote:
>> Apologies for crossposting.
>>
>> Heya,
>>
>> The Analytics Team is planning to deploy tab as field delimiter to replace the current space as field delimiter on the varnish/squid/nginx servers. We would like to do this on February 1st.
>>
>> The reason for this change is that we need to have a consistent number of fields in each webrequest log line. Right now, some fields contain spaces, which requires a lot of post-processing cleanup and slows down the generation of reports.
>>
>> What is affected and maintained by Analytics:
>> * udp-filter: already has support for the tab character.
>> * webstatscollector: we compiled a new version of filter to add support for the tab character.
>> * wikistats: we will fix the scripts on an ongoing basis.
>> * udp2log: we have a patch ready for inserting sequence numbers separated by tab.
>>
>> In particular, I would like to have feedback on three questions:
>> 1) Are there important reasons not to use tab as field delimiter?
>> 2) Are there important pieces of logging that expect a space instead of a tab, that need to be fixed, and that I did not mention in this email?
>> 3) Is February 1st a good date to deploy this change? (Assuming that all preps are finished.)
>>
>> Best,
>> Diederik