Re: [Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers

2013-03-20 Thread Diederik van Liere
This bug has been fixed, see
https://bugzilla.wikimedia.org/show_bug.cgi?id=45178

I will post a message on the Village Pump as well.

Best,
Diederik


On Sun, Feb 3, 2013 at 3:44 PM, Brad Jorsch bjor...@wikimedia.org wrote:

> On Fri, Jan 25, 2013 at 12:51 PM, Diederik van Liere
> dvanli...@wikimedia.org wrote:
>> No, the output format of http://dumps.wikimedia.org/other/pagecounts-raw/
>> will stay the same.
>
> It seems that page names are coming through with spaces now, where
> they didn't before. See
> https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Format_Change_of_Page_View_Stats


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers

2013-02-03 Thread Brad Jorsch
On Fri, Jan 25, 2013 at 12:51 PM, Diederik van Liere
dvanli...@wikimedia.org wrote:
> No, the output format of http://dumps.wikimedia.org/other/pagecounts-raw/
> will stay the same.

It seems that page names are coming through with spaces now, where
they didn't before. See
https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Format_Change_of_Page_View_Stats
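For consumers of the pagecounts files hit by this, a minimal Python sketch of a tolerant parser follows. It assumes each line has four space-separated fields (project, page title, view count, bytes transferred); the sample lines and the function name `parse_pagecount_line` are made up for illustration. Rewriting spaces in the title to underscores is also an assumption about what a consumer would want, not something stated in this thread.

```python
def parse_pagecount_line(line):
    """Parse 'project title count bytes', tolerating spaces inside the title."""
    # The project code is the first field and never contains a space.
    project, rest = line.rstrip("\n").split(" ", 1)
    # The last two fields are numeric, so split them off from the right;
    # whatever remains in the middle is the page title.
    title, count, size = rest.rsplit(" ", 2)
    # Assumption: normalise spaces in the title back to underscores.
    return project, title.replace(" ", "_"), int(count), int(size)


old_style = "en Main_Page 42 12345"   # made-up sample, title with underscore
new_style = "en Main Page 42 12345"   # made-up sample, title with a space

# A naive split sees a different number of fields for the two lines.
print(len(old_style.split(" ")), len(new_style.split(" ")))

# The tolerant parser returns the same record for both.
print(parse_pagecount_line(old_style))
print(parse_pagecount_line(new_style))
```

Splitting from both ends only works because the first field and the last two fields cannot themselves contain spaces; it is a workaround for readers of the files, not a substitute for fixing the output.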


[Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers

2013-01-25 Thread Diederik van Liere
Apologies for crossposting

Heya,

The Analytics Team is planning to deploy tab as the field delimiter,
replacing the current space delimiter, on the varnish/squid/nginx
servers. We would like to do this on February 1st. The reason for this
change is that we need a consistent number of fields in each
webrequest log line. Right now, some fields contain spaces, which requires
a lot of post-processing cleanup and slows down the generation of reports.
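To illustrate the field-count problem, here is a minimal Python sketch. The field values and their order are hypothetical and are not the real webrequest log layout; the only point is that a field containing spaces (here a made-up user-agent string) inflates the field count when splitting on spaces but not when splitting on tabs.

```python
# Hypothetical field values; this is NOT the real webrequest log layout.
FIELDS = [
    "cp1001.example",                           # cache host (made up)
    "2013-01-25T12:41:00",                      # timestamp
    "GET",                                      # request method
    "http://en.wikipedia.org/wiki/Main_Page",   # URL
    "Mozilla/5.0 (X11; Linux x86_64)",          # user agent, contains spaces
]


def count_fields(line, delimiter):
    """Return how many fields a line yields when split on the delimiter."""
    return len(line.split(delimiter))


space_line = " ".join(FIELDS)
tab_line = "\t".join(FIELDS)

print(count_fields(space_line, " "))   # 8: the user agent splits into 4 pieces
print(count_fields(tab_line, "\t"))    # 5: one field per original value
```

With tabs, the split yields exactly the original fields as long as no field value itself contains a tab.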

What is affected and maintained by Analytics

* udp-filter already has support for the tab character
* webstatscollector: we compiled a new version of filter to add support for
the tab character
* wikistats: we will fix the scripts on an ongoing basis.
* udp2log: we have a patch ready for inserting sequence numbers separated
by tab.

In particular, I would like feedback on three questions:

1) Are there important reasons not to use tab as field delimiter?

2) Are there important pieces of logging that expect a space instead of a
tab, and so need to be fixed, that I did not mention in this email?

3) Is February 1st a good date to deploy this change? (Assuming that all
preps are finished)


Best,

Diederik


Re: [Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers

2013-01-25 Thread bawolff
Just to clarify, will this affect the stats at
http://dumps.wikimedia.org/other/pagecounts-raw/ ? Changing the format
of that will probably break third party scripts.
--
-bawolff




Re: [Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers

2013-01-25 Thread Diederik van Liere
No, the output format of http://dumps.wikimedia.org/other/pagecounts-raw/
will stay the same.
Best,
Diederik


On Fri, Jan 25, 2013 at 12:48 PM, bawolff bawolff...@gmail.com wrote:

> Just to clarify, will this affect the stats at
> http://dumps.wikimedia.org/other/pagecounts-raw/ ? Changing the format
> of that will probably break third party scripts.
> --
> -bawolff

