At my company, we calculate UV/PV offline in a batch job and update the results once a day. If you want to do it online, url + timestamp could be the rowkey.
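
To make the url + timestamp idea concrete, here is a rough sketch in plain Python that only simulates a sorted key space with a prefix scan; it does not call the HBase client. Everything beyond "url + timestamp" is my assumption: the helper names (make_rowkey, prefix_scan, uv_pv), the \x00 separator, appending visitid so that two visitors hitting the same URL in the same second do not overwrite each other, and keeping a per-row counter so that exact duplicate events still count toward PV. It also assumes the query is a prefix of the stored URL; to search on a part in the middle of the URL, that part would have to be moved to the front of the key.

```python
from bisect import bisect_left

SEP = "\x00"  # assumed separator; must not occur in URLs or visit ids


def make_rowkey(url, request_time, visit_id):
    # URL first, so rows sharing a URL prefix sort next to each other.
    # Zero-padded timestamp keeps lexicographic order equal to time order.
    # visit_id last (an assumption) keeps distinct visitors in distinct rows.
    return f"{url}{SEP}{request_time:010d}{SEP}{visit_id}"


def prefix_scan(sorted_keys, prefix):
    # Stand-in for a prefix scan: jump to the first key >= prefix,
    # then read forward while the keys still start with the prefix.
    i = bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        yield sorted_keys[i]
        i += 1


def uv_pv(table, url_prefix):
    # table maps rowkey -> hit counter (what an increment would maintain).
    visitors = set()
    pv = 0
    for key in prefix_scan(sorted(table), url_prefix):
        visitors.add(key.rsplit(SEP, 1)[1])  # visit_id is the last field
        pv += table[key]
    return len(visitors), pv


# The five sample events from the original question.
events = [
    ("http://www.aaa.com?a=b&c=d&e=f", 1, 1463387380),
    ("http://www.aaa.com?a=b&c=d&e=fa", 1, 1463387280),
    ("http://www.aaa.com?a=b&c=d&e=fa", 2, 1463387280),
    ("http://www.aaa.com?a=b&c=d&e=fab", 2, 1463387280),
    ("http://www.aaa.com?a=b&c=d&e=f", 1, 1463387380),
]

table = {}
for url, visit_id, ts in events:
    key = make_rowkey(url, ts, visit_id)
    table[key] = table.get(key, 0) + 1  # duplicate events bump the counter

print(uv_pv(table, "http://www.aaa.com?a=b&c=d&e=f"))
print(uv_pv(table, "http://www.aaa.com?a=b&c=d&e=fa"))
```

With the sample data, the two queries give the uv/pv pairs asked for in the question, reading e=fa as a prefix match. At 50T per day the set of visitors would not fit in memory like this, so a real version would likely need pre-aggregation or an approximate distinct count.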

2016-05-16 18:13 GMT+08:00 齐忠 <[email protected]>:

> Yes, like google analytics.
>
> 2016-05-16 17:48 GMT+08:00 Heng Chen <[email protected]>:
>
> > You want to calculate UV/PV online?
> >
> > 2016-05-16 16:46 GMT+08:00 齐忠 <[email protected]>:
> >
> >> I have very large log(50T per day),
> >>
> >> My log event as follows
> >>
> >> url,visitid,requesttime
> >>
> >> http://www.aaa.com?a=b&c=d&e=f, 1, 1463387380
> >> http://www.aaa.com?a=b&c=d&e=fa, 1, 1463387280
> >> http://www.aaa.com?a=b&c=d&e=fa, 2, 1463387280
> >> http://www.aaa.com?a=b&c=d&e=fab, 2, 1463387280
> >> http://www.aaa.com?a=b&c=d&e=f, 1, 1463387380
> >>
> >>
> >> When a user enters a part of the url, and returns the
> >> uv(UniqueVisitor) pv(PageView).
> >>
> >> for example
> >>
> >> input: e=f*
> >>
> >> output: uv=2,pv=5,
> >>
> >> input: e=fa
> >>
> >> output:uv=2,pv=3
> >>
> >> How to design rowkey?
> >>
> >> Thanks.
> >>
> >
>
> --
> [email protected]|齐忠
>
