Re: Re: Re: spark sql data skew

2018-07-22 Thread 崔苗
But how do I get count(distinct userId) group by company from count(distinct userId) group by company+x? count(userId) is different from count(distinct userId). On 2018-07-21 00:49:58, Xiaomeng Wan wrote: try divide and conquer: create a column x for the first character of userId, and group by company+x
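The two-stage approach does recover the exact distinct count, as long as x is derived from userId itself. Each userId then lands in exactly one (company, x) bucket, so the per-bucket distinct sets are disjoint and their sizes can simply be summed per company. (A random salt would break this, because the same userId could appear in several buckets and be counted more than once.) The sketch below is a hedged illustration in plain Python collections rather than Spark. The function names and sample rows are hypothetical. In Spark SQL the same idea would be roughly `SELECT company, SUM(cnt) FROM (SELECT company, substr(userId, 1, 1) AS x, COUNT(DISTINCT userId) AS cnt FROM t GROUP BY company, substr(userId, 1, 1)) GROUP BY company`, where the table name `t` is an assumption.

```python
from collections import defaultdict

def distinct_users_direct(rows):
    """count(distinct userId) group by company, computed in one pass."""
    users = defaultdict(set)
    for company, user_id in rows:
        users[company].add(user_id)
    return {company: len(ids) for company, ids in users.items()}

def distinct_users_two_stage(rows):
    """Stage 1: count(distinct userId) group by (company, x), x = first char of userId.
    Stage 2: sum the stage-1 counts group by company."""
    stage1 = defaultdict(set)
    for company, user_id in rows:
        x = user_id[0]  # bucket key is a function of userId, so buckets never overlap
        stage1[(company, x)].add(user_id)
    totals = defaultdict(int)
    for (company, _x), ids in stage1.items():
        totals[company] += len(ids)  # safe to add: each userId was counted in one bucket only
    return dict(totals)

# Hypothetical sample data: (company, userId) pairs, with a duplicate row for "alice".
rows = [("acme", "alice"), ("acme", "alice"), ("acme", "bob"),
        ("acme", "anna"), ("globex", "bob"), ("globex", "carol")]
print(distinct_users_direct(rows))     # {'acme': 3, 'globex': 2}
print(distinct_users_two_stage(rows))  # {'acme': 3, 'globex': 2}
```

For skew, the benefit is that a very large company is spread over many (company, x) groups in the first stage. The second stage only has to add one small number per bucket.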

Re: How to register custom structured streaming source

2018-07-22 Thread Hien Luu
Hi Farshid, Take a look at this example on GitHub - https://github.com/hienluu/structured-streaming-sources. Cheers, Hien On Thu, Jul 12, 2018 at 12:52 AM Farshid Zavareh wrote: > Hello. > > I need to create a custom streaming source by extending *FileStreamSource*. > The idea is to override