Re: Count of distinct values in each column

2015-07-29 Thread Alexander Krasheninnikov
of distinct values in each column. What is the most efficient way to do this using Spark with Scala?

Example CSV format:
a,b,c,d
a,c,b,a
b,b,c,d
b,b,c,a
c,b,b,a

Expected output:
(a,2),(b,2),(c,1) #- First column distinct count
(b,4),(c,1) #- Second column distinct count
(c,3),(b,2

Count of distinct values in each column

2015-07-29 Thread Devi P.V
Hi all, I have a 5 GB CSV dataset with 69 columns. I need to find the count of distinct values in each column. What is the most efficient way to do this using Spark with Scala?

Example CSV format:
a,b,c,d
a,c,b,a
b,b,c,d
b,b,c,a
c,b,b,a

Expected output:
(a,2),(b,2),(c,1) #- First column
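
One possible approach, sketched here under some assumptions rather than taken from the thread: key every field by its column index, then count with a single reduceByKey so the whole file is scanned once. The object name ColumnCounts, the helper names valueCounts and distinctCounts, the local master, and the in-memory sample rows are all hypothetical stand-ins (a real job would read the file with sc.textFile). The split on "," is naive and assumes no quoted fields or header row. Note that the expected output in the post shows how often each value occurs per column, while the subject line asks for the number of distinct values, so the sketch computes both.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; names are illustrative only.
object ColumnCounts {

  // Returns ((columnIndex, value), occurrences) for every value in every column.
  def valueCounts(lines: RDD[String]): RDD[((Int, String), Long)] =
    lines
      .flatMap { line =>
        // Naive split: assumes no quoted commas. Limit -1 keeps trailing empty fields.
        line.split(",", -1).zipWithIndex.map { case (value, col) => ((col, value), 1L) }
      }
      .reduceByKey(_ + _)

  // Returns (columnIndex, number of distinct values in that column).
  def distinctCounts(counts: RDD[((Int, String), Long)]): RDD[(Int, Long)] =
    counts.map { case ((col, _), _) => (col, 1L) }.reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying the sketch out.
    val sc = new SparkContext(
      new SparkConf().setAppName("ColumnCounts").setMaster("local[*]"))
    try {
      // Sample rows from the post; stand-in for sc.textFile(...) on the real CSV.
      val lines = sc.parallelize(Seq("a,b,c,d", "a,c,b,a", "b,b,c,d", "b,b,c,a", "c,b,b,a"))

      // Cached because it feeds two separate actions below.
      val counts = valueCounts(lines).cache()

      // Collecting is fine for the sample; high-cardinality columns may need
      // saveAsTextFile or similar instead.
      counts.collect().sortBy(_._1).foreach { case ((col, value), n) =>
        println(s"column $col: ($value,$n)")
      }
      distinctCounts(counts).collect().sortBy(_._1).foreach { case (col, n) =>
        println(s"column $col: $n distinct values")
      }
    } finally {
      sc.stop()
    }
  }
}
```

For the five sample rows, the first column comes out as (a,2),(b,2),(c,1), which matches the expected output in the post. Keying by (column, value) keeps everything in one shuffle instead of running 69 separate distinct passes, though whether that is the fastest option for a given cluster and data shape would need measuring.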