Re: Using countApproxDistinct in pyspark

2014-08-04 Thread Diederik
Dear Davies,

Thanks so much for your instructions! It worked like a charm.

Best,
Diederik

On Wed, Jul 30, 2014 at 1:27 AM, Davies Liu-2 [via Apache Spark User List] <ml-node+s1001560n10917...@n3.nabble.com> wrote:
> Hey Diederik,
>
> The data in rdd._jrdd.rdd() is serializ

Using countApproxDistinct in pyspark

2014-07-29 Thread Diederik
What is also weird is that when I set p to 8, I should get a more accurate number, but it's actually smaller. Any tips or pointers are much appreciated!

Best,
Diederik
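For context on what p controls, here is a minimal sketch assuming the textbook HyperLogLog error estimate of roughly 1.04 / sqrt(2^p) for 2^p registers. The helper name `hll_relative_error` is hypothetical and is not part of Spark; Spark's own implementation may differ in detail, and the thread does not say which p was used before switching to 8.

```python
import math


def hll_relative_error(p: int) -> float:
    """Textbook HyperLogLog relative standard error for 2**p registers.

    Assumption: the classic 1.04 / sqrt(m) estimate with m = 2**p.
    This is an illustration, not Spark's internal formula.
    """
    return 1.04 / math.sqrt(2 ** p)


if __name__ == "__main__":
    # Higher p means more registers and a tighter expected error.
    for p in (4, 8, 12, 16):
        err = hll_relative_error(p)
        print(f"p={p:2d}  registers={2 ** p:6d}  approx. relative error={err:.4f}")
```

Under this estimate, p = 8 corresponds to an expected error of about 6.5%. The error is two-sided, so a single estimate that comes out smaller than a previous one is not by itself a sign of lower accuracy; it only needs to fall closer to the true count on average.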