Re: countApprox

2016-09-18 Thread Stefano Lodi
10:04 To: Stefano Lodi Cc: user@spark.apache.org Subject: Re: countApprox countApprox gives the best answer within some timeout. Is it possible that 1 ms is more than enough to count this exactly? Then the confidence wouldn't matter. Although that seems way too fast, you're counting ranges whose values

Re: countApprox

2016-09-16 Thread Sean Owen
countApprox gives the best answer within some timeout. Is it possible that 1 ms is more than enough to count this exactly? Then the confidence wouldn't matter. Although that seems way too fast, you're counting ranges whose values don't actually matter, and maybe the Python side is smart enough
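
The "best answer within some timeout" idea can be illustrated with a small pure-Python sketch. This is not Spark's implementation: count_with_deadline and its simple extrapolation rule are hypothetical, and the sketch does not model confidence bounds at all. It only shows why a result comes back exact whenever every partition finishes before the deadline.

```python
import time


def count_with_deadline(partitions, timeout_s):
    """Count elements partition by partition until the deadline passes.

    Hypothetical helper for illustration only, not Spark code.
    Returns (estimate, is_exact).
    """
    deadline = time.monotonic() + timeout_s
    counted = 0   # elements seen so far
    done = 0      # partitions fully counted so far

    for part in partitions:
        # Always finish at least one partition so there is data to extrapolate from.
        if done > 0 and time.monotonic() >= deadline:
            break
        counted += len(part)
        done += 1

    if done == len(partitions):
        # Everything finished in time: the answer is exact, so no
        # approximation (and no confidence level) is involved.
        return counted, True

    # Assumed rule: scale the partial count by the share of partitions counted.
    estimate = round(counted * len(partitions) / done)
    return estimate, False


parts = [range(10), range(10), range(10), range(30)]
print(count_with_deadline(parts, 10.0))  # generous timeout: all partitions counted
print(count_with_deadline(parts, 0.0))   # zero timeout: only the first partition counted
```

With the generous timeout the helper reports the true total of 60 and flags it as exact; with a zero timeout it extrapolates from the first partition alone and reports an inexact 40.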

countApprox

2016-09-15 Thread Stefano Lodi
I am experimenting with countApprox. I created an RDD of 10^8 numbers and ran countApprox with different parameters, but I failed to generate any approximate output. In all runs it returns the exact number of elements. What is the effect of approximation in countApprox supposed