10:04
To: Stefano Lodi
Cc: user@spark.apache.org
Subject: Re: countApprox
countApprox gives the best answer it can within the given timeout. Is it
possible that 1 ms is more than enough to count this exactly? Then the
confidence wouldn't matter. That seems way too fast, but you're
counting ranges whose values don't actually matter, and maybe the
Python side is smart enough
I am experimenting with countApprox. I created an RDD of 10^8 numbers and ran
countApprox with different parameters, but I could not get any approximate
output: every run returns the exact number of elements. What is the effect
of approximation in countApprox supposed
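
To illustrate the point made in the reply (an exact result comes back whenever all the work finishes before the timeout), here is a minimal pure-Python sketch. It is a simplified model and not Spark's actual code: the function name `approx_count` and the `finished` parameter are hypothetical, and a real estimator would also report confidence bounds rather than a single number.

```python
def approx_count(partition_counts, finished):
    """Simplified model of a timeout-bounded count (NOT Spark's implementation).

    partition_counts: number of elements in each partition.
    finished: how many partitions completed before the timeout (assumed >= 1).
    Returns (estimate, is_exact).
    """
    total = len(partition_counts)
    if finished < 1:
        raise ValueError("at least one partition must have finished")

    if finished >= total:
        # Everything completed in time: the answer is exact and the
        # requested confidence has no influence on it.
        return sum(partition_counts), True

    # Only some partitions completed: scale the partial sum by the
    # fraction of partitions seen so far (a plain mean extrapolation).
    partial = sum(partition_counts[:finished])
    return round(partial * total / finished), False


# All four partitions finish in time, so the count is exact.
exact = approx_count([10, 10, 10, 10], finished=4)

# Only two of four finish, so the result is an extrapolated estimate.
estimate = approx_count([10, 12, 8, 10], finished=2)
```

Under this model, a fast job such as counting a range would always fall into the first branch, which would match the exact results described in the question.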