Sorry, but the accumulator is still going to require you to walk through the 
RDD to get an accurate count, right? 
Its not being persisted? 

On Jan 14, 2015, at 5:17 AM, Ganelin, Ilya <[email protected]> wrote:

> Alternative to doing a naive toArray is to declare an accumulator per 
> partition and use that. It's specifically what they were designed to do. See 
> the programming guide.
> 
> 
> 
> Sent with Good (www.good.com)
> 
> 
> -----Original Message-----
> From: Tobias Pfeiffer [[email protected]]
> Sent: Tuesday, January 13, 2015 08:06 PM Eastern Standard Time
> To: Kevin Burton
> Cc: Ganelin, Ilya; [email protected]
> Subject: Re: quickly counting the number of rows in a partition?
> 
> Hi,
> 
> On Mon, Jan 12, 2015 at 8:09 PM, Ganelin, Ilya <[email protected]> 
> wrote:
> Use the mapPartitions function. It returns an iterator to each partition. 
> Then just get that length by converting to an array.
>  
> On Tue, Jan 13, 2015 at 2:50 PM, Kevin Burton <[email protected]> wrote:
> Doesn’t that just read in all the values?  The count isn’t pre-computed? It’s 
> not the end of the world if it’s not but would be faster.
> 
> Well, "converting to an array" may not work due to memory constraints, 
> counting the items in the iterator may be better. However, there is no 
> "pre-computed" value. For counting, you need to compute all values in the 
> RDD, in general. If you think of
> 
>     items.map(x => /* throw exception */).count()
> 
> then even though the count you want to get does not necessarily require the 
> evaluation of the function in map() (i.e., the number is the same), you may 
> not want to get the count if that code actually fails.
> 
> Tobias
> 
> The information contained in this e-mail is confidential and/or proprietary 
> to Capital One and/or its affiliates. The information transmitted herewith is 
> intended only for use by the individual or entity to which it is addressed.  
> If the reader of this message is not the intended recipient, you are hereby 
> notified that any review, retransmission, dissemination, distribution, 
> copying or other use of, or taking of any action in reliance upon this 
> information is strictly prohibited. If you have received this communication 
> in error, please contact the sender and delete the material from your 
> computer.

Reply via email to