Yes.

scala> rawLogs.partitions.size
res1: Int = 2171

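A minimal sketch of how the partition counts could be compared, and of one common way to keep the map at 100 tasks while still writing two files. Nothing in it is confirmed in this thread. `coalesce(n, shuffle = true)` is a standard RDD method, but the names `CoalesceSketch` and `partitionCounts`, the local master, and the `parallelize` stand-in for the S3 sequence files are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CoalesceSketch {
  // Returns partition counts for (input, narrow chain, shuffled chain).
  def partitionCounts(sc: SparkContext): (Int, Int, Int) = {
    // Hypothetical stand-in for sc.sequenceFile(...): 2171 partitions.
    val rdd1 = sc.parallelize(1 to 100000, 2171)

    // Narrow chain: both coalesce calls stay in a single stage, so the
    // map runs inside the 2 tasks of the final coalesce.
    val narrow = rdd1.coalesce(100).map(_ * 2).coalesce(2)

    // shuffle = true adds a stage boundary: the map runs in 100 tasks,
    // and only the stage after the shuffle has 2 tasks.
    val shuffled = rdd1.coalesce(100).map(_ * 2).coalesce(2, shuffle = true)

    // Print the lineage so the extra shuffle step is visible.
    println(shuffled.toDebugString)

    (rdd1.partitions.size, narrow.partitions.size, shuffled.partitions.size)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("coalesce-sketch").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try println(partitionCounts(sc)) finally sc.stop()
  }
}
```

Both chains end with two partitions, so the final partition count alone does not show the difference. The stage and task counts in the web UI, or the lineage printed by `toDebugString`, are where the shuffle boundary shows up. The shuffle costs an extra pass over the data, which is the trade-off for running the map at the wider parallelism.
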
On Tue, Jun 24, 2014 at 4:00 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> To be clear, the number of map tasks is determined by the number of
> partitions inside the RDD, hence the suggestion by Nicholas.
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
> On Wed, Jun 25, 2014 at 4:17 AM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> So do you get 2171 as the output for that command? That command tells
>> you how many partitions your RDD has, so it’s good to first confirm that
>> rdd1 has as many partitions as you think it has.
>>
>>
>> On Tue, Jun 24, 2014 at 4:22 PM, Alex Boisvert <alex.boisv...@gmail.com>
>> wrote:
>>
>>> It's actually a set of 2171 S3 files, with an average size of about 18MB.
>>>
>>>
>>> On Tue, Jun 24, 2014 at 1:13 PM, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
>>>> What do you get for rdd1._jrdd.splits().size()? You might think you’re
>>>> getting > 100 partitions, but it may not be happening.
>>>>
>>>>
>>>> On Tue, Jun 24, 2014 at 3:50 PM, Alex Boisvert <alex.boisv...@gmail.com
>>>> > wrote:
>>>>
>>>>> With the following pseudo-code,
>>>>>
>>>>> val rdd1 = sc.sequenceFile(...) // has > 100 partitions
>>>>> val rdd2 = rdd1.coalesce(100)
>>>>> val rdd3 = rdd2 map { ... }
>>>>> val rdd4 = rdd3.coalesce(2)
>>>>> val rdd5 = rdd4.saveAsTextFile(...) // want only two output files
>>>>>
>>>>> I would expect the parallelism of the map() operation to be 100
>>>>> concurrent tasks, and the parallelism of the save() operation to be 2.
>>>>>
>>>>> However, it appears the parallelism of the entire chain is 2 -- I only
>>>>> see two tasks created for the save() operation and those tasks appear to
>>>>> execute the map() operation as well.
>>>>>
>>>>> Assuming what I'm seeing is as-specified (meaning, how things are
>>>>> meant to be), what's the recommended way to force a parallelism of 100 on
>>>>> the map() operation?
>>>>>
>>>>> thanks!