Replace this line:

 img_data = sc.parallelize( list(im.getdata()) )

with:

 img_data = sc.parallelize( list(im.getdata()), 3 * num_cores )

where num_cores is the number of cores you have.
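The second argument to sc.parallelize() is the number of slices (partitions). With more slices, each task carries a smaller share of the pixel list, which is what the "task of very large size" warning is about. Here is a rough sketch of how that could look. The helper names (suggested_slices, to_gray, grayscale_pixels) are made up for illustration, and using multiprocessing.cpu_count() assumes you are running locally; on a cluster you would plug in the total core count yourself.

```python
import multiprocessing


def suggested_slices(cores=None, factor=3):
    """Partition count following the '3 * number of cores' rule of thumb."""
    if cores is None:
        # Assumption: local mode, so the driver's core count is what matters.
        cores = multiprocessing.cpu_count()
    return max(1, factor * cores)


def to_gray(pixel):
    """Weighted gray value of an (R, G, B) pixel, using the weights from the post."""
    r, g, b = pixel[:3]
    return int(r * 0.21 + g * 0.72 + b * 0.07)


def grayscale_pixels(sc, pixels, cores=None):
    """Spread the pixels over more partitions so each task stays small.

    `sc` is an existing SparkContext; `pixels` is a list of (R, G, B) tuples,
    e.g. list(im.getdata()).
    """
    rdd = sc.parallelize(pixels, suggested_slices(cores))
    # Compute the gray value, then copy it into all three channels.
    return rdd.map(to_gray).map(lambda g: (g, g, g)).collect()
```

You would then call something like im.putdata(grayscale_pixels(sc, list(im.getdata()))). Note that the whole pixel list still goes through the driver, so very large images may need a different approach.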

Best Regards

On Thu, Jun 4, 2015 at 1:57 AM, Justin Spargur <> wrote:

> Hi all,
>      I'm playing around with manipulating images via Python and want to
> utilize Spark for scalability. That said, I'm just learning Spark and my
> Python is a bit rusty (been doing PHP coding for the last few years). I
> think I have most of the process figured out. However, the script fails on
> larger images and Spark is sending out the following warning for smaller
> images:
> Stage 0 contains a task of very large size (1151 KB). The maximum
> recommended task size is 100 KB.
> My code is as follows:
> import Image
> from pyspark import SparkContext
> if __name__ == "__main__":
>     imageFile = "sample.jpg"
>     outFile   = "sample.gray.jpg"
>     sc = SparkContext(appName="Grayscale")
>     im =
>     # Create an RDD for the data from the image file
>     img_data = sc.parallelize( list(im.getdata()) )
>     # Create an RDD for the grayscale value
>     gValue = img_data.map( lambda x: int(x[0]*0.21 + x[1]*0.72 + x[2]*0.07) )
>     # Put our grayscale value into the RGB channels
>     grayscale = gValue.map( lambda x: (x,x,x) )
>     # Save the output in a new image.
>     im.putdata( grayscale.collect() )
> Obviously, something is amiss. However, I can't figure out where I'm off
> track with this. Any help is appreciated! Thanks in advance!!!
