Re: Spark Image resizing

2019-07-31 Thread Patrick McCarthy
It won't be very efficient, since a plain Python UDF ships every row between
the JVM and a Python worker one at a time, but you could write a Python UDF
using PythonMagick - https://wiki.python.org/moin/ImageMagick
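
A rough sketch of the row-at-a-time approach, under a few assumptions: it uses
Pillow instead of PythonMagick (Pillow is normally part of Anaconda, but check
that it is installed on every executor), it assumes 8-bit images with 1 or 3
channels, and the names resize_raw and make_struct_resize_udf are made up for
illustration. It works on the "image" struct column that the image data source
produces (height, width, nChannels, data).

```python
from PIL import Image  # Pillow; assumed to be installed on the executors

# Pillow modes keyed by channel count. The image data source stores
# 3-channel pixels in BGR order, but a resize treats each channel
# independently, so the byte order comes out unchanged.
_PIL_MODES = {1: "L", 3: "RGB"}


def resize_raw(height, width, n_channels, data, new_width, new_height):
    """Resize row-major 8-bit pixel bytes and return the new raw bytes."""
    mode = _PIL_MODES[n_channels]  # KeyError for other channel counts
    img = Image.frombytes(mode, (width, height), bytes(data))
    return img.resize((new_width, new_height)).tobytes()


def make_struct_resize_udf(new_width, new_height):
    """Wrap resize_raw in a plain (row-at-a-time) Python UDF.

    pyspark is imported here so resize_raw can be tried without a cluster.
    """
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BinaryType

    def _resize(image):
        # `image` is one Row of the image data source's struct column
        return resize_raw(image.height, image.width, image.nChannels,
                          image.data, new_width, new_height)

    return udf(_resize, BinaryType())


# Possible usage, with df loaded as in the original question:
#   resize = make_struct_resize_udf(64, 64)
#   df = df.withColumn("resized", resize(df["image"]))
```

The UDF returns only the resized pixel bytes; the new width and height are
whatever you passed in, so you would need to carry them alongside if you want
to rebuild a full image struct.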

If you have PyArrow 0.10 or later then you might be able to get a boost by
storing the encoded images in a BinaryType column and writing a pandas UDF,
which processes rows in batches instead of one at a time.
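
Something along these lines might work for the pandas UDF route. Treat it as a
sketch: it assumes Pillow on the executors, the helper names are invented, the
output is always converted to RGB and re-encoded as PNG (so any alpha channel
is dropped), and the usage comment assumes a DataFrame with a binary "content"
column of encoded image files, such as the one the "binaryFile" source in
Spark 3.0+ gives you.

```python
import io

from PIL import Image  # Pillow; assumed to be installed on the executors


def resize_image_bytes(content, size=(224, 224)):
    """Decode an encoded image (JPEG, PNG, ...), resize it, return PNG bytes."""
    with Image.open(io.BytesIO(content)) as img:
        # Converting to RGB keeps the PNG encoder happy for any input mode,
        # at the cost of dropping transparency.
        resized = img.convert("RGB").resize(size)
    out = io.BytesIO()
    resized.save(out, format="PNG")
    return out.getvalue()


def make_resize_udf(size=(224, 224)):
    """Build a scalar pandas UDF around resize_image_bytes.

    pyspark is imported here so the helper above can be tried locally.
    """
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import BinaryType

    def resize_batch(contents):
        # `contents` is a pandas Series holding one encoded image per row
        return contents.apply(lambda b: resize_image_bytes(b, size))

    return pandas_udf(resize_batch, returnType=BinaryType())


# Possible usage (Spark 3.0+ for the binaryFile source):
#   df = spark.read.format("binaryFile").load(imageDir)
#   df = df.withColumn("resized", make_resize_udf((128, 128))(df["content"]))
```

The gain over a plain UDF comes from Arrow moving whole batches between the
JVM and Python; the per-image decode and resize cost is the same either way.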

On Wed, Jul 31, 2019 at 6:22 AM Nick Dawes wrote:

> Any other way of resizing the image before creating the DataFrame in
> Spark? I know opencv does it. But I don't have opencv on my cluster. I have
> Anaconda python packages installed on my cluster.
>
> Any ideas will be appreciated.  Thank you!
>
> On Tue, Jul 30, 2019, 4:17 PM Nick Dawes  wrote:
>
>> Hi
>>
>> I'm new to spark image data source.
>>
>> After creating a dataframe using Spark's image data source, I would like
>> to resize the images in PySpark.
>>
>> df = spark.read.format("image").load(imageDir)
>>
>> Can you please help me with this?
>>
>> Nick
>>
>

-- 


*Patrick McCarthy*

Senior Data Scientist, Machine Learning Engineering

Dstillery

470 Park Ave South, 17th Floor, NYC 10016


Re: Spark Image resizing

2019-07-31 Thread Nick Dawes
Is there any other way of resizing the images before creating the DataFrame
in Spark? I know OpenCV can do it, but I don't have OpenCV on my cluster. I
do have the Anaconda Python packages installed on my cluster.

Any ideas would be appreciated. Thank you!

On Tue, Jul 30, 2019, 4:17 PM Nick Dawes wrote:

> Hi
>
> I'm new to spark image data source.
>
> After creating a dataframe using Spark's image data source, I would like
> to resize the images in PySpark.
>
> df = spark.read.format("image").load(imageDir)
>
> Can you please help me with this?
>
> Nick
>


Spark Image resizing

2019-07-30 Thread Nick Dawes
Hi

I'm new to Spark's image data source.

After creating a dataframe using Spark's image data source, I would like to
resize the images in PySpark.

df = spark.read.format("image").load(imageDir)

Can you please help me with this?

Nick