Re: Plot DataFrame with matplotlib

2016-03-30 Thread Yavuz Nuzumlalı
Hi Teng,

Thanks for the answer. I switched to pandas during the proof-of-concept
phase so that I could plot graphs easily.

In fact, the pandas DataFrame object has its own `plot` methods, so these
objects can plot themselves easily in most cases (pandas uses matplotlib
internally).

I wonder whether the Spark DataFrame API would consider moving in that
direction, because plotting is really important during analysis, and
converting a data frame with the `toPandas()` method would fail for data
that does not fit in memory.

Although I'm not very familiar with the internals, I would be glad to help
if the team considers adding such a feature.
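
One possible workaround for the `toPandas()` memory limit is to downsample on the Spark side before converting. The sketch below is only an illustration under stated assumptions, not an existing Spark feature: `sample_fraction` and `to_pandas_sample` are hypothetical helper names, and `max_rows` is a budget you would choose to suit the driver's memory. It relies only on the `count()`, `sample()`, `limit()` and `toPandas()` methods of a Spark DataFrame.

```python
def sample_fraction(total_rows, max_rows):
    """Return the sampling fraction that keeps roughly max_rows rows."""
    if total_rows <= 0 or max_rows >= total_rows:
        return 1.0  # small enough already: keep everything
    return max_rows / float(total_rows)


def to_pandas_sample(df, max_rows, seed=42):
    """Downsample a Spark DataFrame, then convert it to pandas.

    `df` is assumed to be a pyspark.sql.DataFrame. The helper names and
    the default seed are illustrative and not part of Spark itself.
    """
    fraction = sample_fraction(df.count(), max_rows)
    if fraction < 1.0:
        # sample() is approximate, so limit() below enforces the hard cap.
        df = df.sample(False, fraction, seed)
    return df.limit(max_rows).toPandas()


# With a live Spark session this could be used as, for example:
#   pdf = to_pandas_sample(spark_df, max_rows=10000)
#   pdf.plot(kind="scatter", x="col_a", y="col_b")  # placeholder column names
print(sample_fraction(1000000, 10000))
```

Note that `count()` triggers a Spark job of its own, and a plot of a random sample shows the overall shape of the data rather than every point.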

On Wed, Mar 23, 2016 at 2:16 PM Teng Qiu wrote:

> Hmm, then this sounds like a feature request for matplotlib: its APIs
> would need to support RDD or Spark DataFrame objects. I checked the API
> of mplot3d
> (http://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html#mpl_toolkits.mplot3d.Axes3D.scatter),
> and it only supports "array-like" input data.
>
> So yes, to use matplotlib, you need to take the elements out of the RDD
> and pass them to the plotting API as a list object.


Re: Plot DataFrame with matplotlib

2016-03-23 Thread Teng Qiu
Hmm, then this sounds like a feature request for matplotlib: its APIs
would need to support RDD or Spark DataFrame objects. I checked the API
of mplot3d
(http://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html#mpl_toolkits.mplot3d.Axes3D.scatter),
and it only supports "array-like" input data.

So yes, to use matplotlib, you need to take the elements out of the RDD
and pass them to the plotting API as a list object.
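
As a concrete illustration of the "take the elements out and pass them as a list" approach, here is a minimal sketch. It assumes the `pca_features` vectors have already been collected to the driver (for example with `df.select("pca_features").collect()`, which is only safe when the result fits in driver memory). `split_xyz` and `scatter3d` are hypothetical helper names, the output file name is arbitrary, and the sample values are the first three rows from the original question.

```python
# First three rows of the pca_features column quoted in this thread.
SAMPLE_VECTORS = [
    [-255.4681508918886, 2.9340031372956155, -0.5357914079267039],
    [-477.03566189308367, -6.170290817861212, -5.280827588464785],
    [-163.13388125540507, -4.571443623272966, -1.2349427928939671],
]


def split_xyz(vectors):
    """Split an iterable of 3-element vectors into x, y and z lists."""
    xs, ys, zs = [], [], []
    for vector in vectors:
        # Unpacking raises ValueError if a vector is not 3-dimensional.
        x, y, z = (float(component) for component in vector)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


def scatter3d(xs, ys, zs, path="pca_scatter.png"):
    """Draw a 3D scatter plot with matplotlib and save it to `path`."""
    # Imported here so split_xyz stays usable without matplotlib installed.
    import matplotlib.pyplot as plt
    # Older matplotlib versions need this import to register the 3d projection.
    from mpl_toolkits.mplot3d import Axes3D  # noqa: F401

    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(xs, ys, zs)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.set_zlabel("PC3")
    fig.savefig(path)
    plt.close(fig)
    return path


# With a Spark DataFrame, the vectors could be pulled to the driver first:
#   rows = df.select("pca_features").collect()
#   vectors = [row.pca_features.toArray() for row in rows]
xs, ys, zs = split_xyz(SAMPLE_VECTORS)
print(len(xs), len(ys), len(zs))
# scatter3d(xs, ys, zs)  # uncomment to write pca_scatter.png
```

Splitting the coordinates in plain Python keeps the Spark-specific part down to a single `collect()` call, so the plotting code does not depend on Spark at all.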

2016-03-23 12:20 GMT+01:00 Yavuz Nuzumlalı:
> Thanks for the help, but the example you referenced gets the values from
> the RDD as a list and plots that list.
>
> What I was specifically asking is: is there a convenient way to plot a
> DataFrame object directly (like pandas DataFrame objects)?




Re: Plot DataFrame with matplotlib

2016-03-23 Thread Yavuz Nuzumlalı
Thanks for the help, but the example you referenced gets the values from
the RDD as a list and plots that list.

What I was specifically asking is: is there a convenient way to plot a
DataFrame object directly (like pandas DataFrame objects)?


On Wed, Mar 23, 2016 at 11:47 AM Teng Qiu wrote:

> I'm not sure about 3D plots, but there is a nice example of plotting an RDD
> or DataFrame with matplotlib:
>
> https://github.com/zalando/spark-appliance/blob/master/examples/notebooks/PySpark_sklearn_matplotlib.ipynb


Re: Plot DataFrame with matplotlib

2016-03-23 Thread Teng Qiu
I'm not sure about 3D plots, but there is a nice example of plotting an RDD
or DataFrame with matplotlib:

https://github.com/zalando/spark-appliance/blob/master/examples/notebooks/PySpark_sklearn_matplotlib.ipynb

On Wednesday, 23 March 2016, Yavuz Nuzumlalı wrote:
> Hi all,
> I'm trying to plot the result of a simple PCA operation, but couldn't
> find clear documentation about plotting data frames.
> Here is the output of my data frame:
> +----------------------------------------------------------------+
> |pca_features                                                    |
> +----------------------------------------------------------------+
> |[-255.4681508918886,2.9340031372956155,-0.5357914079267039]     |
> |[-477.03566189308367,-6.170290817861212,-5.280827588464785]     |
> |[-163.13388125540507,-4.571443623272966,-1.2349427928939671]    |
> |[-53.721252166903255,0.6162589419996329,-0.39569546286098245]   |
> |[-27.97717473880869,0.30883567826481106,-0.11159555340377557]   |
> |[-118.27508063853554,1.3484584740407748,-0.8088790388907207]    |
> The values in the `pca_features` column are DenseVectors created using
> VectorAssembler.
> How can I draw a simple 3D scatter plot from this data frame?
> Thanks