Re: Plot DataFrame with matplotlib
Hi Teng,

Thanks for the answer. I switched to pandas during the proof-of-concept phase so that I could plot graphs easily. A pandas DataFrame has its own `plot` method, so in most cases these objects can plot themselves (pandas uses matplotlib internally).

I wonder whether the Spark DataFrame API would consider moving in that direction. Plotting is really important during analysis, and converting a data frame with the `toPandas()` method fails for data that does not fit in memory.

Although I'm not very familiar with the internals, I would be glad to help if the team considers adding such a feature.
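As an illustration of the memory concern above, here is a minimal sketch that bounds the number of rows before calling `toPandas()`. The helper name `bounded_to_pandas` and its `max_rows` and `seed` parameters are hypothetical, not part of Spark; the sketch only assumes the DataFrame methods `count()`, `sample()`, `limit()` and `toPandas()`.

```python
def bounded_to_pandas(df, max_rows=10000, seed=42):
    """Return at most max_rows rows of a Spark DataFrame as a pandas DataFrame.

    Hypothetical helper: it samples on the cluster first, so only a bounded
    number of rows is ever collected into driver memory.
    """
    total = df.count()  # note: this triggers a Spark job
    if total > max_rows:
        fraction = max_rows / float(total)
        # sample() only approximates the requested fraction,
        # so limit() enforces the hard upper bound.
        df = df.sample(False, fraction, seed).limit(max_rows)
    return df.toPandas()
```

The result is an ordinary pandas DataFrame, so its `plot` method is available, at the cost of plotting a random subset rather than the full data.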
Re: Plot DataFrame with matplotlib
Hmm, then this sounds like a feature request for matplotlib: its APIs would need to support RDD or Spark DataFrame objects. I checked the API of mplot3d (http://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html#mpl_toolkits.mplot3d.Axes3D.scatter), and it only supports "array-like" input data.

So yes, to use matplotlib you need to take the elements out of the RDD and pass them to the plotting API as a list object.
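The approach described above (collect the vectors, then hand plain lists to matplotlib) could look roughly like the following sketch. The function names `split_components` and `plot_pca_3d` and the axis labels are made up for illustration; the six vectors are the ones shown in the original question, and the commented `collect()` line assumes a data frame named `df`.

```python
# Values copied from the data frame output quoted in this thread. With Spark,
# the same list could be built on the driver, e.g. (assumption, not run here):
#   vectors = [row.pca_features.toArray()
#              for row in df.select("pca_features").collect()]
vectors = [
    [-255.4681508918886, 2.9340031372956155, -0.5357914079267039],
    [-477.03566189308367, -6.170290817861212, -5.280827588464785],
    [-163.13388125540507, -4.571443623272966, -1.2349427928939671],
    [-53.721252166903255, 0.6162589419996329, -0.39569546286098245],
    [-27.97717473880869, 0.30883567826481106, -0.11159555340377557],
    [-118.27508063853554, 1.3484584740407748, -0.8088790388907207],
]


def split_components(vectors):
    """Transpose a list of 3-element vectors into three coordinate lists."""
    xs, ys, zs = (list(axis) for axis in zip(*vectors))
    return xs, ys, zs


def plot_pca_3d(xs, ys, zs, path=None):
    """Draw a 3D scatter plot; save it to path if given, else show it."""
    # Imported lazily so the data preparation works without matplotlib.
    import matplotlib.pyplot as plt
    # Registers the "3d" projection on older matplotlib versions.
    from mpl_toolkits.mplot3d import Axes3D  # noqa: F401

    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(xs, ys, zs)
    ax.set_xlabel("component 1")
    ax.set_ylabel("component 2")
    ax.set_zlabel("component 3")
    if path is not None:
        fig.savefig(path)
    else:
        plt.show()
    return ax


xs, ys, zs = split_components(vectors)
```

Calling `plot_pca_3d(xs, ys, zs)` then renders the scatter plot. Note that `collect()` brings every row to the driver, so this only suits results small enough to fit in memory.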
Re: Plot DataFrame with matplotlib
Thanks for the help, but the example you referenced collects the values from the RDD as a list and plots that list.

What I was specifically asking is whether there is a convenient way to plot a DataFrame object directly, like pandas DataFrame objects.
Re: Plot DataFrame with matplotlib
Not sure about 3D plots, but there is a nice example of plotting an RDD or DataFrame using matplotlib:

https://github.com/zalando/spark-appliance/blob/master/examples/notebooks/PySpark_sklearn_matplotlib.ipynb

On Wednesday, 23 March 2016, Yavuz Nuzumlalı wrote:
> Hi all,
>
> I'm trying to plot the result of a simple PCA operation, but couldn't find clear documentation about plotting data frames.
>
> Here is the output of my data frame:
>
> +-------------------------------------------------------------+
> |pca_features                                                 |
> +-------------------------------------------------------------+
> |[-255.4681508918886,2.9340031372956155,-0.5357914079267039]  |
> |[-477.03566189308367,-6.170290817861212,-5.280827588464785]  |
> |[-163.13388125540507,-4.571443623272966,-1.2349427928939671] |
> |[-53.721252166903255,0.6162589419996329,-0.39569546286098245]|
> |[-27.97717473880869,0.30883567826481106,-0.11159555340377557]|
> |[-118.27508063853554,1.3484584740407748,-0.8088790388907207] |
>
> The values of the `pca_features` column are DenseVectors created using VectorAssembler.
>
> How can I draw a simple 3D scatter plot from this data frame?
>
> Thanks