Re: [ANNOUNCE] Announcing Spark 1.3!

2015-03-13 Thread Sebastián Ramírez
Awesome! Thanks! *Sebastián Ramírez* Head of Software Development <http://www.senseta.com> Tel: (+571) 795 7950 ext: 1012 Cel: (+57) 300 370 77 10 Calle 73 No 7 - 06 Piso 4 Linkedin: co.linkedin.com/in/tiangolo/ Twitter: @tiangolo

PySpark: Python 2.7 cluster installation script (with Numpy, IPython, etc)

2015-03-11 Thread Sebastián Ramírez
I made a *simple script which helps install Anaconda Python on the machines of a cluster* more easily. I wanted to share it here, in case it can help someone wanting to use PySpark. https://github.com/tiangolo/anaconda_cluster_install

Re: Is Ubuntu server or desktop better for spark cluster

2015-02-24 Thread Sebastián Ramírez
...free some memory and work from a terminal:

# Open a terminal
Ctrl+Alt+F1
# Shut down the GUI
sudo stop lightdm

(for reference: http://askubuntu.com/questions/148321/how-do-i-stop-gui)

Re: Pyspark save Decision Tree Model with joblib/pickle

2015-02-24 Thread Sebastián Ramírez
Great to know, thanks Xiangrui.

Re: Pyspark save Decision Tree Model with joblib/pickle

2015-02-23 Thread Sebastián Ramírez
...g in pseudo-code that you can save to a file. Then, you can parse that pseudo-code to write a proper script that runs the Decision Tree. Actually, that's what I did for a Random Forest (an ensemble of Decision Trees). Hope that helps,
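The approach described above can be sketched in plain Python. This is a hypothetical illustration: the tree text below imitates the pseudo-code format that MLlib's DecisionTreeModel.toDebugString() prints, and the parser and predictor names are made up for the example; a real model's output would be pasted in instead.

```python
import re

# Hypothetical sample of the pseudo-code format printed by MLlib's
# DecisionTreeModel.toDebugString() (a trained model's real output
# would go here instead).
TREE_TEXT = """\
If (feature 0 <= 1.5)
 Predict: 0.0
Else (feature 0 > 1.5)
 If (feature 1 <= 4.0)
  Predict: 1.0
 Else (feature 1 > 4.0)
  Predict: 2.0
"""

def parse_node(lines, pos=0):
    """Parse one node starting at lines[pos]; return (node, next_pos)."""
    text = lines[pos]
    m = re.match(r"If \(feature (\d+) <= ([-\d.]+)\)", text)
    if m:
        feat, thresh = int(m.group(1)), float(m.group(2))
        left, pos = parse_node(lines, pos + 1)
        # lines[pos] is now the matching "Else (...)" line; skip over it
        right, pos = parse_node(lines, pos + 1)
        return ("split", feat, thresh, left, right), pos
    m = re.match(r"Predict: ([-\d.]+)", text)
    return ("leaf", float(m.group(1))), pos + 1

def predict(node, features):
    """Walk the parsed tree for one feature vector."""
    while node[0] == "split":
        _, feat, thresh, left, right = node
        node = left if features[feat] <= thresh else right
    return node[1]

lines = [l.strip() for l in TREE_TEXT.splitlines() if l.strip()]
tree, _ = parse_node(lines)
print(predict(tree, [1.0, 0.0]))  # left branch -> 0.0
print(predict(tree, [2.0, 5.0]))  # right, then right branch -> 2.0
```

Once parsed this way, the tree can be evaluated anywhere without Spark or pickle, which is the point of the workaround.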

Re: Anaconda IPython notebook working with CDH Spark

2014-12-30 Thread Sebastián Ramírez
...'t remember how they say it), choose "yes". I hope that helps,

Re: pyspark sc.textFile uses only 4 out of 32 threads per node

2014-12-16 Thread Sebastián Ramírez
"transformations" are lazy, and aren't applied until they are needed by an "action" (and that happened to me with reads too, some time ago). You can try calling .first() on your RDD once in a while to force it to load the RDD into your cluster (but it might not be the cleanest way to do...
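The laziness being described can be sketched with a plain-Python analogy (this is not Spark itself: Python's map is lazy in the same way an RDD transformation is, and pulling the first element plays the role of an action like .first()):

```python
calls = []  # records when the mapped function actually runs

def loud_square(x):
    calls.append(x)
    return x * x

# "Transformation": building the map does no work yet
squares = map(loud_square, [1, 2, 3])
assert calls == []  # nothing has been computed so far

# "Action": asking for the first element forces evaluation
first = next(squares)
assert first == 1
assert calls == [1]  # only the element we asked for was computed
```

The same principle explains why an RDD may appear to use few threads until an action actually triggers the distributed read.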

Re: Pyspark 1.1.1 error with large number of records - serializer.dump_stream(func(split_index, iterator), outfile)

2014-12-16 Thread Sebastián Ramírez
1.0... I hope that helps. Best,