Hello,
my team and I have developed a fairly large big data application using
only the DataFrame API (Spark 1.6.3). Since our application uses machine
learning for prediction, we need to sample the training dataset so that
the data is not skewed.
To achieve this objective we use stratified sampl

Hello everybody,
I'm running a two-node Spark cluster on EC2, created using the provided
scripts. I then ssh into the master and invoke
"PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS='notebook
--profile=pyspark' spark/bin/pyspark". This launches a Spark notebook which
has been instructe

Hello everybody,
in case you missed it, Databricks and Berkeley have announced a free MOOC on
Spark and another one on scalable machine learning using Spark. Both
courses are free, but if you want a verified certificate of
completion you need to donate at least $50. I did it; it's a great
invest