> You need a master node (can be started using the
> SPARK_HOME/sbin/start-master.sh script) and at least one worker node (can
> be started using the SPARK_HOME/sbin/start-slave.sh script). SparkConf
> should be created with the master node address (spark://host:port).
>
> Thanks!
>
> Gangadhar
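For reference, a minimal sketch of that setup; "master-host" and the app
name are placeholders, and 7077 is the default master port:

    # On the master machine (SPARK_HOME points at your Spark install):
    #   $SPARK_HOME/sbin/start-master.sh
    # On each worker machine, pointing it at the master:
    #   $SPARK_HOME/sbin/start-slave.sh spark://master-host:7077

    from pyspark import SparkConf, SparkContext

    # "master-host" is a placeholder for your actual master address.
    conf = SparkConf() \
        .setMaster("spark://master-host:7077") \
        .setAppName("standalone-test")
    sc = SparkContext(conf=conf)
    print(sc.parallelize(range(1000)).sum())
    sc.stop()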
From: Li Jin <ice.xell...@gmail.com>
Hi,
I am wondering: does pyspark standalone (local) mode support multiple
cores/executors?
Thanks,
Li
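For context, a minimal sketch of requesting multiple cores in local mode;
as far as I know, local[N] runs N worker threads inside a single JVM (one
executor process) rather than separate executors:

    from pyspark import SparkContext

    # local[4] = 4 worker threads in one JVM;
    # use local[*] for one thread per available core.
    sc = SparkContext("local[4]", "local-cores-test")
    print(sc.defaultParallelism)  # should report 4
    sc.stop()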
I am not an expert on this but here is what I think:
Catalyst maintains information on whether a plan node is ordered. If your
dataframe is the result of an order by, Catalyst will skip the sorting when
it does a sort merge join. If your dataframe is created from storage, for
instance a ParquetRelation, Catalyst does not know the data is sorted, so
it will sort it again.
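One way to see this for yourself is to inspect the physical plan; a small
sketch (the data and join key are just illustrative):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()

    # spark.range gives a dataframe with a single "id" column.
    left = spark.range(1000).orderBy("id")
    right = spark.range(1000)

    # Look for Sort nodes around the SortMergeJoin in the output.
    left.join(right, "id").explain()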
Yeoul,
I think a way you could run a microbench for pyspark
serialization/deserialization would be to run a withColumn with a Python
udf that returns a constant, and compare that with similar code in
Scala.
I am not sure if there is a way to measure just the serialization code,
because the pyspark API only
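A sketch of that suggestion on the Python side (sizes and names are
illustrative; the Scala baseline would be the analogous withColumn job):

    import time
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.master("local[*]").getOrCreate()

    # A udf returning a constant: the measured time is dominated by
    # shipping rows to the Python worker and back (ser/de + IPC).
    const_udf = udf(lambda x: 1, IntegerType())

    df = spark.range(10 * 1000 * 1000)

    start = time.time()
    # Aggregating the new column forces the udf to run on every row.
    df.withColumn("c", const_udf(df["id"])).agg(F.sum("c")).collect()
    print("python udf pass: %.2fs" % (time.time() - start))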
Hi All,
This is Li Jin. We (my colleagues at Two Sigma and I) have been using Spark
for time series analysis for the past two years, and it has been a success
in scaling up our time series analysis.
Recently, we started a conversation with Reynold about potential
opportunities to collaborate