Thanks Fokko, will take a look.
On Thu, Feb 7, 2019 at 12:08 AM Driesprong, Fokko wrote:
Hi Tao,
For Dataproc, GCP's managed Hadoop service, I implemented a method a while
ago. It checks whether the Python file is local and, if so, uploads it to
the temporary bucket that is provisioned with the cluster:
https://github.com/apache/airflow/blob/master/air
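Roughly, the logic is something like the following (a simplified sketch, not
the actual operator code; the helper name and staging prefix are made up,
but the google-cloud-storage calls are real):

    import os
    from urllib.parse import urlparse

    from google.cloud import storage


    def stage_pyspark_file(main_file, temp_bucket):
        # Already remote (gs://, hdfs://, ...): nothing to stage.
        if urlparse(main_file).scheme:
            return main_file
        if not os.path.isfile(main_file):
            raise FileNotFoundError(main_file)
        # Upload the local file to the cluster's temporary bucket.
        blob_name = "staging/" + os.path.basename(main_file)
        bucket = storage.Client().bucket(temp_bucket)
        bucket.blob(blob_name).upload_from_filename(main_file)
        return "gs://{}/{}".format(temp_bucket, blob_name)

The real code at the link above handles more cases, but that is the gist.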
Hi,
Does anyone have suggestions on how to use SparkOperator to send a PySpark
file to the Spark cluster, and on how to specify its PySpark dependencies?
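To make it concrete, something along these lines is what we have in mind
(a rough sketch assuming SparkSubmitOperator; the S3 paths, connection id,
and DAG name are placeholders, not our actual setup):

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

    with DAG(dag_id="pyspark_example",
             start_date=datetime(2019, 2, 1),
             schedule_interval=None) as dag:
        submit_job = SparkSubmitOperator(
            task_id="submit_pyspark_job",
            conn_id="spark_default",                       # Spark connection in Airflow
            application="s3://my-bucket/jobs/etl_job.py",  # main PySpark file
            py_files="s3://my-bucket/jobs/deps.zip",       # zipped dependencies
        )

But we are not sure whether this is the recommended way to ship dependencies.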
We currently push the user's PySpark file and its dependencies to an S3
location, where they get picked up by our Spark cluster. And we would like
to explo