Hi Kohki,
Serialization of tasks happens in local mode too and, as far as I am
aware, there is no way to disable this (although being able to do so
would definitely be useful in my opinion).
You can see the local mode as a testing mode, in which you would want to
catch any serialization errors, before they appear i
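
To illustrate the kind of error local mode surfaces, here is a minimal sketch that uses only plain Java serialization, which is what Spark uses for closures by default. It needs no Spark dependency. The names SerializationCheck, isSerializable and SerializableFunction are made up for this example and are not part of Spark's API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

public class SerializationCheck {

    /** Hypothetical helper type: a Function that is also Serializable. */
    public interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    /**
     * Tries to write the object with Java serialization.
     * Returns false if the object, or anything it captures, is not serializable.
     */
    public static boolean isSerializable(Object task) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(task);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Captures only an int, so the closure can be serialized.
        int offset = 3;
        SerializableFunction<Integer, Integer> ok = x -> x + offset;

        // Captures a plain Object, which does not implement Serializable.
        Object helper = new Object();
        SerializableFunction<Integer, Integer> bad = x -> x + helper.hashCode();

        System.out.println("ok serializable:  " + isSerializable(ok));
        System.out.println("bad serializable: " + isSerializable(bad));
    }
}
```

The second closure fails for the same reason a Spark task would: it captures a value that does not implement Serializable.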
Hi,
The RDD API provides async variants of a few RDD methods, which let the
user execute the corresponding jobs asynchronously. This makes it
possible, for instance, to cancel the jobs:
https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/AsyncRDDActions.html
There does not seem to be
our help so far!
Antonin
On 04/07/2020 19:19, Juan Martín Guillén wrote:
> Would you be able to send the code you are running?
> It would be great if you could include some sample data.
> Is that possible?
>
>
> On Saturday, 4 July 2020 13:09:23 ART, Antonin Delpeuch (lists)
>
> https://spark.apache.org/docs/latest/submitting-applications.html#master-urls
>
> Regards,
> Juan Martín.
>
>
>
>
> On Saturday, 4 July 2020 12:17:01 ART, Antonin Delpeuch (lists)
> wrote:
>
>
> Hi,
>
> I am working on revamping the archit
Hi,
I am working on revamping the architecture of OpenRefine, an ETL tool,
to execute workflows on datasets which do not fit in RAM.
Spark's RDD API is a great fit for the tool's operations, and provides
everything we need: partitioning and lazy evaluation.
However, OpenRefine is a lightweight t