So it’s certainly doable (though not super easy, mind you), but until the
Arrow UDF release goes out it will be rather slow.

On Sun, Nov 26, 2017 at 8:01 AM ashish rawat <dceash...@gmail.com> wrote:

> Hi,
>
> Has someone tried running NLTK (python) with Spark Streaming (scala)? I
> was wondering if this is a good idea and what are the right Spark operators
> to do this? The reason we want to try this combination is that we don't
> want to run our transformations in python (pyspark), but after the
> transformations, we need to run some natural language processing operations
> and we don't want to restrict the functions data scientists can use to
> Spark's natural language library. So, Spark Streaming with NLTK looks like
> the right option, from the perspective of fast data processing and data
> science flexibility.
>
> Regards,
> Ashish
>
-- 
Twitter: https://twitter.com/holdenkarau
