So it’s certainly doable (it’s not super easy, mind you), but until the Arrow UDF release goes out it will be rather slow.
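For concreteness, a rough sketch (not from this thread, names and columns are made up) of what calling NLTK through Spark UDFs in PySpark could look like: the plain Python UDF is the slow path available today, and the Arrow-backed pandas UDF is the vectorized variant the upcoming Spark 2.3 release brings. NLTK and its 'punkt' tokenizer data would need to be installed on every executor.

    # Illustrative sketch: wrapping NLTK in Spark UDFs (plain vs. Arrow-backed).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf, pandas_udf, PandasUDFType
    from pyspark.sql.types import StringType

    import nltk

    spark = SparkSession.builder.appName("nltk-udf-sketch").getOrCreate()

    # Plain Python UDF: works today, but serializes one row at a time,
    # which is why this path is rather slow.
    @udf(returnType=StringType())
    def tokenize_slow(text):
        return " ".join(nltk.word_tokenize(text)) if text else ""

    # Arrow-backed ("vectorized") pandas UDF (Spark 2.3+): rows arrive in
    # Arrow batches as pandas Series, cutting the per-row overhead.
    @pandas_udf(StringType(), PandasUDFType.SCALAR)
    def tokenize_fast(texts):
        return texts.apply(lambda t: " ".join(nltk.word_tokenize(t)) if t else "")

    # Hypothetical input; in practice this would be the stream's DataFrame.
    df = spark.createDataFrame(
        [("Spark Streaming with NLTK looks like the right option.",)], ["text"])
    df.select(tokenize_fast("text").alias("tokens")).show(truncate=False)

The same UDFs can be registered and called from a Scala pipeline via spark.udf.register, so the Scala-side transformations stay as they are.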
On Sun, Nov 26, 2017 at 8:01 AM ashish rawat <dceash...@gmail.com> wrote:

> Hi,
>
> Has someone tried running NLTK (Python) with Spark Streaming (Scala)? I
> was wondering if this is a good idea and what are the right Spark operators
> to do this? The reason we want to try this combination is that we don't
> want to run our transformations in Python (PySpark), but after the
> transformations we need to run some natural language processing operations,
> and we don't want to restrict the functions data scientists can use to the
> Spark natural language library. So, Spark Streaming with NLTK looks like
> the right option from the perspective of fast data processing and data
> science flexibility.
>
> Regards,
> Ashish

--
Twitter: https://twitter.com/holdenkarau