(Apologies if this was sent out multiple times before.) We are about to start a large-scale text-processing research project and are debating between two alternatives for our cluster -- Spark and Hadoop. I've researched the possibility of using NLTK with Hadoop and see that there's some precedent (http://blog.cloudera.com/blog/2010/03/natural-language-processing-with-hadoop-and-python/). I wanted to know how easy it might be to use NLTK with PySpark, or whether ScalaNLP is mature enough to be used with the Scala API for Spark/MLlib.
Thanks!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/NLP-with-Spark-tp2612.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.