Re: Reading PDF/text/word file efficiently with Spark

2017-05-23 Thread Sonal Goyal
se both flume and kafka together (flafka <http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/>). -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.co

Re: Reading PDF/text/word file efficiently with Spark

2017-05-23 Thread docdwarf
/flafka-apache-flume-meets-apache-kafka-for-event-processing/> ). -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Reading-PDF-text-word-file-efficiently-with-Spark-tp28699p28705.html Sent from the Apache Spark Us

Reading PDF/text/word file efficiently with Spark

2017-05-19 Thread tesm...@gmail.com
Hi, I am doing NLP (Natural Language Processing) on my data. The data is in the form of files that can be of type PDF/Text/Word/HTML. These files are stored in a directory structure on my local disk, including nested directories. My standalone Java-based NLP parser can read input files, extract

Reading PDF/text/word file efficiently with Spark

2017-05-19 Thread tesmai4
-PDF-text-word-file-efficiently-with-Spark-tp28699.html Sent from the Apache Spark User List mailing list archive at Nabble.com.