Spark ML support vector machines or neural networks could be candidates for supervised classification. For unsupervised learning, clustering could work. For graph analysis on the followers you can easily use Spark GraphX. Keep in mind that each tweet contains a lot of metadata (location, followers, etc.) that is more or less structured. For unstructured text analytics (e.g. the tweet text itself) I recommend Solr or Elasticsearch.
However, I am not sure what exactly you want to do with the data.

> On 07 Jun 2016, at 13:16, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Hi,
>
> This is really a general question.
>
> I use Spark to get Twitter data. I did some looking at it
>
> val ssc = new StreamingContext(sparkConf, Seconds(2))
> val tweets = TwitterUtils.createStream(ssc, None)
> val statuses = tweets.map(status => status.getText())
> statuses.print()
>
> Ok
>
> Also I can use Apache Flume to store data in an HDFS directory
>
> $FLUME_HOME/bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf
> -Dflume.root.logger=DEBUG,console -n TwitterAgent
>
> Now that stores Twitter data in binary format in an HDFS directory.
>
> My question is pretty basic.
>
> What is the best tool/language to dig into that data? For example Twitter
> streaming data. I am getting all sorts of stuff coming in. Say I am only
> interested in certain topics like sport etc. How can I detect the signal from
> the noise, using what tool and language?
>
> Thanks
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
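
As a first cut at the "certain topics like sport" part of the question, before any machine learning, a plain keyword filter may already remove much of the noise. The following is only a minimal sketch in Scala: the `TopicFilter` object, its `sportTerms` list, and both helper functions are hypothetical names made up for illustration, and a real keyword list would need curation.

```scala
// Hypothetical helper for illustration; not part of Spark or the original thread.
object TopicFilter {
  // Assumed example keywords; replace with a curated list for real use.
  val sportTerms: Set[String] =
    Set("football", "cricket", "tennis", "rugby", "olympics")

  // Lower-case the text and split on every run of characters outside a-z and 0-9,
  // so "#Football" and "football!" both yield the token "football".
  def tokens(text: String): Set[String] =
    text.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).toSet

  // True if at least one token of the text is in the given term set.
  def matchesTopic(text: String, terms: Set[String]): Boolean =
    tokens(text).exists(terms.contains)
}
```

With the `statuses` stream from the quoted snippet, this could be applied as `statuses.filter(text => TopicFilter.matchesTopic(text, TopicFilter.sportTerms))`. Exact keyword matching misses synonyms and misspellings, which is where a trained classifier or a search engine would come in.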