Re: Analyzing twitter data

2016-06-08 Thread Jörn Franke
You can load it directly into Solr, but think about what you want to index.
> On 08 Jun 2016, at 15:51, Mich Talebzadeh wrote:
>
> yes. use that is reasonable.
>
> What is the format of twitter data. Is that primarily json.?
>
> If I do
>
> duser@rhes564:

Re: Analyzing twitter data

2016-06-08 Thread Mich Talebzadeh
Yes, using that is reasonable. What is the format of Twitter data? Is it primarily JSON? If I do
duser@rhes564: /usr/lib/nifi-0.6.1/conf> hdfs dfs -cat /twitter_data/FlumeData.1464945101915|more
16/06/08 14:48:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your

Re: Analyzing twitter data

2016-06-08 Thread Jörn Franke
That is trivial to do; I did it once when the data was in JSON format.
> On 08 Jun 2016, at 13:15, Mich Talebzadeh wrote:
>
> Interesting. There is also apache nifi
>
> Also I note that one can store twitter data in Hive tables as well?
>
> Dr Mich Talebzadeh

Re: Analyzing twitter data

2016-06-08 Thread Mich Talebzadeh
Interesting. There is also Apache NiFi. Also, I note that one can store Twitter data in Hive tables as well?
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Analyzing twitter data

2016-06-07 Thread Mich Talebzadeh
Thanks, I will have a look.
Mich
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 7 June 2016 at

Re: Analyzing twitter data

2016-06-07 Thread Jörn Franke
Solr is basically an in-memory text index with a lot of capabilities for language analysis and extraction (you can compare it to a Google for your tweets). The system itself has a lot of features, and its complexity is similar to that of big data systems. The index files can be backed by HDFS. You can put

Re: Analyzing twitter data

2016-06-07 Thread Mich Talebzadeh
OK. So, in a nutshell, for predictive offline analysis (as opposed to streaming), one can use Apache Flume to store Twitter data in HDFS and use Solr to query the data? This is what it says: Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called

Re: Analyzing twitter data

2016-06-07 Thread Jörn Franke
Well, I have seen that the algorithms mentioned are used for this. However, some preprocessing through Solr makes sense: it takes care of synonyms, homonyms, stemming, etc.
> On 07 Jun 2016, at 13:33, Mich Talebzadeh wrote:
>
> Thanks Jorn,
>
> To start I would like

Re: Analyzing twitter data

2016-06-07 Thread Mich Talebzadeh
Thanks Jorn,
To start, I would like to explore how one can turn some of the data into useful information. I would like to look at certain trend analyses. Simple correlation shows that the more a typical topic is mentioned, say for example "organic food", the more people are inclined to go
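The per-day mention count behind that kind of trend analysis could be sketched roughly as follows. This is a minimal sketch in plain Scala and not code from the thread: the `Tweet` case class, the `TopicTrend` object and the `mentionsPerDay` function are hypothetical names, and a case-insensitive substring match is only an assumed stand-in for real text analysis.

```scala
import java.time.LocalDate

// Hypothetical record type: a tweet reduced to its posting date and its text.
final case class Tweet(date: LocalDate, text: String)

object TopicTrend {
  // Count, per day, how many tweets mention the topic.
  // Assumption: a case-insensitive substring match is good enough for a first look.
  def mentionsPerDay(tweets: Seq[Tweet], topic: String): Map[LocalDate, Int] = {
    val needle = topic.toLowerCase
    tweets
      .filter(_.text.toLowerCase.contains(needle))
      .groupBy(_.date)
      .map { case (day, hits) => day -> hits.size }
  }
}
```

The resulting daily counts could then be compared against another daily series to look for the kind of simple correlation described above.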

Re: Analyzing twitter data

2016-06-07 Thread Jörn Franke
Spark ML support vector machines or neural networks could be candidates. For unsupervised learning, it could be clustering. To do a graph analysis on the followers, you can easily use Spark GraphX. Keep in mind that each tweet contains a lot of metadata (location, followers, etc.) that is more

Analyzing twitter data

2016-06-07 Thread Mich Talebzadeh
Hi,
This is really a general question. I use Spark to get Twitter data. I took a brief look at it:
  val ssc = new StreamingContext(sparkConf, Seconds(2))
  val tweets = TwitterUtils.createStream(ssc, None)
  val statuses = tweets.map(status => status.getText())
  statuses.print()
Ok
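One possible next step after printing the raw text is to pull a simple signal, such as hashtags, out of each status. The following is a minimal sketch in plain Scala with no Spark dependency. The `TweetText` object and its `hashtags` and `hashtagCounts` functions are hypothetical helpers, not part of Spark or twitter4j, and the regular expression is an assumption that only covers ASCII word characters.

```scala
object TweetText {
  // A '#' followed by one or more word characters (ASCII letters, digits, underscore).
  private val Hashtag = "#(\\w+)".r

  // Return the hashtags of one tweet in order of appearance,
  // lower-cased and without the leading '#'.
  def hashtags(text: String): List[String] =
    Hashtag.findAllMatchIn(text).map(_.group(1).toLowerCase).toList

  // Count how often each hashtag occurs across a batch of tweet texts.
  def hashtagCounts(texts: Seq[String]): Map[String, Int] =
    texts
      .flatMap(text => hashtags(text))
      .groupBy(identity)
      .map { case (tag, hits) => tag -> hits.size }
}
```

Assuming `statuses` is the stream of tweet texts from the snippet above, the helper could probably be applied with something like `statuses.flatMap(text => TweetText.hashtags(text))`.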