OK, so basically, for predictive offline work (as opposed to streaming), in
a nutshell one can use Apache Flume to store Twitter data in HDFS and use
Solr to query that data?

This is what the Solr documentation says:

Solr is a standalone enterprise search server with a REST-like API. You put
documents in it (called "indexing") via JSON, XML, CSV or binary over HTTP.
You query it via HTTP GET and receive JSON, XML, CSV or binary results.
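
For example, a minimal sketch of querying such an index over HTTP from
Scala, assuming a Solr core named "tweets" with the tweet body in a field
called "text" (both names are assumptions, not something from this thread):

    import scala.io.Source

    // ask Solr for documents matching the phrase, results back as JSON
    val url = "http://localhost:8983/solr/tweets/select" +
      "?q=text:%22organic+food%22&wt=json&rows=10"
    val json = Source.fromURL(url).mkString
    println(json)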

Thanks

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 7 June 2016 at 12:39, Jörn Franke <jornfra...@gmail.com> wrote:

> Well, I have seen that the algorithms mentioned are used for this. However,
> some preprocessing through Solr makes sense - it takes care of synonyms,
> homonyms, stemming etc.
>
> On 07 Jun 2016, at 13:33, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Thanks Jörn,
>
> To start, I would like to explore how one can turn some of this data into
> useful information.
>
> I would like to look at certain trend analyses. Simple correlation shows
> that the more mentions there are of a typical topic, say for example
> "organic food", the more people are inclined to go for it. From that one
> can deduce that organic food is a potential growth area.
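>
> As a minimal sketch (assuming the statuses DStream of tweet text from the
> streaming snippet quoted further down in this thread, and treating
> "organic food" as a plain keyword match), one could count mentions over a
> sliding window to watch the trend:
>
>     import org.apache.spark.streaming.Minutes
>
>     // tweets mentioning the topic, counted over the last hour,
>     // updated every 5 minutes
>     val organicMentions = statuses
>       .filter(_.toLowerCase.contains("organic food"))
>       .window(Minutes(60), Minutes(5))
>       .count()
>     organicMentions.print()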
>
> Now I have all the infrastructure to ingest that data, like using Flume to
> store it or Spark Streaming to do near-real-time work.
>
> Now I want to slice and dice that data for, say, organic food.
>
> I presume this is a typical question.
>
> You mentioned Spark ML (machine learning?). Is that something viable?
>
> Cheers
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 7 June 2016 at 12:22, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Spark ML Support Vector Machines or neural networks could be candidates.
>> For unsupervised learning it could be clustering.
>> For doing a graph analysis on the followers you can easily use Spark
>> GraphX.
>> Keep in mind that each tweet contains a lot of metadata (location,
>> followers etc.) that is more or less structured.
>> For unstructured text analytics (e.g. the tweet itself) I recommend
>> Solr/Elasticsearch.
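>>
>> As a rough sketch of the supervised route with Spark MLlib (assuming a
>> hand-labelled RDD[(Double, String)] of (label, tweet text) pairs called
>> labelledTweets, which is hypothetical and not from this thread):
>>
>>     import org.apache.spark.mllib.classification.SVMWithSGD
>>     import org.apache.spark.mllib.feature.HashingTF
>>     import org.apache.spark.mllib.regression.LabeledPoint
>>
>>     // hash each tweet's terms into a fixed-size term-frequency vector
>>     val hashingTF = new HashingTF(10000)
>>     val training = labelledTweets.map { case (label, text) =>
>>       LabeledPoint(label,
>>         hashingTF.transform(text.toLowerCase.split("\\s+").toSeq))
>>     }.cache()
>>
>>     // train a linear SVM; 1.0 = about the topic, 0.0 = not
>>     val model = SVMWithSGD.train(training, 100)
>>
>>     // classify an unseen tweet
>>     val words = "fresh organic food market opens".toLowerCase.split("\\s+").toSeq
>>     val prediction = model.predict(hashingTF.transform(words))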
>>
>> However, I am not sure what you want to do with the data exactly.
>>
>>
>> On 07 Jun 2016, at 13:16, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>> Hi,
>>
>> This is really a general question.
>>
>> I use Spark to get Twitter data. I have had an initial look at it:
>>
>>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>>     import org.apache.spark.streaming.twitter.TwitterUtils
>>
>>     // 2-second batches; None = use the default Twitter4J OAuth credentials
>>     val ssc = new StreamingContext(sparkConf, Seconds(2))
>>     val tweets = TwitterUtils.createStream(ssc, None)
>>     val statuses = tweets.map(status => status.getText())
>>     statuses.print()
>>
>> Ok
>>
>> Also, I can use Apache Flume to store the data in an HDFS directory:
>>
>> $FLUME_HOME/bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf
>> -Dflume.root.logger=DEBUG,console -n TwitterAgent
>>
>> That stores the Twitter data in binary format in an HDFS directory.
>>
>> My question is pretty basic.
>>
>> What is the best tool/language to dig into that data, for example the
>> Twitter streaming data? I am getting all sorts of stuff coming in. Say I
>> am only interested in certain topics like sport etc. How can I detect the
>> signal from the noise, and with what tool and language?
>>
>> Thanks
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>>
>
