Interesting. There is also Apache NiFi <https://nifi.apache.org/>

I also note that one can store Twitter data in Hive tables as well?
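
On the earlier question in this thread about picking out topics such as "organic food" or sport from the stream, here is a minimal sketch in plain Scala. It is only an illustration under stated assumptions: TopicFilter, normalise, mentions and countByTopic are hypothetical names, not part of Spark, Twitter4J or Solr, and simple phrase matching is a stand-in for the stemming and synonym handling that Solr would provide.

```scala
// Hypothetical helper for illustration only; uses nothing beyond the Scala standard library.
object TopicFilter {
  // Lower-case the text and collapse every run of characters other than
  // a-z and 0-9 into a single space (so "#Organic-Food" becomes "organic food").
  // Assumption: English text; non-ASCII letters are treated as separators.
  def normalise(text: String): String =
    text.toLowerCase.replaceAll("[^a-z0-9]+", " ").trim

  // True if the tweet contains the topic phrase as whole words.
  def mentions(tweet: String, topic: String): Boolean =
    (" " + normalise(tweet) + " ").contains(" " + normalise(topic) + " ")

  // Count how many tweets mention each topic.
  def countByTopic(tweets: Seq[String], topics: Seq[String]): Map[String, Int] =
    topics.map(t => t -> tweets.count(mentions(_, t))).toMap
}
```

With the streaming snippet quoted further down, something like statuses.filter(TopicFilter.mentions(_, "organic food")) should keep only the matching tweets, although I have not run that against a live stream.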



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 7 June 2016 at 15:59, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Thanks, I will have a look.
>
> Mich
>
> On 7 June 2016 at 13:38, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Solr is basically an in-memory text index with a lot of capabilities for
>> language analysis and extraction (you can compare it to a Google for your
>> tweets). The system itself has a lot of features and a complexity
>> similar to that of Big Data systems. Its index files can be backed by HDFS.
>> You can put the tweets directly into Solr without going via HDFS files.
>>
>> Decide carefully which fields you want to index and search. It does not
>> make sense to index everything.
>>
>> On 07 Jun 2016, at 13:51, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>> OK, so basically, for off-line predictive analysis (as opposed to
>> streaming), one can in a nutshell use Apache Flume to store Twitter data
>> in HDFS and use Solr to query the data?
>>
>> This is what it says:
>>
>> Solr is a standalone enterprise search server with a REST-like API. You
>> put documents in it (called "indexing") via JSON, XML, CSV or binary over
>> HTTP. You query it via HTTP GET and receive JSON, XML, CSV or binary
>> results.
>>
>> thanks
>>
>> On 7 June 2016 at 12:39, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> Well, I have seen the algorithms mentioned being used for this.
>>> However, some preprocessing through Solr makes sense: it takes care of
>>> synonyms, homonyms, stemming etc.
>>>
>>> On 07 Jun 2016, at 13:33, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>> Thanks Jorn,
>>>
>>> To start, I would like to explore how one can turn some of the data into
>>> useful information.
>>>
>>> I would like to look at certain trend analyses. Simple correlation shows
>>> that the more a given topic is mentioned, for example "organic food",
>>> the more people are inclined to go for it. So one can deduce that
>>> organic food is a potential growth area.
>>>
>>> Now I have all the infrastructure to ingest that data, such as Flume to
>>> store it or Spark Streaming to do near-real-time work.
>>>
>>> Now I want to slice and dice that data for, say, organic food.
>>>
>>> I presume this is a typical question.
>>>
>>> You mentioned Spark ML (machine learning?). Is that something viable?
>>>
>>> Cheers
>>>
>>> On 7 June 2016 at 12:22, Jörn Franke <jornfra...@gmail.com> wrote:
>>>
>>>> Spark ML support vector machines or neural networks could be
>>>> candidates.
>>>> For unsupervised learning it could be clustering.
>>>> For doing a graph analysis on the followers you can easily use Spark
>>>> GraphX.
>>>> Keep in mind that each tweet contains a lot of metadata (location,
>>>> followers etc.) that is more or less structured.
>>>> For unstructured text analytics (e.g. the tweet itself) I recommend
>>>> Solr/Elasticsearch.
>>>>
>>>> However I am not sure what you want to do with the data exactly.
>>>>
>>>>
>>>> On 07 Jun 2016, at 13:16, Mich Talebzadeh <mich.talebza...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> This is really a general question.
>>>>
>>>> I use Spark to get Twitter data. I had a quick look at it with:
>>>>
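>>>>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>>>>     import org.apache.spark.streaming.twitter.TwitterUtils
>>>>     val ssc = new StreamingContext(sparkConf, Seconds(2))  // 2-second batches; sparkConf defined elsewhere
>>>>     val tweets = TwitterUtils.createStream(ssc, None)      // None: use default twitter4j credentials
>>>>     val statuses = tweets.map(status => status.getText())  // keep only the tweet text
>>>>     statuses.print()                                       // print the first few tweets of each batch
>>>>     ssc.start()                                            // nothing runs until the context is started
>>>>     ssc.awaitTermination()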
>>>>
>>>> Ok
>>>>
>>>> I can also use Apache Flume to store the data in an HDFS directory:
>>>>
>>>> $FLUME_HOME/bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf
>>>> -Dflume.root.logger=DEBUG,console -n TwitterAgent
>>>>
>>>> That stores the Twitter data in binary format in an HDFS directory.
>>>>
>>>> My question is pretty basic.
>>>>
>>>> What is the best tool/language to dig into that data, for example
>>>> Twitter streaming data? I am getting all sorts of stuff coming in. Say I
>>>> am only interested in certain topics like sport etc. How can I separate
>>>> the signal from the noise, and using what tool and language?
>>>>
>>>> Thanks
>>>>
>>>
>>
>
