For the UI, you need a DB such as Cassandra that is designed around your query patterns. Ingest the data into Spark Streaming (the speed layer) and write it to HDFS (for the batch layer). Now you have the data at rest as well as in motion (real time). From Spark Streaming itself, do the further processing and write the final result to Cassandra or another NoSQL DB. The UI can then pick the data up from that DB.
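As a rough sketch only, something like the Java job below could wire this up: it reads from Kafka with spark-streaming-kafka-0-10, archives the raw feed to HDFS (batch layer), and writes standardized records to Cassandra through the spark-cassandra-connector Java API (speed layer). The broker address, topic name, HDFS path, keyspace/table, the Event schema, and the trim() "standardization" step are all placeholders for whatever your actual setup and ETL logic look like.

    import java.io.Serializable;
    import java.util.Arrays;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.UUID;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

    public class SpeedLayerJob {

      // Bean mapped onto a placeholder Cassandra table reports.events(id, value).
      public static class Event implements Serializable {
        private UUID id;
        private String value;
        public Event() {}
        public Event(UUID id, String value) { this.id = id; this.value = value; }
        public UUID getId() { return id; }
        public void setId(UUID id) { this.id = id; }
        public String getValue() { return value; }
        public void setValue(String value) { this.value = value; }
      }

      public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
            .setAppName("speed-layer")
            .set("spark.cassandra.connection.host", "cassandra1"); // placeholder host
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "kafka1:9092");       // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "speed-layer");
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Arrays.asList("raw-events");   // placeholder topic

        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        JavaDStream<String> raw = stream.map(ConsumerRecord::value);

        // Batch layer: keep the raw feed at rest on HDFS for later reprocessing.
        raw.dstream().saveAsTextFiles("hdfs:///data/raw/events", "txt");

        // Speed layer: standardize in-stream (trim() is a stand-in for real ETL)
        // and write the results to Cassandra, where the web UI can query them.
        raw.map(v -> new Event(UUID.randomUUID(), v.trim()))
           .foreachRDD(rdd ->
               javaFunctions(rdd)
                   .writerBuilder("reports", "events", mapToRow(Event.class))
                   .saveToCassandra());

        jssc.start();
        jssc.awaitTermination();
      }
    }

The web UI backend would then query the reports.events table directly through the Cassandra driver rather than going through Spark.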
Thanks
Deepak

On Thu, Sep 29, 2016 at 8:00 PM, Alonso Isidoro Roman <alons...@gmail.com> wrote:

> "Using Spark to query the data in the backend of the web UI?"
>
> Don't do that. I would recommend that the Spark Streaming process store
> the data in some NoSQL or SQL database, and that the web UI query the data
> from that database.
>
> Alonso Isidoro Roman
> https://about.me/alonso.isidoro.roman
>
> 2016-09-29 16:15 GMT+02:00 Ali Akhtar <ali.rac...@gmail.com>:
>
>> The web UI is actually the speed layer; it needs to be able to query the
>> data online and show the results in real time.
>>
>> It also needs a custom front-end, so a system like Tableau can't be used;
>> it must have a custom backend + front-end.
>>
>> Thanks for the recommendation of Flume. Do you think this will work:
>>
>> - Spark Streaming to read data from Kafka
>> - Storing the data on HDFS using Flume
>> - Using Spark to query the data in the backend of the web UI?
>>
>> On Thu, Sep 29, 2016 at 7:08 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> You need a batch layer and a speed layer. Data from Kafka can be stored
>>> on HDFS using Flume.
>>>
>>> - Query this data to generate reports / analytics (There will be a web
>>> UI which will be the front-end to the data, and will show the reports)
>>>
>>> This is basically the batch layer, and you need something like Tableau
>>> or Zeppelin to query the data.
>>>
>>> You will also need Spark Streaming to query data online for the speed
>>> layer. That data could be stored in some transient fabric like Ignite or
>>> even Druid.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> On 29 September 2016 at 15:01, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>
>>>> It needs to be able to scale to a very large amount of data, yes.
>>>>
>>>> On Thu, Sep 29, 2016 at 7:00 PM, Deepak Sharma <deepakmc...@gmail.com>
>>>> wrote:
>>>>
>>>>> What is the message inflow?
>>>>> If it's really high, Spark will definitely be of great use.
>>>>>
>>>>> Thanks
>>>>> Deepak
>>>>>
>>>>> On Sep 29, 2016 19:24, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>>>>
>>>>>> I have a somewhat tricky use case, and I'm looking for ideas.
>>>>>>
>>>>>> I have 5-6 Kafka producers, reading various APIs, and writing their
>>>>>> raw data into Kafka.
>>>>>>
>>>>>> I need to:
>>>>>>
>>>>>> - Do ETL on the data, and standardize it.
>>>>>>
>>>>>> - Store the standardized data somewhere (HBase / Cassandra / raw HDFS
>>>>>> / ElasticSearch / Postgres)
>>>>>>
>>>>>> - Query this data to generate reports / analytics (There will be a
>>>>>> web UI which will be the front-end to the data, and will show the
>>>>>> reports)
>>>>>>
>>>>>> Java is being used as the backend language for everything (the
>>>>>> backend of the web UI, as well as the ETL layer).
>>>>>>
>>>>>> I'm considering:
>>>>>>
>>>>>> - Using raw Kafka consumers, or Spark Streaming, as the ETL layer
>>>>>> (receive raw data from Kafka, standardize it, and store it)
>>>>>>
>>>>>> - Using Cassandra, HBase, or raw HDFS for storing the standardized
>>>>>> data and allowing queries
>>>>>>
>>>>>> - In the backend of the web UI, either using Spark to run queries
>>>>>> across the data (mostly filters), or running queries directly
>>>>>> against Cassandra / HBase
>>>>>>
>>>>>> I'd appreciate some thoughts / suggestions on which of these
>>>>>> alternatives I should go with (e.g. using raw Kafka consumers vs.
>>>>>> Spark for ETL, which persistent data store to use, and how to query
>>>>>> that data store in the backend of the web UI, for displaying the
>>>>>> reports).
>>>>>>
>>>>>> Thanks.

--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net