For the UI, you need a DB such as Cassandra that is designed around your query patterns. Ingest the data into Spark Streaming (the speed layer) and write it to HDFS (for the batch layer). Now you have the data at rest as well as in motion (real time). From Spark Streaming itself, do the further processing and write the final result to Cassandra or another NoSQL DB. The UI can then pick the data up from that DB.
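As a rough sketch only, something like the Java job below could wire this up: it reads from Kafka with spark-streaming-kafka-0-10, archives the raw feed to HDFS (batch layer), and writes standardized records to Cassandra through the spark-cassandra-connector Java API (speed layer). The broker address, topic name, HDFS path, keyspace/table, the Event schema, and the trim() "standardization" step are all placeholders for whatever your actual setup and ETL logic look like.

    import java.io.Serializable;
    import java.util.Arrays;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.UUID;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

    public class SpeedLayerJob {

      // Bean mapped onto a placeholder Cassandra table reports.events(id, value).
      public static class Event implements Serializable {
        private UUID id;
        private String value;
        public Event() {}
        public Event(UUID id, String value) { this.id = id; this.value = value; }
        public UUID getId() { return id; }
        public void setId(UUID id) { this.id = id; }
        public String getValue() { return value; }
        public void setValue(String value) { this.value = value; }
      }

      public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
            .setAppName("speed-layer")
            .set("spark.cassandra.connection.host", "cassandra1"); // placeholder host
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "kafka1:9092");       // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "speed-layer");
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Arrays.asList("raw-events");   // placeholder topic

        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        JavaDStream<String> raw = stream.map(ConsumerRecord::value);

        // Batch layer: keep the raw feed at rest on HDFS for later reprocessing.
        raw.dstream().saveAsTextFiles("hdfs:///data/raw/events", "txt");

        // Speed layer: standardize in-stream (trim() is a stand-in for real ETL)
        // and write the results to Cassandra, where the web UI can query them.
        raw.map(v -> new Event(UUID.randomUUID(), v.trim()))
           .foreachRDD(rdd ->
               javaFunctions(rdd)
                   .writerBuilder("reports", "events", mapToRow(Event.class))
                   .saveToCassandra());

        jssc.start();
        jssc.awaitTermination();
      }
    }

The web UI backend would then query the reports.events table directly through the Cassandra driver rather than going through Spark.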
Thanks
Deepak

On Thu, Sep 29, 2016 at 8:00 PM, Alonso Isidoro Roman <alons...@gmail.com> wrote:

> "Using Spark to query the data in the backend of the web UI?"
>
> Don't do that. I would recommend that the Spark Streaming process store
> the data in some NoSQL or SQL database, and that the web UI query the data
> from that database.
>
> Alonso Isidoro Roman
> https://about.me/alonso.isidoro.roman
>
> 2016-09-29 16:15 GMT+02:00 Ali Akhtar <ali.rac...@gmail.com>:
>
>> The web UI is actually the speed layer; it needs to be able to query the
>> data online and show the results in real time.
>>
>> It also needs a custom front-end, so a system like Tableau can't be used;
>> it must have a custom backend + front-end.
>>
>> Thanks for the recommendation of Flume. Do you think this will work:
>>
>> - Spark Streaming to read data from Kafka
>> - Storing the data on HDFS using Flume
>> - Using Spark to query the data in the backend of the web UI?
>>
>> On Thu, Sep 29, 2016 at 7:08 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> You need a batch layer and a speed layer. Data from Kafka can be stored
>>> on HDFS using Flume.
>>>
>>> - Query this data to generate reports / analytics (There will be a web
>>> UI which will be the front-end to the data, and will show the reports)
>>>
>>> This is basically the batch layer, and you need something like Tableau
>>> or Zeppelin to query the data.
>>>
>>> You will also need Spark Streaming to query data online for the speed
>>> layer. That data could be stored in some transient fabric like Ignite or
>>> even Druid.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> On 29 September 2016 at 15:01, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>
>>>> It needs to be able to scale to a very large amount of data, yes.
>>>>
>>>> On Thu, Sep 29, 2016 at 7:00 PM, Deepak Sharma <deepakmc...@gmail.com>
>>>> wrote:
>>>>
>>>>> What is the message inflow?
>>>>> If it's really high, Spark will definitely be of great use.
>>>>>
>>>>> Thanks
>>>>> Deepak
>>>>>
>>>>> On Sep 29, 2016 19:24, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>>>>
>>>>>> I have a somewhat tricky use case, and I'm looking for ideas.
>>>>>>
>>>>>> I have 5-6 Kafka producers, reading various APIs, and writing their
>>>>>> raw data into Kafka.
>>>>>>
>>>>>> I need to:
>>>>>>
>>>>>> - Do ETL on the data, and standardize it.
>>>>>>
>>>>>> - Store the standardized data somewhere (HBase / Cassandra / raw HDFS
>>>>>> / ElasticSearch / Postgres)
>>>>>>
>>>>>> - Query this data to generate reports / analytics (There will be a
>>>>>> web UI which will be the front-end to the data, and will show the
>>>>>> reports)
>>>>>>
>>>>>> Java is being used as the backend language for everything (the
>>>>>> backend of the web UI, as well as the ETL layer).
>>>>>>
>>>>>> I'm considering:
>>>>>>
>>>>>> - Using raw Kafka consumers, or Spark Streaming, as the ETL layer
>>>>>> (receive raw data from Kafka, standardize it, and store it)
>>>>>>
>>>>>> - Using Cassandra, HBase, or raw HDFS for storing the standardized
>>>>>> data and allowing queries
>>>>>>
>>>>>> - In the backend of the web UI, either using Spark to run queries
>>>>>> across the data (mostly filters), or running queries directly
>>>>>> against Cassandra / HBase
>>>>>>
>>>>>> I'd appreciate some thoughts / suggestions on which of these
>>>>>> alternatives I should go with (e.g. using raw Kafka consumers vs.
>>>>>> Spark for ETL, which persistent data store to use, and how to query
>>>>>> that data store in the backend of the web UI, for displaying the
>>>>>> reports).
>>>>>>
>>>>>> Thanks.

--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net