Re: Correct way to use spark streaming with apache zeppelin

2016-03-13 Thread Skanda
Hi

How you store state/intermediate data in realtime processing depends on the
throughput/latency your application requires. There are a lot of technologies
that can help you build this realtime datastore. Some examples include HBase,
MemSQL, etc., or in some cases an RDBMS like MySQL itself. This is a
judgement call that you will have to make.
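If an RDBMS is the store, the usual pattern is a keyed upsert that folds each transaction into the running totals. Here is a minimal sketch of that pattern, using an in-memory SQLite database purely as a stand-in (with MySQL you would use INSERT ... ON DUPLICATE KEY UPDATE instead; the table and column names are illustrative, not from the thread):

```python
import sqlite3

# In-memory SQLite stands in for the real datastore (MySQL, HBase, MemSQL, ...).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stats (
        branch_id    TEXT,
        product_id   TEXT,
        total_qty    INTEGER NOT NULL,
        total_dollar REAL    NOT NULL,
        PRIMARY KEY (branch_id, product_id)
    )
""")

def record_sale(branch_id, product_id, qty, price):
    """Upsert one transaction into the running totals."""
    conn.execute(
        """
        INSERT INTO stats (branch_id, product_id, total_qty, total_dollar)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (branch_id, product_id) DO UPDATE SET
            total_qty    = total_qty    + excluded.total_qty,
            total_dollar = total_dollar + excluded.total_dollar
        """,
        (branch_id, product_id, qty, qty * price),
    )

record_sale("B1", "P1", 2, 10.0)
record_sale("B1", "P1", 3, 10.0)
record_sale("B2", "P1", 1, 12.5)

totals = {
    (b, p): (q, d)
    for b, p, q, d in conn.execute(
        "SELECT branch_id, product_id, total_qty, total_dollar FROM stats"
    )
}
print(totals[("B1", "P1")])   # (5, 50.0)
```

The write path stays cheap because each incoming transaction touches exactly one row; the dashboard query then reads the small `stats` table instead of rescanning raw transactions.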

Regards,
Skanda


Re: Correct way to use spark streaming with apache zeppelin

2016-03-13 Thread trung kien
Thanks all for actively sharing your experience.

@Chris: using something like Redis is something I am trying to figure out.
I have a lot of transactions, so I can't trigger an update event for
every single transaction.
I'm looking at Spark Streaming because it provides micro-batch processing
(e.g. I can update the cache every 5 seconds). In addition, Spark scales
pretty well and I don't have to worry about losing data.

The cache now holds the following information:
 * Date
 * BranchID
 * ProductID
   TotalQty
   TotalDollar

(* marks the key columns; note that I keep history data as well, by day.)

Now I want to use Zeppelin to query against the cache (while the cache is
updating).
I don't need Zeppelin to update automatically (I can hit the run button
myself :) ).
Just curious whether Parquet is the right solution for us?
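The cache update described above is, at its core, a keyed running sum per micro-batch. A minimal sketch of that logic in plain Python (in Spark Streaming the same arithmetic would live in something like updateStateByKey; the tuple layout here is an assumption for illustration):

```python
from collections import defaultdict

# Cache keyed by (date, branch_id, product_id) -> [total_qty, total_dollar].
cache = defaultdict(lambda: [0, 0.0])

def apply_batch(cache, transactions):
    """Fold one micro-batch (e.g. 5 seconds of transactions) into the cache."""
    for date, branch_id, product_id, qty, price in transactions:
        entry = cache[(date, branch_id, product_id)]
        entry[0] += qty            # TotalQty
        entry[1] += qty * price    # TotalDollar
    return cache

batch1 = [("2016-03-13", "B1", "P1", 2, 10.0),
          ("2016-03-13", "B1", "P2", 1, 5.0)]
batch2 = [("2016-03-13", "B1", "P1", 4, 10.0)]

apply_batch(cache, batch1)
apply_batch(cache, batch2)
print(cache[("2016-03-13", "B1", "P1")])   # [6, 60.0]
```

Because the date is part of the key, history accumulates by day automatically; a Zeppelin query against the cache only ever sees the already-aggregated totals.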




-- 
Thanks
Kien


Re: Correct way to use spark streaming with apache zeppelin

2016-03-13 Thread Chris Miller
Cool! Thanks for sharing.


--
Chris Miller



Re: Correct way to use spark streaming with apache zeppelin

2016-03-12 Thread Mich Talebzadeh
Certainly the only graphs that I can produce come from SQL queries on
base tables. That basically means the data has to be stored in permanent
tables, so temporary tables in Spark cannot be used (?). Additionally, it
only seems to work with SQL (I have not seen any presentation using
anything else). So it boils down to taking the data out and displaying it
in some chart format. The scatter plot does not work and ends up freezing
the notebook.

Having said that, I am not sure the polling graph that the thread owner is
looking for can be achieved with this. I don't think it will work for
streaming data like price updates, etc. For some presentations it is useful.

Here is an example

HTH

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com




-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Re: Correct way to use spark streaming with apache zeppelin

2016-03-12 Thread Todd Nist
Below is a link to an example that Silvio Fiorito put together
demonstrating how to link Zeppelin with Spark Streaming for real-time
charts. I think the original thread was back in early November 2015,
subject: "Real time chart in Zeppelin", if you care to try to find it.

https://gist.github.com/granturing/a09aed4a302a7367be92

HTH.

-Todd



Re: Correct way to use spark streaming with apache zeppelin

2016-03-12 Thread Chris Miller
I'm pretty new to all of this stuff, so bear with me.

Zeppelin isn't really intended for realtime dashboards as far as I know.
Its reporting features (tables, graphs, etc.) are more for displaying the
results from the output of something. As far as I know, there isn't really
anything to "watch" a dataset and have updates pushed to the Zeppelin UI.

As for Spark, unless you're doing a lot of processing that you didn't
mention here, I don't think it's a good fit just for this.

If it were me (just off the top of my head), I'd just build a simple web
service that uses websockets to push updates to the client which could then
be used to update graphs, tables, etc. The data itself -- that is, the
accumulated totals -- you could store in something like Redis. When an
order comes in, just add that quantity and price to the existing value and
trigger your code to push out an updated value to any clients via the
websocket. You could use something like a Redis pub/sub channel to trigger
the web app to notify clients of an update.

There are about 5 million other ways you could design this, but I would
just keep it as simple as possible. I just threw one idea out...

Good luck.


--
Chris Miller



Re: Correct way to use spark streaming with apache zeppelin

2016-03-12 Thread trung kien
Thanks Chris and Mich for replying.

Sorry for not explaining my problem clearly. Yes, I am talking about a
flexible dashboard when I mention Zeppelin.

Here is the problem I am having:

I am running a commercial website where we sell many products and have
many branches in many places. We have a lot of realtime transactions and
want to analyze them in real time.

We don't want to have to aggregate every single transaction each time we
do analytics (each transaction has BranchID, ProductID, Qty, Price). So we
maintain intermediate data which contains: BranchID, ProductID, totalQty,
totalDollar.

Ideally, we have 2 tables:
   Transaction (BranchID, ProductID, Qty, Price, Timestamp)

And the intermediate table Stats, which is just the sum of every
transaction grouped by BranchID and ProductID (I am using Spark Streaming
to calculate this table in real time).

My thinking is that doing statistics (a realtime dashboard) on the Stats
table is much easier, and this table is also small enough to maintain.

I'm just wondering, what's the best way to store the Stats table (a
database or a Parquet file?)
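The relationship between the two tables described above can be sketched in SQL: Stats is exactly a GROUP BY aggregate over Transaction, which a streaming job would maintain incrementally instead of recomputing each time. (SQLite here is purely illustrative, and the table is named `txn` because TRANSACTION is an SQL keyword; which store to actually use is the open question of the thread.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE txn (
        branch_id TEXT, product_id TEXT,
        qty INTEGER, price REAL, ts TEXT
    )
""")
conn.executemany(
    "INSERT INTO txn VALUES (?, ?, ?, ?, ?)",
    [("B1", "P1", 2, 10.0, "2016-03-12T10:00:00"),
     ("B1", "P1", 3, 10.0, "2016-03-12T10:05:00"),
     ("B2", "P1", 1, 12.5, "2016-03-12T10:07:00")],
)

# Stats is the per-(branch, product) aggregate of Transaction; the
# streaming job keeps this up to date so dashboards never scan txn.
stats = conn.execute("""
    SELECT branch_id, product_id,
           SUM(qty)         AS total_qty,
           SUM(qty * price) AS total_dollar
    FROM txn
    GROUP BY branch_id, product_id
    ORDER BY branch_id, product_id
""").fetchall()
print(stats)   # [('B1', 'P1', 5, 50.0), ('B2', 'P1', 1, 12.5)]
```

Querying Stats is cheap regardless of transaction volume, which is the point of keeping the intermediate table at all.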


Re: Correct way to use spark streaming with apache zeppelin

2016-03-12 Thread Chris Miller
What exactly are you trying to do? Zeppelin is for interactive analysis of
a dataset. What do you mean "realtime analytics" -- do you mean build a
report or dashboard that automatically updates as new data comes in?


--
Chris Miller

>


Re: Correct way to use spark streaming with apache zeppelin

2016-03-11 Thread Mich Talebzadeh
Hi,

I use Zeppelin as well and in the notebook mode you can do analytics much
like what you do in Spark-shell.

You can store your intermediate data in Parquet if you wish and then
analyse data the way you like.

What is your use case here? Zeppelin as I use it is a web UI to your
spark-shell, accessible from anywhere.

HTH

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 12 March 2016 at 07:13, trung kien  wrote:

> Hi all,
>
> I've just viewed some of Zeppelin's videos. The integration between
> Zeppelin and Spark is really amazing and I want to use it for my
> application.
>
> In my app, I will have a Spark Streaming app to do some basic realtime
> aggregation (intermediate data). Then I want to use Zeppelin to do some
> realtime analytics on the intermediate data.
>
> My question is: what's the most efficient storage engine to store
> realtime intermediate data? Is a Parquet file somewhere suitable?
> intermediate data? Is parquet file somewhere is suitable?
>