Re: quick question: best to use cluster mode or client mode for production?

2017-02-23 Thread Sam Elamin
I personally use spark-submit, as it's agnostic to which platform your Spark
clusters are running on, e.g. EMR, Dataproc, Databricks, etc.


On Thu, 23 Feb 2017 at 08:53, nancy henry  wrote:

> Hi Team,
>
> I have a set of hc.sql("hivequery") scripts which I am currently running in
> spark-shell.
>
> How should I schedule them in production: by running
> spark-shell -i script.scala, or by packaging them in a jar file (built in
> Eclipse) and using spark-submit with deploy mode cluster?
>
> Which is advisable?
>


quick question: best to use cluster mode or client mode for production?

2017-02-23 Thread nancy henry
Hi Team,

I have a set of hc.sql("hivequery") scripts which I am currently running in
spark-shell.

How should I schedule them in production: by running
spark-shell -i script.scala, or by packaging them in a jar file (built in
Eclipse) and using spark-submit with deploy mode cluster?

Which is advisable?


quick question

2016-12-01 Thread kant kodali
Assume I am running a Spark client program in client mode against a Spark
cluster in standalone mode.

I would like some clarification on the following:

1. Build a DAG
2. DAG Scheduler
3. TASK Scheduler
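
For readers unfamiliar with items 1 and 2, the following is a toy Java sketch
of the kind of stage split a DAG scheduler performs: it cuts a chain of
operations into stages wherever an operation needs a shuffle. It is not Spark's
implementation. The class name StageSplitSketch, the method splitIntoStages,
and the sample operation names are hypothetical, and a real lineage is a graph
rather than a flat list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: this is NOT Spark's DAGScheduler, just an illustration
// of cutting a chain of operations into stages at shuffle boundaries.
final class StageSplitSketch {

    // ops: operation names in execution order (hypothetical input format).
    // wideOps: names of operations assumed to require a shuffle.
    static List<List<String>> splitIntoStages(List<String> ops, Set<String> wideOps) {
        List<List<String>> stages = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String op : ops) {
            // A wide operation closes the stage built so far and opens a new one.
            if (wideOps.contains(op) && !current.isEmpty()) {
                stages.add(current);
                current = new ArrayList<>();
            }
            current.add(op);
        }
        if (!current.isEmpty()) {
            stages.add(current);
        }
        return stages;
    }

    public static void main(String[] args) {
        List<String> ops = List.of("textFile", "map", "reduceByKey", "filter");
        System.out.println(splitIntoStages(ops, Set.of("reduceByKey")));
        // prints [[textFile, map], [reduceByKey, filter]]
    }
}
```

Each inner list stands for one stage; the tasks of a stage are what a task
scheduler would then hand out for execution.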

I want to know which of the above parts are done by the Spark client and which
are done by the Spark master in the standalone case.

Building the DAG clearly looks like the job of the Spark client program.
The DAG scheduler is also in the Spark client program.
Task scheduling is done by the Spark master.

Is this correct? Also, does the Spark client ever instruct Spark workers
directly on what transformations to run, or is the communication
unidirectional, in the sense that Spark workers communicate with the Spark
client only when returning results?

thanks!


Re: quick question

2016-08-25 Thread kant kodali
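On Wed, Aug 24, 2016 8:51 PM, Kevin Mellott kevin.r.mell...@gmail.com wrote:
In the diagram you referenced, a real-time dashboard can be created using
WebSockets. This technology essentially allows your web page to keep an active
line of communication between the client and server, in which case you can
detect and display new information without requiring any user input or page
refreshes. The link below contains additional information on this concept, as
well as links to several different implementations (based on your programming
language preferences).
https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API

Hope this helps! - Kevin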
On Wed, Aug 24, 2016 at 3:52 PM, kant kodali < kanth...@gmail.com > wrote:

-- Forwarded message --
From: kant kodali < kanth...@gmail.com >
Date: Wed, Aug 24, 2016 at 1:49 PM
Subject: quick question
To: d...@spark.apache.org , us...@spark.apache.org



In this picture, what does "Dashboards" really mean? Is there an open source
project which would allow me to push the results back to the dashboards so that
they are always in sync with real-time updates? (A push-based solution is
better than polling, but I am open to whatever is possible given the above
picture.)

Re: quick question

2016-08-25 Thread Sivakumaran S
>>> On Wed, Aug 24, 2016 10:38 PM, Sivakumaran S siva.kuma...@me.com 
>>> <mailto:siva.kuma...@me.com> wrote:
>>> You create a websocket object in your spark code and write your data to the 
>>> socket. You create a websocket object in your dashboard code and receive 
>>> the data in realtime and update the dashboard. You can use Node.js in your 
>>> dashboard (socket.io <http://socket.io/>). I am sure there are other ways 
>>> too.
>>> 
>>> Does that help?
>>> 
>>> Sivakumaran S
>>> 



Re: quick question

2016-08-25 Thread kant kodali
Your assumption is right (that's what I intend to do). My driver code will be in
Java. The link sent by Kevin is an API reference for WebSockets. I understand
how WebSockets work in general, but my question was geared more towards seeing
the end-to-end path by which the front-end dashboard gets updated in real time.
Once we have collected the data back to the driver program and finished writing
it to the WebSocket client, the WebSocket connection terminates, right? So:
1) Is the Spark driver program something that needs to run forever, like a
typical server?
2) If not, do we need to open a WebSocket connection each time the task
terminates?





On Thu, Aug 25, 2016 6:06 AM, Sivakumaran S siva.kuma...@me.com wrote:
I am assuming that you are doing some calculations over a time window. At the
end of the calculations (using RDDs or SQL), once you have collected the data
back to the driver program, you format the data in the way your client
(dashboard) requires it and write it to the websocket.
Is your driver code in Python? The link Kevin has sent should start you off.
Regards,
Sivakumaran

Re: quick question

2016-08-25 Thread Sivakumaran S
I am assuming that you are doing some calculations over a time window. At the 
end of the calculations (using RDDs or SQL), once you have collected the data 
back to the driver program, you format the data in the way your client 
(dashboard) requires it and write it to the websocket. 
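
As a minimal sketch of that last step, assuming a Java driver: the code below
formats a small map of collected results as JSON and, if a WebSocket URL is
passed as the first argument, writes it to that socket using the JDK's
java.net.http.WebSocket client (Java 11 or later). The class name
DashboardPushSketch, the sample counts, and the JSON shape are assumptions made
for illustration; the real data would be whatever was collected back to the
driver.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Illustrative sketch only; names and payload format are hypothetical.
final class DashboardPushSketch {

    // Formats (key -> count) results as a JSON object, keys sorted for stable output.
    static String toJson(Map<String, Long> counts) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Long> e : new TreeMap<>(counts).entrySet()) {
            json.add("\"" + escape(e.getKey()) + "\":" + e.getValue());
        }
        return json.toString();
    }

    // Escapes backslashes and double quotes only; enough for the simple keys used here.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Opens one WebSocket from the driver to the dashboard server at the given URL.
    static WebSocket connect(String url) {
        return HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .buildAsync(URI.create(url), new WebSocket.Listener() {})
                .join();
    }

    public static void main(String[] args) {
        // Stand-in for results collected back to the driver at the end of a window.
        Map<String, Long> counts = Map.of("requests", 120L, "errors", 3L);
        String payload = toJson(counts);
        System.out.println(payload); // prints {"errors":3,"requests":120}

        // Only touch the network when a server URL (an assumption) is supplied.
        if (args.length > 0) {
            WebSocket ws = connect(args[0]);
            ws.sendText(payload, true).join();
            ws.sendClose(WebSocket.NORMAL_CLOSURE, "done").join();
        }
    }
}
```

Keeping the WebSocket returned by connect open and calling sendText once per
window is one way to avoid reconnecting for every batch; that is a design
choice of this sketch rather than something the thread prescribes.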

Is your driver code in Python? The link Kevin has sent should start you off.

Regards,

Sivakumaran 
> On 25-Aug-2016, at 11:53 AM, kant kodali <kanth...@gmail.com> wrote:
> 
> Yes, for now it will be a Spark Streaming job, but later it may change.



Re: quick question

2016-08-25 Thread kant kodali
Yes, for now it will be a Spark Streaming job, but later it may change.





On Thu, Aug 25, 2016 2:37 AM, Sivakumaran S siva.kuma...@me.com wrote:
Is this a Spark Streaming job?
Regards,
Sivakumaran S


Re: quick question

2016-08-25 Thread Sivakumaran S
Is this a Spark Streaming job?

Regards,

Sivakumaran S


> On 25-Aug-2016, at 7:08 AM, kant kodali <kanth...@gmail.com> wrote:
> 
> @Sivakumaran, when you say "create a websocket object in your Spark code", I
> assume you mean a Spark "task" opening a WebSocket connection from one of the
> worker machines to some Node.js server. In that case, the WebSocket connection
> terminates after the Spark task is completed, right? And when new data comes
> in, a new task gets created and opens a new WebSocket connection again. Is
> that how it should be?



Re: quick question

2016-08-25 Thread kant kodali
@Sivakumaran, when you say "create a websocket object in your Spark code", I
assume you mean a Spark "task" opening a WebSocket connection from one of the
worker machines to some Node.js server. In that case, the WebSocket connection
terminates after the Spark task is completed, right? And when new data comes
in, a new task gets created and opens a new WebSocket connection again. Is that
how it should be?





On Wed, Aug 24, 2016 10:38 PM, Sivakumaran S siva.kuma...@me.com wrote:
You create a websocket object in your spark code and write your data to the
socket. You create a websocket object in your dashboard code and receive the
data in realtime and update the dashboard. You can use Node.js in your dashboard
( socket.io ). I am sure there are other ways too.
Does that help?
Sivakumaran S

Re: quick question

2016-08-24 Thread Sivakumaran S
You create a websocket object in your spark code and write your data to the 
socket. You create a websocket object in your dashboard code and receive the 
data in realtime and update the dashboard. You can use Node.js in your 
dashboard (socket.io). I am sure there are other ways too.

Does that help?

Sivakumaran S

> On 25-Aug-2016, at 6:30 AM, kant kodali <kanth...@gmail.com> wrote:
> 
> So I would need to open a WebSocket connection from a Spark worker machine to
> where?
> 
> 
> 
> 
> 
> On Wed, Aug 24, 2016 8:51 PM, Kevin Mellott kevin.r.mell...@gmail.com 
> <mailto:kevin.r.mell...@gmail.com> wrote:
> In the diagram you referenced, a real-time dashboard can be created using 
> WebSockets. This technology essentially allows your web page to keep an 
> active line of communication between the client and server, in which case you 
> can detect and display new information without requiring any user input or 
> page refreshes. The link below contains additional information on this 
> concept, as well as links to several different implementations (based on your 
> programming language preferences).
> 
> https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API 
> <https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API>
> 
> Hope this helps!
> - Kevin
> 
> On Wed, Aug 24, 2016 at 3:52 PM, kant kodali <kanth...@gmail.com 
> <mailto:kanth...@gmail.com>> wrote:
> 
> -- Forwarded message --
> From: kant kodali <kanth...@gmail.com <mailto:kanth...@gmail.com>>
> Date: Wed, Aug 24, 2016 at 1:49 PM
> Subject: quick question
> To: d...@spark.apache.org <mailto:d...@spark.apache.org>, 
> us...@spark.apache.org <mailto:us...@spark.apache.org>
> 
> 
> 
> 
> In this picture, what does "Dashboards" really mean? Is there an open source
> project which would allow me to push the results back to the dashboards so
> that they are always in sync with real-time updates? (A push-based solution
> is better than polling, but I am open to whatever is possible given the above
> picture.)
>