Hi,
I must admit I don't know much about this Fruchterman-Reingold (call
it FR) visualization using GraphX and Kubernetes. But you are
suggesting this slowdown issue starts after the second iteration, and
that caching/persisting the graph after each iteration does not help. FR
involves many computations per iteration.
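Not an authoritative fix, but one thing worth trying: in iterative GraphX jobs
the lineage grows with every iteration, and caching alone does not truncate it;
checkpointing does. A minimal sketch, assuming a graph g, a per-iteration update
function step, and a Position vertex type (all placeholders):

  import org.apache.spark.graphx.Graph

  sc.setCheckpointDir("/tmp/fr-checkpoints")      // any reliable storage path
  var g: Graph[Position, Double] = initialGraph   // initialGraph is a placeholder
  for (i <- 1 to maxIters) {
    val prev = g
    g = step(g).cache()             // positions for this iteration
    if (i % 10 == 0) g.checkpoint() // periodically cut the growing lineage
    g.vertices.count()              // materialize before dropping the old graph
    prev.unpersist(blocking = false)
  }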
Dear community,
For my diploma thesis, we are implementing a distributed version of the
Fruchterman-Reingold visualization algorithm, using GraphX and Kubernetes. Our
solution is a backend that continuously computes new positions of vertices in a
graph and sends them via RabbitMQ to a consumer. Fruc
On Fri, 31 Mar 2023 at 15:15, AN-TRUONG Tran Phan <
tr.phan.tru...@gmail.com> wrote:

> Hi,
>
> I am learning about Apache Spark and want to know the meaning of each
> Task created on the Jobs recorded on Spark history.
>
> For example, the application I write creates 17 jobs, in which job 0
> runs for 10 minutes, there are 2384 small tasks and I want to learn about
> the meaning of these 2384, is it possible?
>
> I found a picture of DAG in the Jobs and want to know the relationship
> between DAG and Task, is it possible (Specifically
Hi,
I am using PySpark for writing Spark queries. My research project requires me
to accurately measure latency for each and every operator/stage in the query. I
can make some guesses but unable to exactly map the stages (shown in the DAG on
Spark UI) to the exact line in my PySpark code.
Can
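One way to approach the mapping (a sketch, not a definitive answer): label each
phase of the query with setJobDescription, so the jobs and stages in the UI
carry your own markers. The same call exists in PySpark (sc.setJobDescription);
shown here in Scala with illustrative names:

  spark.sparkContext.setJobDescription("phase 1: load + filter")
  val cleaned = spark.read.parquet("events.parquet")   // illustrative path
    .filter("value IS NOT NULL")
  cleaned.count()                             // UI shows "phase 1: ..." here
  spark.sparkContext.setJobDescription("phase 2: aggregate")
  cleaned.groupBy("user").count().collect()   // UI shows "phase 2: ..." here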
Hi Jean Georges,
> I am assuming it is still in the master and when catalyst is finished it
sends the tasks to the workers.
Sorry to be that direct, but the sentence does not make much sense to me.
Again, very sorry for saying it in the very first sentence. Since I know
Jean Georges I allowed mys
Hi,
I am assuming it is still in the master and when catalyst is finished it sends
the tasks to the workers.
Correct?
tia
jg
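For what it's worth, that matches my understanding: Catalyst analyzes,
optimizes and plans on the driver, and only serialized tasks go to the workers.
You can inspect the driver-side result with explain(); a minimal sketch over an
illustrative input:

  val df = spark.read.json("events.json")   // illustrative path
    .groupBy("user")
    .count()
  df.explain(true)   // parsed, analyzed, optimized logical plans + physical plan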
Hi Spark Experts,
Can someone point me to some examples of non-linear (DAG) ML pipelines?
That would be of great help.
Thanks much in advance
-Srikanth
Sending out the message again.. Hopefully someone can clarify :)
I would like some clarification on the execution model for spark streaming.
Broadly, I am trying to understand if output operations in a DAG are only
processed after all intermediate operations are finished for all parts of
the DAG.
Hi,
I would like some clarification on the execution model for spark streaming.
Broadly, I am trying to understand if output operations in a DAG are only
processed after all intermediate operations are finished for all parts of
the DAG.
Let me give an example:
I have a dstream A, I do map
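A sketch of the shape being described, with placeholder names; as far as I
understand, within one batch the output operations run sequentially in the
order they are defined (unless spark.streaming.concurrentJobs is raised):

  val mapped = dstreamA.map(parse)              // dstreamA and parse are placeholders
  mapped.foreachRDD { (rdd, time) =>            // output op 1
    rdd.saveAsTextFile(s"out/first-${time.milliseconds}")
  }
  mapped.foreachRDD { (rdd, time) =>            // output op 2: for a given batch,
    println(s"$time: ${rdd.count()} records")   // starts only after op 1 finishes
  }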
Hi all,
I read some papers about stages; I know about narrow dependencies and shuffle
dependencies.
For the RDD DAG below (image attachment), how does Spark generate the stage DAG?
And is this RDD DAG legal?
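Without the attachment it is hard to judge that particular DAG, but the general
rule: narrow dependencies are pipelined into a single stage, and every shuffle
dependency starts a new one. A minimal sketch you can verify with toDebugString:

  // Stage 1: textFile -> flatMap -> map are all narrow, so they pipeline together
  val pairs = sc.textFile("in.txt").flatMap(_.split(" ")).map((_, 1))
  // Shuffle dependency: reduceByKey cuts the DAG here, starting stage 2
  val counts = pairs.reduceByKey(_ + _)
  println(counts.toDebugString)   // indentation marks the shuffle/stage boundaries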
Hi Jacek,
I tried accessing the Spark web UI on both Firefox and Google Chrome browsers
with ad blocker enabled. I do see other options like User, Total Uptime,
Scheduling Mode, Active Jobs, Completed Jobs and Event Timeline.
However, I don't see an option for DAG visualization.
Please note that I am experiencing the same issue with Spark 2.x (i.e.
2.0.0, 2.0.1
Hi All,
I am running a Spark job on my local machine written in Scala with Spark
2.1.0. However, I am not seeing any option of "*DAG Visualization*" at
http://localhost:4040/jobs/
Suggestion, please.
Regards,
_
*Md. Rezaul Karim*, BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
Right, let us simplify this.
Can you run the whole thing *once* only and send the DAG execution output from
the UI?
You can use the snipping tool to take the image.
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
(look at the UI page, 4040 by default)? *I checked the Spark UI DAG; so many
file reads, why?*
6. What Spark mode is being used (Local, Standalone, Yarn)? *Yarn*
7. OOM could be anything depending on how much you are allocating to
your driver memory in spark-submit? *Driver and
1. Spark will optimize multiple SQL into one physical execution plan.
2. Executor memory and Driver memory is set as 4gb, which is too high as
the data size is in MB.
Questions ::
1. Will Spark optimize multiple SQL queries into one single physical plan?
2. In the DAG I can see a lot of file reads and a lot of stages. Why? I only
called an action once.
3. Is every
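On question 2, one common cause (not necessarily yours): every action re-reads
the source unless the intermediate result is cached, so several actions over
the same input show up as several file-read stages. A sketch with illustrative
names:

  val df = spark.read.parquet("input.parquet")   // illustrative path
    .filter("amount > 0")
    .cache()
  df.count()                        // first action: scans the file, fills the cache
  df.groupBy("id").count().show()   // second action: served from the cache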
I read on the ml-guide page (
http://spark.apache.org/docs/latest/ml-guide.html#details) that
it is possible to construct DAG Pipelines. Unfortunately there is no
example to explain under which use case this may be useful.
*Can someone give me an example or use case where this
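Not from the docs, but a sketch of how I read it: stages are still passed as a
topologically ordered array, and the DAG shape comes from wiring input/output
columns, e.g. two independent feature branches merging in a VectorAssembler.
Column names are illustrative:

  import org.apache.spark.ml.Pipeline
  import org.apache.spark.ml.classification.LogisticRegression
  import org.apache.spark.ml.feature.{HashingTF, StringIndexer, Tokenizer, VectorAssembler}

  // Branch 1: free text -> tokens -> term frequencies
  val tok = new Tokenizer().setInputCol("text").setOutputCol("words")
  val tf = new HashingTF().setInputCol("words").setOutputCol("textVec")
  // Branch 2: a categorical column -> numeric index
  val idx = new StringIndexer().setInputCol("country").setOutputCol("countryIdx")
  // The two branches merge here, which is what makes the pipeline a DAG
  val asm = new VectorAssembler()
    .setInputCols(Array("textVec", "countryIdx"))
    .setOutputCol("features")
  val lr = new LogisticRegression().setFeaturesCol("features").setLabelCol("label")
  val pipeline = new Pipeline().setStages(Array(tok, tf, idx, asm, lr))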
sometimes it just shows several *black dots*, and sometimes it cannot show
the entire graph.
Did anyone meet this before, and how did you fix it?
--
*--*
a spark lover, a quant, a developer and a good man.
http://github.com/litaotao
We have a streaming application containing approximately 12 stages every
batch, running in streaming mode (4 sec batches). Each stage persists
output to cassandra.
The pipeline stages (see the sketch after this list):
stage 1
---> receive Stream A --> map --> filter --> (union with another stream B)
--> map --> groupByKey --> trans
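A rough sketch of that stage in code (types, field names and streamB are
placeholders):

  val a = streamA.map(parse).filter(_.isValid)
  val merged = a.union(streamB)
  val grouped = merged.map(r => (r.key, r)).groupByKey()
  grouped.foreachRDD { rdd =>
    // persist the grouped output to cassandra here
  }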
Andy Davidson [mailto:a...@santacruzintegration.com]
Sent: 12 February 2016 21:17
To: Mich Talebzadeh; user@spark.apache.org
Subject: Re: Question on Spark architecture and DAG

From: Mich Talebzadeh <m...@peridale.co.uk>
Date: Thursday, February 11, 2016 at 2:30 PM
To: "user @spark"
Subject: Question on Spark architecture and DAG
Hi,
I have used Hive on the Spark engine (and of course Hive tables) and it's
pretty impressive compared to Hive using the MR engine.
Let us assume that I use the spark shell. The spark shell is a client that
connects to the spark master running on a host and port, like below:
spark-shell --master spark://50.140.197.21
Hello,
I am trying to find some tools, but without success. So, as the title says: are
there any open source tools which implement draggable widgets and make the app
run in the form of a DAG-like workflow?
Thanks,
Minglei.
--conf spark.ui.retainedJobs=1 --conf spark.ui.retainedStages=1
In the Spark Web UI (http://localhost:18080/), the DAG visualization of only
the most recent job is available. For the rest of the jobs, I get the following
message:
No visualization information available for this job!
If this is an old job, its
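If driver memory allows, raising the retention limits should keep visualization
info for older jobs; a sketch with illustrative values:

  import org.apache.spark.SparkConf

  // Keep UI data for more completed jobs/stages, at the cost of driver memory
  val conf = new SparkConf()
    .set("spark.ui.retainedJobs", "1000")
    .set("spark.ui.retainedStages", "1000")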
from kafka; replay the prior window data from checkpoint
/ other storage (not much reason for this, since it's stored in kafka); or
lose the prior window data.
On Sat, Jan 23, 2016 at 3:47 PM, gaurav sharma
wrote:
Hi Tathagata/Cody,
I am facing a challenge in Production with DAG behaviour during
checkpointing in spark streaming -
Step 1 : Read data from Kafka every 15 min - call this KafkaStreamRDD ~
100 GB of data
Step 2 : Repartition KafkaStreamRdd from 5 to 100 partitions to parallelise
processing
A very basic support that is there in DStream is DStream.transform(), which
takes an arbitrary RDD => RDD function. This function can actually choose to do
different computation with time. That may be of help to you.
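A minimal sketch of that suggestion; inputStream, the record type and
threshold(time) are placeholders for whatever time-dependent logic is needed:

  // transform() also has a (RDD, Time) => RDD variant, so the computation
  // can differ from batch to batch
  val adapted = inputStream.transform { (rdd, time) =>
    rdd.filter(record => record.score > threshold(time))
  }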
On Tue, Sep 29, 2015 at 12:06 PM, Archit Thakur
wrote:
Hi,
We are using Spark Streaming as our processing engine, and as part of the
output we want to push the data to a UI. Now there would be multiple users
accessing the system with their different filters on. Based on the filters
and other inputs we want to either run a SQL query on the DStream or do a
custo
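For the SQL-on-DStream part, one pattern (a sketch along the lines of the
streaming docs; it assumes a DStream of (user, value) pairs and illustrative
names):

  import org.apache.spark.sql.SparkSession

  dstream.foreachRDD { rdd =>
    val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
    import spark.implicits._
    rdd.toDF("user", "value").createOrReplaceTempView("events")
    val result = spark.sql("SELECT user, SUM(value) AS total FROM events GROUP BY user")
    // push result to the UI layer, applying the per-user filters as needed
  }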
element keyA is passed in the aggregate function as an
initialization parameter, and then for each B element key keyB, if M(keyA,
keyB) == 1
then the B element is taken into account in the summation.
The calculation of A is done successfully and correctly, but then the DAG
scheduler seems to deadlock when the calculation of B happ
Hi,
How can I create a combined DAG visualization of PySpark code instead of
separate DAGs of jobs and stages?
Thanks
b.bhavesh
No. The third line creates a third RDD whose reference simply replaces
the reference to the first RDD in your local driver program. The first
RDD still exists.
On Thu, Aug 20, 2015 at 2:15 PM, Bahubali Jain wrote:
Hi,
What would the DAG look like for the below code?
JavaRDD rdd1 = context.textFile();
JavaRDD rdd2 = rdd1.map();
rdd1 = rdd2.map();
Does this lead to any kind of cycle?
Thanks,
Baahu
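No cycle forms; a sketch of why, in Scala for brevity:

  var rdd1 = sc.textFile("data.txt")   // RDD #1
  val rdd2 = rdd1.map(_.toUpperCase)   // RDD #2, parent = #1
  rdd1 = rdd2.map(_.trim)              // RDD #3, parent = #2
  // Only the variable rdd1 was reassigned; RDD #1 itself is unchanged and
  // still referenced by RDD #2, so the lineage stays a straight line.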
We will try to address this before Spark 1.5 is released:
https://issues.apache.org/jira/browse/SPARK-9141
On Tue, Jul 28, 2015 at 11:50 AM, Kristina Rogale Plazonic wrote:
Hi,
I'm puzzling over the following problem: when I cache a small sample of a
big dataframe, the small dataframe is recomputed when selecting a column
(but not if show() or count() is invoked).
Why is that so and how can I avoid recomputation of the small sample
dataframe?
More details:
- I hav
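A workaround sketch while this is open (assuming the DataFrame API; the column
name is illustrative): force materialization right after caching, so later
column selections hit the cache:

  val small = big.sample(withReplacement = false, fraction = 0.01).cache()
  small.count()                 // materializes the cached sample once
  small.select("col").show()    // should now reuse the cache instead of recomputing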
Hi,
I'm trying to build the DAG of an application from the logs.
I've had a look at SparkReplayDebugger, but it doesn't operate offline on
logs. I also looked at the one in this pull request:
https://github.com/apache/spark/pull/2077 which seems to operate only on
logs, but it doesn
Giovanni,
The DAG can be walked by calling the "dependencies()" function on any RDD.
It returns a Seq containing the parent RDDs. If you start at the leaves
and walk through the parents until dependencies() returns an empty Seq, you
ultimately have your DAG.
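A minimal sketch of that walk:

  import org.apache.spark.rdd.RDD

  // Recursively print an RDD's lineage; dependencies is empty at the leaves
  def walk(rdd: RDD[_], depth: Int = 0): Unit = {
    println(("  " * depth) + rdd)
    rdd.dependencies.foreach(dep => walk(dep.rdd, depth + 1))
  }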
May be this will give you a good start
https://github.com/apache/spark/pull/2077
Thanks
Best Regards
On Sat, Apr 25, 2015 at 1:29 AM, Giovanni Paolo Gibilisco wrote:
> Hi,
> I would like to know if it is possible to build the DAG before actually
> executing the application. My guess i
Hi,
I would like to know if it is possible to build the DAG before actually
executing the application. My guess is that in the scheduler the DAG is
built dynamically at runtime since it might depend on the data, but I was
wondering if there is a way (and maybe a tool already) to analyze the code.
t;>> When I run the Spark application (streaming) in local mode I could see
>>> the execution progress as below..
>>>
>>> [Stage
>>> 0:>
>>> (1817 + 1) / 3125]
>>>
>>> [Stage
>>> 2:===>
>>> (740 + 1) / 3125]
>>>
>>> One of the stages is taking long time for execution.
>>>
>>> How to find the transformations/ actions associated with a particular
>>> stage?
>>> Is there anyway to find the execution DAG of a Spark Application?
>>>
>>> Regards
>>> Vijay
>>>
>>
>>
>
>
>> (740 + 1) / 3125]
>>
>> One of the stages is taking long time for execution.
>>
>> How to find the transformations/ actions associated with a particular
>> stage?
>> Is there anyway to find the execution DAG of a Spark Application?
>>
>> Regards
>> Vijay
>>
>
>
===>
> (740 + 1) / 3125]
>
> One of the stages is taking long time for execution.
>
> How to find the transformations/ actions associated with a particular
> stage?
> Is there anyway to find the execution DAG of a Spark Application?
>
> Regards
> Vijay
>
of the stages is taking long time for execution.
How to find the transformations/ actions associated with a particular stage?
Is there anyway to find the execution DAG of a Spark Application?
Regards
Vijay
Hey,
I didn't find any documentation regarding support for cycles in a Spark
topology, although Storm supports this using manual configuration in the
acker function logic (setting it to a particular count). By cycles I
don't mean infinite loops.
--
Thanks & Regards,
Anshu Shukla
Hi guys,
I am trying to get a better understanding of the DAG generation for a job in
Spark.
Ideally, what I want is to run some SQL query and extract the DAG generated by
Spark. By DAG I mean the stages and dependencies among stages, and the number
of tasks in every stage.
Could you guys
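One programmatic route, sketched on top of the listener API: register a
SparkListener and record each stage's id, parents and task count as the job
runs:

  import org.apache.spark.scheduler.{SparkListener, SparkListenerStageSubmitted}

  sc.addSparkListener(new SparkListener {
    override def onStageSubmitted(e: SparkListenerStageSubmitted): Unit = {
      val s = e.stageInfo
      println(s"stage ${s.stageId}: ${s.numTasks} tasks, " +
        s"parent stages = ${s.parentIds.mkString(", ")}")
    }
  })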
For anybody who's interested in this, here's a link to a PR that addresses
this feature :
https://github.com/apache/spark/pull/2077
(thanks to Todd Nist for sending it to me)
There is the PR https://github.com/apache/spark/pull/2077 for doing this.
On Fri, Mar 13, 2015 at 6:42 AM, t1ny wrote:
Hi all,
We are looking for a tool that would let us visualize the DAG generated by a
Spark application as a simple graph.
This graph would represent the Spark Job, its stages and the tasks inside
the stages, with the dependencies between them (either narrow or shuffle
dependencies).
The Spark
Hi,
On Sat, Jan 17, 2015 at 3:37 AM, Peng Cheng wrote:
> I'm talking about RDD1 (not persisted or checkpointed) in this situation:
> [diagram below]
> The problem might be in SQLContext.jsonRDD(), since the source
> jsonRDD is used twice (one for schema inferring, another for data read). It
> almost guarantees that the source jsonRDD is calculated twice. Has this
> problem been addressed so far?
I'm talking about RDD1 (not persisted or checkpointed) in this situation:

...(somewhere) -> RDD1 -> RDD2
                   |       |
                   V       V
                  RDD3 -> RDD4 -> Action!

In my experience the chance RDD1 gets recalc
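That matches my experience; a sketch of the effect (simplified names, not the
jsonRDD case): with a single action, each branch re-evaluates RDD1's lineage
unless it is persisted:

  val rdd1 = sc.textFile("in.txt").map(_.length)   // not persisted
  val rdd3 = rdd1.map(_ * 2)                       // first branch via rdd1
  val rdd4 = rdd1.map(_ + 1)                       // second branch via rdd1
  (rdd3 union rdd4).count()   // one action, but rdd1 is computed for both branches;
                              // calling rdd1.cache() first avoids the recomputation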
I asked this question too soon. I am caching off a bunch of RDDs in a
TrieMap so that our framework can wire them together, and the locking was
not completely correct; therefore it was creating multiple new RDDs at
times instead of using cached versions, which were creating completely
separate linea
We just updated to Spark 1.2.0 from Spark 1.1.0. We have a small framework
that we've been developing that connects various different RDDs together
based on some predefined business cases. After updating to 1.2.0, some of
the concurrency expectations about how the stages within jobs are executed
ha
Hi,
You can turn off these messages using log4j.properties.
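Either via log4j.properties as suggested, or at runtime (assumption: Spark
1.4+), a one-liner:

  sc.setLogLevel("WARN")   // silences INFO output such as the DAGScheduler messages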
On Fri, Jan 2, 2015 at 1:51 PM, Robineast wrote:
> Do you have some example code of what you are trying to do?
>
> Robin
>
Do you have some example code of what you are trying to do?
Robin
Hi guys,
I have just started using Spark, and I am getting this as INFO output:
15/01/02 11:54:17 INFO DAGScheduler: Parents of final stage: List()
15/01/02 11:54:17 INFO DAGScheduler: Missing parents: List()
15/01/02 11:54:17 INFO DAGScheduler: Submitting Stage 6 (PythonRDD[12] at
RDD at PythonRDD.scala:43), which has no missing parents
Also my program is taking a lot of time to execute.