Re: [GraphX]: Prevent recomputation of DAG

2024-03-18 Thread Mich Talebzadeh
Hi, I must admit I don't know much about this Fruchterman-Reingold (call it FR) visualization using GraphX and Kubernetes. But you are suggesting this slowdown issue starts after the second iteration, and caching/persisting the graph after each iteration does not help. FR involves many computation
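A plain-Python sketch of the recomputation effect discussed in this thread (a toy model only, not the GraphX/Spark API; all names here are invented for illustration). Each iteration derives a new value from the previous one and then runs an "action" (analogous to shipping vertex positions to a consumer); without materializing each step, every action re-walks the whole lineage chain, so total work grows quadratically with the iteration count:

```python
class Lazy:
    """A lazily computed value: re-runs its function on every access
    unless it has been materialized (the analogue of cache/persist)."""
    def __init__(self, fn):
        self.fn = fn
        self.cached = None
        self.evals = 0

    def get(self):
        if self.cached is not None:
            return self.cached
        self.evals += 1
        return self.fn()

    def materialize(self):
        self.cached = self.get()
        return self

def run(iterations, persist):
    """Each iteration builds on the previous node, then runs an 'action'."""
    nodes = [Lazy(lambda: 0)]
    for _ in range(iterations):
        prev = nodes[-1]
        node = Lazy(lambda p=prev: p.get() + 1)
        if persist:
            node.materialize()          # compute once, keep the result
        node.get()                      # the per-iteration action
        nodes.append(node)
    return sum(n.evals for n in nodes)  # total computations performed

# Without persisting, iteration k re-walks the whole chain: quadratic work.
assert run(10, persist=False) == 65
assert run(10, persist=True) == 11
```

Note that in real Spark, persisting alone may not be enough for long-running iterative jobs, because the lineage (DAG) itself still grows; checkpointing is what actually truncates it.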

[GraphX]: Prevent recomputation of DAG

2024-03-17 Thread Marek Berith
Dear community, for my diploma thesis, we are implementing a distributed version of the Fruchterman-Reingold visualization algorithm, using GraphX and Kubernetes. Our solution is a backend that continuously computes new positions of vertices in a graph and sends them via RabbitMQ to a consumer. Fruc

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-04-01 Thread Mich Talebzadeh
be liable for any monetary damages arising from such loss, damage or destruction. On Fri, 31 Mar 2023 at 15:15, AN-TRUONG Tran Phan <tr.phan.tru...@gmail.com> wrote: Hi,

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-04-01 Thread Khalid Mammadov
learning about Apache Spark and want to know the meaning of each Task created on the Jobs recorded on Spark history. For example, the application I write creates 17 jobs, in which job 0 runs for 10 minutes, there are 2384 small tasks and I want

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread Mich Talebzadeh
On Fri, 31 Mar 2023 at 15:15, AN-TRUONG Tran Phan <tr.phan.tru...@gmail.com> wrote: Hi, I am learning about Apache Spark and want to know the meaning of each Task created on th

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread AN-TRUONG Tran Phan
The application I write creates 17 jobs, in which job 0 runs for 10 minutes, there are 2384 small tasks and I want to learn about the meaning of these 2384, is it possible? I found a picture of DAG in the Jobs and want to know the relationship between DAG and

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread Mich Talebzadeh
The application I write creates 17 jobs, in which job 0 runs for 10 minutes, there are 2384 small tasks and I want to learn about the meaning of these 2384, is it possible? I found a picture of DAG in the Jobs and want to know the relationship between DAG and Task, is it possible (Specifically

Mapping stages in DAG to line of code in pyspark

2021-04-18 Thread Dhruv Kumar
Hi I am using PySpark for writing Spark queries. My research project requires me to accurately measure latency for each and every operator/stage in the query. I can make some guesses but unable to exactly map the stages (shown in the DAG on Spark UI) to the exact line in my PySpark code. Can

Re: Where is the DAG stored before catalyst gets it?

2018-10-06 Thread Jacek Laskowski
Hi Jean Georges, > I am assuming it is still in the master and when catalyst is finished it sends the tasks to the workers. Sorry to be that direct, but the sentence does not make much sense to me. Again, very sorry for saying it in the very first sentence. Since I know Jean Georges I allowed mys

Where is the DAG stored before catalyst gets it?

2018-10-04 Thread Jean Georges Perrin
Hi, I am assuming it is still in the master and when catalyst is finished it sends the tasks to the workers. Correct? tia jg - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Spark ML DAG Pipelines

2017-09-07 Thread Srikanth Sampath
Hi Spark Experts, Can someone point me to some examples for non-linear (DAG) ML pipelines. That would be of great help. Thanks much in advance -Srikanth

Re: [Spark Streaming] DAG Output Processing mechanism

2017-05-29 Thread Nipun Arora
Sending out the message again.. Hopefully someone can clarify :) I would like some clarification on the execution model for spark streaming. Broadly, I am trying to understand if output operations in a DAG are only processed after all intermediate operations are finished for all parts of the

Re: [Spark Streaming] DAG Output Processing mechanism

2017-05-28 Thread Nipun Arora
Apologies - Resending as the previous mail went with some unnecessary copy paste. I would like some clarification on the execution model for spark streaming. Broadly, I am trying to understand if output operations in a DAG are only processed after all intermediate operations are finished for all

[Spark Streaming] DAG Output Processing mechanism

2017-05-28 Thread Nipun Arora
I would like some clarification on the execution model for spark streaming. Broadly, I am trying to understand if output operations in a DAG are only processed after all intermediate operations are finished for all parts of the DAG. Let me give an example: I have a

[Spark Streaming] DAG Execution Model Clarification

2017-05-26 Thread Nipun Arora
Hi, I would like some clarification on the execution model for spark streaming. Broadly, I am trying to understand if output operations in a DAG are only processed after all intermediate operations are finished for all parts of the DAG. Let me give an example: I have a dstream -A , I do map

How to generate stage for this RDD DAG please?

2017-05-23 Thread ??????????
Hi all, I read some papers about the stage; I know the narrow dependency and shuffle dependency. For the below RDD DAG, how does Spark generate the stage DAG please? And is this RDD DAG legal please?

Re: DAG Visualization option is missing on Spark Web UI

2017-01-30 Thread Md. Rezaul Karim
Hi Jacek, I tried accessing Spark web UI on both Firefox and Google Chrome browsers with ad blocker enabled. I do see other options like *User, Total Uptime, Scheduling Mode, Active Jobs, Completed Jobs and* Event Timeline. However, I don't

Re: DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Mark Hamstra
ad blocker enabled. I do see other options like *User, Total Uptime, Scheduling Mode, Active Jobs, Completed Jobs and* Event Timeline. However, I don't see an option for DAG visualization. Please note that I am experiencing the same issue with Spark 2.x (i.e. 2.0.0, 2.0.1

Re: DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Md. Rezaul Karim
Hi Jacek, I tried accessing Spark web UI on both Firefox and Google Chrome browsers with ad blocker enabled. I do see other options like* User, Total Uptime, Scheduling Mode, **Active Jobs, Completed Jobs and* Event Timeline. However, I don't see an option for DAG visualization. Please note

Re: DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Jacek Laskowski
job on my local machine written in Scala with Spark 2.1.0. However, I am not seeing any option of "*DAG Visualization*" at http://localhost:4040/jobs/ Suggestion, please. Regards, *Md. Rezaul Karim*, BSc, MSc PhD Researcher, INSIGHT Centre for Data Ana

DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Md. Rezaul Karim
Hi All, I am running a Spark job on my local machine written in Scala with Spark 2.1.0. However, I am not seeing any option of "*DAG Visualization*" at http://localhost:4040/jobs/ Suggestion, please. Regards, *Md. Rezaul Karim*, BSc, MSc PhD

Re: SparkSQL DAG generation , DAG optimization , DAG execution

2016-09-10 Thread Mich Talebzadeh
right let us simplify this. can you run the whole thing *once* only and send dag execution output from UI? you can use snipping tool to take the image. HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: SparkSQL DAG generation , DAG optimization , DAG execution

2016-09-10 Thread Rabin Banerjee
(look at UI page, 4040 by default) , ? *I checked Spark UI DAG , so many file reads , Why ?* 6. What Spark mode is being used (Local, Standalone, Yarn) ? *Yarn* 7. OOM could be anything depending on how much you are allocating to your driver memory in spark-submit ? *Driver and

Re: SparkSQL DAG generation , DAG optimization , DAG execution

2016-09-09 Thread Mich Talebzadeh
Driver memory is set as 4gb which is too high as data size is in MB. Questions: 1. Will Spark optimize multiple SQL queries into one single physical plan? 2. In DAG I can see a lot of file reads and a lot of stages, why? I only called action once? 3. Is every

SparkSQL DAG generation , DAG optimization , DAG execution

2016-09-09 Thread Rabin Banerjee
Spark will optimize multiple SQL into one physical execution plan. 2. Executor memory and Driver memory is set as 4gb which is too high as data size is in MB. Questions: 1. Will Spark optimize multiple SQL queries into one single physical plan? 2. In DAG I can see a lot of file reads and l

DAG of Spark Sort application spanning two jobs

2016-05-30 Thread alvarobrandon
(screenshots attached: cbKDZ.png, GXIkS.png, H9LXF.png) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/

DAG Pipelines?

2016-05-04 Thread Cesar Flores
I read the ml-guide page ( http://spark.apache.org/docs/latest/ml-guide.html#details). It mentions that it is possible to construct DAG Pipelines. Unfortunately there is no example to explain under which use case this may be useful. *Can someone give me an example or use case where this

the "DAG Visualiztion" in 1.6 not works fine here

2016-03-15 Thread charles li
sometimes it just shows several *black dots*, and sometimes it can not show the entire graph. did anyone meet this before and how did you fix it? -- a spark lover, a quant, a developer and a good man. http://github.com/litaotao

streaming application redundant dag stage execution/performance/caching

2016-02-16 Thread krishna ramachandran
We have a streaming application containing approximately 12 stages every batch, running in streaming mode (4 sec batches). Each stage persists output to cassandra the pipeline stages stage 1 ---> receive Stream A --> map --> filter -> (union with another stream B) --> map --> groupbykey --> trans

RE: Question on Spark architecture and DAG

2016-02-12 Thread Mich Talebzadeh
Andy Davidson [mailto:a...@santacruzintegration.com] Sent: 12 February 2016 21:17 To: Mich Talebzadeh; user@spark.apache.org Subject: Re: Question on Spark architecture and DAG From: Mich Talebzadeh <m...@peridale.co.uk> Date: Thursday, February 11, 2016 at 2:30 PM To: "u

Re: Question on Spark architecture and DAG

2016-02-12 Thread Andy Davidson
From: Mich Talebzadeh Date: Thursday, February 11, 2016 at 2:30 PM To: "user @spark" Subject: Question on Spark architecture and DAG Hi, I have used Hive on Spark engine and of course Hive tables and it's pretty impressive comparing Hive using MR engine.

Question on Spark architecture and DAG

2016-02-11 Thread Mich Talebzadeh
Hi, I have used Hive on Spark engine and of course Hive tables and it's pretty impressive comparing Hive using MR engine. Let us assume that I use spark shell. Spark shell is a client that connects to spark master running on a host and port like below spark-shell --master spark://50.140.197.21

Is there some open source tools which implements draggable widget and make the app runing in a form of DAG ?

2016-02-01 Thread zml张明磊
Hello, I am trying to find some tools but without success. So, as the title describes, are there some open source tools which implement a draggable widget and make the app run in the form of a DAG, like a workflow? Thanks, Minglei.

DAG visualization: no visualization information available with history server

2016-01-31 Thread Raghava
spark.ui.retainedJobs=1 --conf spark.ui.retainedStages=1 In the Spark Web UI (http://localhost:18080/), the DAG visualization of only the most recent job is available. For rest of the jobs, I get the following message No visualization information available for this job! If this is an old job, its

Re: Spark RDD DAG behaviour understanding in case of checkpointing

2016-01-25 Thread Tathagata Das
I am facing a challenge in Production with DAG behaviour during checkpointing in spark streaming - Step 1 : Read data from Kafka every 15 min - call this KafkaStreamRDD ~ 100 GB of data Step 2 : Repartition KafkaStreamRdd from 5 to 100 partitions to para

Re: Spark RDD DAG behaviour understanding in case of checkpointing

2016-01-25 Thread Cody Koeninger
from kafka; replay the prior window data from checkpoint / other storage (not much reason for this, since it's stored in kafka); or lose the prior window data. On Sat, Jan 23, 2016 at 3:47 PM, gaurav sharma wrote: > Hi Tathagata/Cody, > > I am facing a challenge in Production with

Spark RDD DAG behaviour understanding in case of checkpointing

2016-01-23 Thread gaurav sharma
Hi Tathagata/Cody, I am facing a challenge in Production with DAG behaviour during checkpointing in spark streaming - Step 1 : Read data from Kafka every 15 min - call this KafkaStreamRDD ~ 100 GB of data Step 2 : Repartition KafkaStreamRdd from 5 to 100 partitions to parallelise processing

Re: Dynamic DAG use-case for spark streaming.

2015-09-29 Thread Tathagata Das
A very basic support that is there in DStream is DStream.transform(), which takes an arbitrary RDD => RDD function. This function can actually choose to do different computation with time. That may be of help to you. On Tue, Sep 29, 2015 at 12:06 PM, Archit Thakur wrote: Hi, We are using spark
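A plain-Python sketch of the idea behind DStream.transform() (a toy model, not the Spark Streaming API; the names transform and batches are invented for illustration). The point is that the per-batch function is arbitrary and can vary its computation with the batch time:

```python
# Toy model: a "stream" is just a list of batches; transform applies an
# arbitrary (time, batch) -> batch function to each one.
def transform(batches, fn):
    return [fn(t, batch) for t, batch in enumerate(batches)]

batches = [[1, 2], [3, 4], [5, 6]]

# The computation may differ per batch time, e.g. scale only even batches:
out = transform(batches, lambda t, b: [x * 10 for x in b] if t % 2 == 0 else b)
assert out == [[10, 20], [3, 4], [50, 60]]
```

This mirrors why transform() helps with "dynamic DAG" use cases: the function executed per batch is chosen at runtime, batch by batch.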

Dynamic DAG use-case for spark streaming.

2015-09-29 Thread Archit Thakur
Hi, We are using spark streaming as our processing engine, and as part of output we want to push the data to UI. Now there would be multiple users accessing the system with their different filters on. Based on the filters and other inputs we want to either run a SQL Query on DStream or do a custo

Re: DAG Scheduler deadlock when two RDDs reference each other, force Stages manually?

2015-09-14 Thread Petros Nyfantis
element keyA is passed in the aggregate function as an initialization parameter and then for each B element key keyB, if M(keyA, keyB) == 1 then the B element is being taken into account in the summation. The calculation of A is done successfully and correctly, but then the DAG scheduler seems to deadl

Re: DAG Scheduler deadlock when two RDDs reference each other, force Stages manually?

2015-09-14 Thread Sean Owen
initialization parameter and then for each B element key keyB, if M(keyA, keyB) == 1 then the B element is being taken into account in the summation. The calculation of A is done successfully and correctly, but then the DAG scheduler seems to deadlock when the calculation of B happ

DAG Scheduler deadlock when two RDDs reference each other, force Stages manually?

2015-09-14 Thread petranidis
is passed in the aggregate function as an initialization parameter and then for each B element key keyB, if M(keyA, keyB) == 1 then the B element is being taken into account in the summation. The calculation of A is done successfully and correctly, but then the DAG scheduler seems to deadlock

How to create combine DAG visualization?

2015-09-10 Thread b.bhavesh
Hi, How can I create combine DAG visualization of pyspark code instead of separate DAGs of jobs and stages? Thanks b.bhavesh -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-combine-DAG-visualization-tp24653.html Sent from the Apache Spark

Re: DAG related query

2015-08-20 Thread Andrew Or
local driver program. The first RDD still exists. On Thu, Aug 20, 2015 at 2:15 PM, Bahubali Jain wrote: Hi, How would the DAG look like for the below code JavaRDD rdd1 = context.textFile(); JavaRDD rdd2 = rdd1.map(); rdd1 = rdd2.

Re: DAG related query

2015-08-20 Thread Sean Owen
No. The third line creates a third RDD whose reference simply replaces the reference to the first RDD in your local driver program. The first RDD still exists. On Thu, Aug 20, 2015 at 2:15 PM, Bahubali Jain wrote: Hi, How would the DAG look like for the below code

DAG related query

2015-08-20 Thread Bahubali Jain
Hi, How would the DAG look like for the below code JavaRDD rdd1 = context.textFile(); JavaRDD rdd2 = rdd1.map(); rdd1 = rdd2.map(); Does this lead to any kind of cycle? Thanks, Baahu
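A plain-Python sketch of Sean Owen's point (a toy stand-in, not Spark; FakeRDD is an invented name). Rebinding the variable rdd1 does not create a cycle, because each transformation returns a new object that only points backwards at its parent:

```python
class FakeRDD:
    """Stand-in for an RDD: each transformation returns a NEW object
    pointing back at its parent."""
    def __init__(self, parent=None):
        self.parent = parent

    def map(self):
        return FakeRDD(parent=self)

rdd1 = FakeRDD()      # like context.textFile()
first = rdd1          # keep a handle on the original object
rdd2 = rdd1.map()
rdd1 = rdd2.map()     # rebinds the NAME rdd1; the first RDD object is untouched

# Walking parent references from the new rdd1 terminates -- no cycle:
chain = []
node = rdd1
while node is not None:
    chain.append(node)
    node = node.parent

assert chain == [rdd1, rdd2, first]   # three distinct RDDs, a straight line
assert first.parent is None           # the original still has no parent
```

The lineage is always a DAG built by object construction; variable names in the driver program are irrelevant to its shape.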

Re: DataFrame DAG recomputed even though DataFrame is cached?

2015-07-28 Thread Michael Armbrust
We will try to address this before Spark 1.5 is released: https://issues.apache.org/jira/browse/SPARK-9141 On Tue, Jul 28, 2015 at 11:50 AM, Kristina Rogale Plazonic wrote: > Hi, > > I'm puzzling over the following problem: when I cache a small sample of a > big dataframe, the small dataframe is

DataFrame DAG recomputed even though DataFrame is cached?

2015-07-28 Thread Kristina Rogale Plazonic
Hi, I'm puzzling over the following problem: when I cache a small sample of a big dataframe, the small dataframe is recomputed when selecting a column (but not if show() or count() is invoked). Why is that so and how can I avoid recomputation of the small sample dataframe? More details: - I hav
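A plain-Python sketch of the caching semantics at issue in this thread (a toy model, not Spark internals or the actual SPARK-9141 bug; LazyFrame is an invented name). cache() only marks a dataset; nothing is stored until some action materializes it, and anything that bypasses the stored result recomputes:

```python
class LazyFrame:
    """Toy lazily-evaluated dataset with opt-in caching."""
    def __init__(self, compute):
        self._compute = compute
        self._marked = False
        self._store = None
        self.computations = 0

    def cache(self):
        # Like DataFrame.cache(): only MARKS; nothing is stored yet.
        self._marked = True
        return self

    def _data(self):
        if self._store is not None:
            return self._store
        self.computations += 1
        data = self._compute()
        if self._marked:
            self._store = data
        return data

    def count(self):     # an "action"
        return len(self._data())

    def collect(self):   # another "action"
        return list(self._data())

df = LazyFrame(lambda: [1, 2, 3]).cache()
df.count()    # first action computes AND fills the cache
df.collect()  # served from the cache: no recomputation
assert df.computations == 1

df2 = LazyFrame(lambda: [1, 2, 3])  # never cached
df2.count()
df2.collect()
assert df2.computations == 2
```

In the thread's actual case the recomputation happened despite caching, which is why it was filed as a bug (SPARK-9141) rather than expected behaviour.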

Building DAG from log

2015-05-04 Thread Giovanni Paolo Gibilisco
Hi, I'm trying to build the DAG of an application from the logs. I've had a look at SparkReplayDebugger but it doesn't operate offline on logs. I looked also at the one in this pull: https://github.com/apache/spark/pull/2077 that seems to operate only on logs but it doesn

Re: DAG

2015-04-25 Thread Corey Nolet
Giovanni, The DAG can be walked by calling the "dependencies()" function on any RDD. It returns a Seq containing the parent RDDs. If you start at the leaves and walk through the parents until dependencies() returns an empty Seq, you ultimately have your DAG. On Sat, Apr 25, 2015
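A plain-Python sketch of the walk Corey describes (a toy analogue; Node, deps and walk are invented names -- the real Spark call is RDD.dependencies(), which returns a Seq of Dependency objects wrapping the parent RDDs). Start from the final RDD and recurse until dependencies() is empty:

```python
class Node:
    """Toy stand-in for an RDD with a dependencies() accessor."""
    def __init__(self, name, deps=()):
        self.name = name
        self._deps = list(deps)

    def dependencies(self):
        return self._deps

def walk(rdd, seen=None):
    """Depth-first walk from the final RDD back to the sources,
    visiting each node once even if it is shared by several children."""
    seen = seen if seen is not None else []
    if rdd not in seen:
        seen.append(rdd)
        for dep in rdd.dependencies():
            walk(dep, seen)
    return seen

a = Node("textFile")
b = Node("map", [a])
c = Node("filter", [a])
d = Node("join", [b, c])

names = [n.name for n in walk(d)]
assert names == ["join", "map", "textFile", "filter"]  # shared parent visited once
```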

Re: DAG

2015-04-25 Thread Akhil Das
May be this will give you a good start https://github.com/apache/spark/pull/2077 Thanks Best Regards On Sat, Apr 25, 2015 at 1:29 AM, Giovanni Paolo Gibilisco wrote: Hi, I would like to know if it is possible to build the DAG before actually executing the application. My guess i

DAG

2015-04-24 Thread Giovanni Paolo Gibilisco
Hi, I would like to know if it is possible to build the DAG before actually executing the application. My guess is that in the scheduler the DAG is built dynamically at runtime since it might depend on the data, but I was wondering if there is a way (and maybe a tool already) to analyze the code

Re: Spark Application Stages and DAG

2015-04-07 Thread Vijay Innamuri
When I run the Spark application (streaming) in local mode I could see the execution progress as below.. [Stage 0:> (1817 + 1) / 3125] [Stage 2:===> (740 + 1) / 3125] One of the stages is taking long time for execution. How to find the transformations/ actions associated with a particular stage? Is there anyway to find the execution DAG of a Spark Application? Regards Vijay

Re: Spark Application Stages and DAG

2015-04-03 Thread Tathagata Das
(740 + 1) / 3125] One of the stages is taking long time for execution. How to find the transformations/ actions associated with a particular stage? Is there anyway to find the execution DAG of a Spark Application? Regards Vijay

Re: Spark Application Stages and DAG

2015-04-03 Thread Akhil Das
[Stage 2:===> (740 + 1) / 3125] One of the stages is taking long time for execution. How to find the transformations/ actions associated with a particular stage? Is there anyway to find the execution DAG of a Spark Application? Regards Vijay

Spark Application Stages and DAG

2015-04-03 Thread Vijay Innamuri
One of the stages is taking long time for execution. How to find the transformations/ actions associated with a particular stage? Is there anyway to find the execution DAG of a Spark Application? Regards Vijay

Support for Data flow graphs and not DAG only

2015-04-02 Thread anshu shukla
Hey, I didn't find any documentation regarding support for cycles in a Spark topology, although Storm supports this using manual configuration in the acker function logic (setting it to a particular count). By cycles I don't mean infinite loops. -- Thanks & Regards, Anshu Shukla

question regarding the dependency DAG in Spark

2015-03-16 Thread Grandl Robert
Hi guys, I am trying to get a better understanding of the DAG generation for a job in Spark. Ideally, what I want is to run some SQL query and extract the generated DAG by Spark. By DAG I mean the stages and dependencies among stages, and the number of tasks in every stage. Could you guys

Re: Visualizing the DAG of a Spark application

2015-03-13 Thread t1ny
For anybody who's interested in this, here's a link to a PR that addresses this feature : https://github.com/apache/spark/pull/2077 (thanks to Todd Nist for sending it to me) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Visualizing-the-DAG-

Re: Visualizing the DAG of a Spark application

2015-03-13 Thread Todd Nist
There is the PR https://github.com/apache/spark/pull/2077 for doing this. On Fri, Mar 13, 2015 at 6:42 AM, t1ny wrote: Hi all, We are looking for a tool that would let us visualize the DAG generated by a Spark application as a simple graph. This graph would repre

Visualizing the DAG of a Spark application

2015-03-13 Thread t1ny
Hi all, We are looking for a tool that would let us visualize the DAG generated by a Spark application as a simple graph. This graph would represent the Spark Job, its stages and the tasks inside the stages, with the dependencies between them (either narrow or shuffle dependencies). The Spark

Re: If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?

2015-01-19 Thread Tobias Pfeiffer
Hi, On Sat, Jan 17, 2015 at 3:37 AM, Peng Cheng wrote: I'm talking about RDD1 (not persisted or checkpointed) in this situation:

...(somewhere) -> RDD1 -> RDD2
                   |       |
                   V       V

Re: If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?

2015-01-19 Thread Xuefeng Wu
problem might be in SQLContext.jsonRDD(), since the source jsonRDD is used twice (once for schema inference, once for data read). It almost guarantees that the source jsonRDD is calculated twice. Has this problem been addressed so far? -- View this message i

If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?

2015-01-16 Thread Peng Cheng
I'm talking about RDD1 (not persisted or checkpointed) in this situation:

...(somewhere) -> RDD1 -> RDD2
                   |       |
                   V       V
                  RDD3 -> RDD4 -> Action!

To my experience the chance RDD1 get recalc
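A plain-Python sketch of the diamond situation in this thread (a toy model, not Spark; rdd1..rdd4 here are ordinary functions standing in for the lineage). An RDD reached by two branches of a single action's DAG is evaluated once per branch unless it is persisted:

```python
from functools import lru_cache

calls = {"rdd1": 0}

def rdd1():
    calls["rdd1"] += 1
    return [1, 2, 3]

def rdd2():
    return [x * 2 for x in rdd1()]

def rdd3():
    return [x + 1 for x in rdd1()]

def rdd4():
    return rdd2() + rdd3()   # both branches meet here

rdd4()                       # the single action
assert calls["rdd1"] == 2    # RDD1 ran once per branch

# "Persisting" RDD1 (memoizing its result) collapses that to one run:
calls["rdd1"] = 0
rdd1 = lru_cache(maxsize=None)(rdd1)
rdd4()
assert calls["rdd1"] == 1
```

This matches the usual advice: if an unpersisted RDD is referenced by more than one downstream path of the same action, expect it to be recomputed.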

If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?

2015-01-16 Thread Peng Cheng
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/If-an-RDD-appeared-twice-in-a-DAG-of-which-calculation-is-triggered-by-a-single-action-will-this-RDD-tp21192.html Sent from the Apache Spark User List mailing

Re: Strange DAG scheduling behavior on currently dependent RDDs

2015-01-07 Thread Corey Nolet
I asked this question too soon. I am caching off a bunch of RDDs in a TrieMap so that our framework can wire them together and the locking was not completely correct- therefore it was creating multiple new RDDs at times instead of using cached versions- which were creating completely separate linea

Strange DAG scheduling behavior on currently dependent RDDs

2015-01-07 Thread Corey Nolet
We just updated to Spark 1.2.0 from Spark 1.1.0. We have a small framework that we've been developing that connects various different RDDs together based on some predefined business cases. After updating to 1.2.0, some of the concurrency expectations about how the stages within jobs are executed ha

Re: DAG info

2015-01-03 Thread madhu phatak
Hi, You can turn off these messages using log4j.properties. On Fri, Jan 2, 2015 at 1:51 PM, Robineast wrote: Do you have some example code of what you are trying to do? Robin -- View this message in context: http://apache-spark-user-list.100

Re: DAG info

2015-01-02 Thread Robineast
Do you have some example code of what you are trying to do? Robin -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DAG-info-tp20940p20941.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: DAG info

2015-01-01 Thread Josh Rosen
PythonRDD.scala:43), which has no missing parents. Also my program is taking lot of time to execute. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DAG-info-tp20940.html Sent from the Apache

DAG info

2015-01-01 Thread shahid
PythonRDD.scala:43), which has no missing parents. Also my program is taking lot of time to execute. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DAG-info-tp20940.html Sent from the Apache Spark User List mailing list archive at Nabble.com

DAG info

2015-01-01 Thread shahid ashraf
hi guys, i have just started using spark, i am getting this as an info:
15/01/02 11:54:17 INFO DAGScheduler: Parents of final stage: List()
15/01/02 11:54:17 INFO DAGScheduler: Missing parents: List()
15/01/02 11:54:17 INFO DAGScheduler: Submitting Stage 6 (PythonRDD[12] at RDD at PythonRDD.scala: