.org" <user@spark.apache.org>
Subject: [EXT] Re: [Spark Core]: Python and Scala generate different DAGs for
identical code
This video https://www.youtube.com/watch?v=LQHMMCf2ZWY I think.
On Wed, May 10, 2017 at 8:04 PM,
lucas.g...@gmail.com<mailto:lucas.g...@gmail.com>
<luca
> generated something strange which is hard to follow:
>
> (2) PythonRDD[13] at RDD at PythonRDD.scala:48 []
>  |  MapPartitionsRDD[12] at mapPartitions at PythonRDD.scala:422 []
>  |  ShuffledRDD[11] at partitionBy at NativeMethodAccessorImpl.java:0 []
>  +-(2) PairwiseRDD[10] at reduceByKey at :1 []
>     |  ../log.txt MapPartitionsRDD[8] at textFile at NativeMethodAccessorImpl.java:0 []
>     |  ../log.txt HadoopRDD[7] at textFile at NativeMethodAccessorImpl.java:0 []
>
> Why is that? Does pyspark do some optimizations under the hood? This debug
> string is really useless for debugging.
>
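For reference, both sides of the comparison compute the same thing: a count per field of the non-INFO log lines. A plain-Python sketch of that semantics, independent of Spark (the sample log lines below are hypothetical, not from the thread):

```python
from collections import Counter

# Hypothetical tab-separated log lines: "LEVEL\tcomponent\tmessage"
lines = [
    "INFO\tscheduler\tstarting job",
    "WARN\tscheduler\tslow task",
    "ERROR\tshuffle\tfetch failed",
    "WARN\tscheduler\tslow task",
]

# Equivalent of: filter(!contains("INFO")) -> map(split("\t")(1), 1) -> reduceByKey(_ + _)
counts = Counter(
    line.split("\t")[1] for line in lines if "INFO" not in line
)

print(dict(counts))  # {'scheduler': 2, 'shuffle': 1}
```

Whatever shape the physical DAG takes in either language, this is the logical computation it has to perform.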
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Core-Python-and-Scala-generate-different-DAGs-for-identical-code-tp28674.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
This Scala code:
scala> val logs = sc.textFile("big_data_specialization/log.txt").
| filter(x => !x.contains("INFO")).
| map(x => (x.split("\t")(1), 1)).
| reduceByKey((x, y) => x + y)
generated an obvious lineage:
(2) ShuffledRDD[4] at reduceByKey at :27 []
+-(2)
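The extra nodes in the PySpark lineage reflect how PySpark assembles reduceByKey from Python-side steps: a local combine within each input partition, a hash shuffle (which surfaces as the PairwiseRDD + partitionBy pair), and a merge of combined values after the shuffle. A pure-Python sketch of that decomposition, assuming two partitions of hypothetical (word, 1) pairs; the helper names here are illustrative, not PySpark's actual internals:

```python
def local_combine(partition, op):
    # Map-side pre-aggregation: combine values for repeated keys
    # within one partition (the work done before/after the shuffle).
    combined = {}
    for k, v in partition:
        combined[k] = op(combined[k], v) if k in combined else v
    return list(combined.items())

def partition_by(records, num_partitions):
    # Hash shuffle: route each (key, value) pair to a target partition,
    # so all pairs for a given key land in the same partition.
    out = [[] for _ in range(num_partitions)]
    for k, v in records:
        out[hash(k) % num_partitions].append((k, v))
    return out

def reduce_by_key(partitions, op, num_partitions=2):
    # 1. local combine per input partition
    pre = [kv for part in partitions for kv in local_combine(part, op)]
    # 2. hash shuffle by key
    shuffled = partition_by(pre, num_partitions)
    # 3. merge combined values after the shuffle
    return [local_combine(part, op) for part in shuffled]

# Two hypothetical input partitions of (word, 1) pairs:
parts = [[("a", 1), ("b", 1), ("a", 1)], [("b", 1), ("c", 1)]]
result = reduce_by_key(parts, lambda x, y: x + y)
merged = {k: v for part in result for k, v in part}
print(merged)
```

In Scala, steps 1–3 collapse into a single ShuffledRDD node because the combining happens inside the JVM shuffle machinery; in PySpark each Python-side step leaves its own RDD in the lineage, and the call sites show NativeMethodAccessorImpl.java:0 because the JVM only sees the reflective call coming in from the Python driver.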