Scala and Python are not the same in this regard. This isn't related to how
Spark works.
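For what it's worth, here is a minimal pyspark sketch I would expect to work. I'm
assuming the truncated traceback below is the usual schema-inference error on an
RDD of bare ints, and the column name "value" is only for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# In Python, toDF must be called with parentheses, and the RDD elements
# need to be tuples/Rows so pyspark can infer a schema:
df = sc.parallelize([(3,), (2,), (1,), (4,)]).toDF(["value"])
df.show()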
On Sun, Feb 6, 2022, 10:04 PM wrote:
Indeed. In spark-shell I always omit the parentheses:
scala> sc.parallelize(List(3,2,1,4)).toDF.show
+-----+
|value|
+-----+
|    3|
|    2|
|    1|
|    4|
+-----+
So I think it would be OK in pyspark too.
But this still doesn't work. Why?
sc.parallelize([3,2,1,4]).toDF().show()
Traceback
This is just basic Python - you're missing parentheses on toDF, so you are
not calling the function or getting its result.
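To spell the Python part out, a plain-Python sketch (nothing Spark-specific;
to_df here is just an illustrative stand-in):

# Referencing a function without parentheses gives you the function object
# itself; only calling it returns a result you can chain methods on.
def to_df():
    return "a DataFrame"

print(to_df)    # <function to_df at 0x...>  (the function object)
print(to_df())  # a DataFrame                (the result of calling it)

# to_df.show would raise:
# AttributeError: 'function' object has no attribute 'show'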
On Sun, Feb 6, 2022 at 9:39 PM wrote:
I am a bit confused why this doesn't work in pyspark:
x = sc.parallelize([3,2,1,4])
x.toDF.show()
Traceback (most recent call last):
File "", line 1, in
AttributeError: 'function' object has no attribute 'show'
Thank you.
Basically you are creating a DataFrame (a DataFrame is a *Dataset* organized
into named columns; it is conceptually equivalent to a table in a
relational database) out of an RDD here.
scala> val rdd = sc.parallelize( List(3, 2, 1, 4, 0))
rdd: org.apache.spark.rdd.RDD[Int] =
DataFrames are quite a different API, more SQL-like in their operations, not
functional. The equivalent would be more like df.filter("value > 2")
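In pyspark terms, a rough sketch of that kind of filter (assuming an active
SparkSession named spark; the column name "value" matches the earlier example):

# Build a small DataFrame and filter with a SQL-style expression string
# (a Column expression would also work), rather than a plain function:
df = spark.createDataFrame([(3,), (2,), (1,), (4,), (0,)], ["value"])
df.filter("value > 2").show()
# +-----+
# |value|
# +-----+
# |    3|
# |    4|
# +-----+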
On Sun, Feb 6, 2022 at 5:51 AM wrote:
For example, this works for an RDD object:
scala> val li = List(3,2,1,4,0)
li: List[Int] = List(3, 2, 1, 4, 0)
scala> val rdd = sc.parallelize(li)
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at
parallelize at <console>:24
scala> rdd.filter(_ > 2).collect()
res0: Array[Int] = Array(3, 4)