Re: dataframe doesn't support higher order func, right?

2022-02-06 Thread Sean Owen
Scala and Python are not the same in this regard. This isn't related to how Spark works.
On Sun, Feb 6, 2022, 10:04 PM wrote:
> Indeed. in spark-shell I ignore the parentheses always,
>
> scala> sc.parallelize(List(3,2,1,4)).toDF.show
> +-----+
> |value|
> +-----+
> |    3|
> |    2|
> |    1|

Re: dataframe doesn't support higher order func, right?

2022-02-06 Thread capitnfrakass
Indeed. In spark-shell I always leave out the parentheses:
scala> sc.parallelize(List(3,2,1,4)).toDF.show
+-----+
|value|
+-----+
|    3|
|    2|
|    1|
|    4|
+-----+
So I thought it would be OK in pyspark too. But this still doesn't work. Why?
sc.parallelize([3,2,1,4]).toDF().show()
Traceback

Re: dataframe doesn't support higher order func, right?

2022-02-06 Thread Sean Owen
This is just basic Python - you're missing the parentheses on toDF, so you are not calling the function or getting its result.
On Sun, Feb 6, 2022 at 9:39 PM wrote:
> I am a bit confused why in pyspark this doesn't work?
>
> >>> x = sc.parallelize([3,2,1,4])
> >>> x.toDF.show()
> Traceback (most
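
The point about parentheses can be reproduced without Spark. Below is a minimal sketch in plain Python; `Rdd` and `Frame` are hypothetical stand-ins invented for illustration, not PySpark classes, and the exact error text PySpark prints may differ from what these stand-ins produce.

```python
class Frame:
    """Hypothetical stand-in for a DataFrame (not a PySpark class)."""

    def __init__(self, values):
        self.values = values

    def show(self):
        # Return the rows as text, one value per line.
        return "\n".join(str(v) for v in self.values)


class Rdd:
    """Hypothetical stand-in for an RDD (not a PySpark class)."""

    def __init__(self, values):
        self.values = values

    def toDF(self):
        # Calling this method returns an object that has show().
        return Frame(self.values)


x = Rdd([3, 2, 1, 4])

# Without parentheses, x.toDF is the method object itself, not its result.
method = x.toDF
print(callable(method))         # it can be called ...
print(hasattr(method, "show"))  # ... but it has no show attribute

# With parentheses, the method runs and returns the Frame.
print(x.toDF().show())
```

In Scala, a method declared without parameters (or with an empty parameter list) can be invoked without parentheses, which is why `toDF.show` works in spark-shell; Python has no such rule.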

Re: dataframe doesn't support higher order func, right?

2022-02-06 Thread capitnfrakass
I am a bit confused why in pyspark this doesn't work?
>>> x = sc.parallelize([3,2,1,4])
>>> x.toDF.show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'function' object has no attribute 'show'
Thank you.

Re: dataframe doesn't support higher order func, right?

2022-02-06 Thread Mich Talebzadeh
Basically, you are creating a DataFrame out of an RDD here. (A DataFrame is a *Dataset* organized into named columns; it is conceptually equivalent to a table in a relational database.)
scala> val rdd = sc.parallelize( List(3, 2, 1, 4, 0))
rdd: org.apache.spark.rdd.RDD[Int] =

Re: dataframe doesn't support higher order func, right?

2022-02-06 Thread Sean Owen
The DataFrame API is quite different: more SQL-like in its operations, not functional. The equivalent would be more like df.filter("value > 2")
On Sun, Feb 6, 2022 at 5:51 AM wrote:
> for example, this works for an RDD object:
>
> scala> val li = List(3,2,1,4,0)
> li: List[Int] = List(3, 2, 1,

dataframe doesn't support higher order func, right?

2022-02-06 Thread capitnfrakass
For example, this works for an RDD object:
scala> val li = List(3,2,1,4,0)
li: List[Int] = List(3, 2, 1, 4, 0)
scala> val rdd = sc.parallelize(li)
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24
scala> rdd.filter(_ > 2).collect()
res0: Array[Int] = Array(3, 4)