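You can even use the fact that pyspark Rows have dynamic properties: alias the aggregate, then read it as an attribute.

from pyspark.sql.functions import max

rows = idDF2.select(max("col[id]").alias("max")).collect()
firstRow = rows[0]
max_value = firstRow.max  # new variable name, so the imported max() is not shadowed
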
On Tue, Mar 29, 2016 at 7:14 PM, Alexander Krasnukhin <the.malk...@gmail.com> wrote:
> You should be able to access columns directly, either by index or by column
> name, e.g.:
>
> from pyspark.sql.functions import max
>
> rows = idDF2.select(max("col[id]")).collect()
> firstRow = rows[0]
>
> # by index
> max_value = firstRow[0]
>
> # by column name
> max_value = firstRow["max(col[id])"]
>
> On Tue, Mar 29, 2016 at 6:58 PM, Andy Davidson <a...@santacruzintegration.com> wrote:
>
>> Hi Alexander
>>
>> Many thanks. I think the key was that I needed to import the max function.
>> It turns out you do not need to use col:
>> df.select(max("foo")).show()
>>
>> To get the actual value of max you still need to write more code than I
>> would expect. I wonder if there is an easier way to work with Rows?
>>
>> In [19]:
>>
>> from pyspark.sql.functions import max
>>
>> maxRow = idDF2.select(max("col[id]")).collect()
>>
>> maxValue = maxRow[0].asDict()['max(col[id])']
>>
>> maxValue
>>
>> Out[19]:
>>
>> 713912692155621376
>>
>>
>> From: Alexander Krasnukhin <the.malk...@gmail.com>
>> Date: Monday, March 28, 2016 at 5:55 PM
>> To: Andrew Davidson <a...@santacruzintegration.com>
>> Cc: "user @spark" <user@spark.apache.org>
>> Subject: Re: looking for an easy to to find the max value of a column in
>> a data frame
>>
>> e.g. to select the max value of column "foo":
>>
>> from pyspark.sql.functions import max, col
>> df.select(max(col("foo"))).show()
>>
>> On Tue, Mar 29, 2016 at 2:15 AM, Andy Davidson <a...@santacruzintegration.com> wrote:
>>
>>> I am using pyspark 1.6.1 and python3.
>>>
>>>
>>> *Given:*
>>>
>>> idDF2 = idDF.select(idDF.id, idDF.col.id)
>>> idDF2.printSchema()
>>> idDF2.show()
>>>
>>> root
>>>  |-- id: string (nullable = true)
>>>  |-- col[id]: long (nullable = true)
>>>
>>> +----------+----------+
>>> |        id|   col[id]|
>>> +----------+----------+
>>> |1008930924| 534494917|
>>> |1008930924| 442237496|
>>> |1008930924|  98069752|
>>> |1008930924|2790311425|
>>> |1008930924|3300869821|
>>>
>>>
>>> *I have to do a lot of work to get the max value*
>>>
>>> rows = idDF2.select("col[id]").describe().collect()
>>> hack = [s for s in rows if s.summary == 'max']
>>> print(hack)
>>> print(hack[0].summary)
>>> print(type(hack[0]))
>>> print(hack[0].asDict()['col[id]'])
>>> maxStr = hack[0].asDict()['col[id]']
>>> ttt = int(maxStr)
>>> numDimensions = 1 + ttt
>>> print(numDimensions)
>>>
>>> Is there an easier way?
>>>
>>> Kind regards
>>>
>>> Andy
>>>
>>
>>
>> --
>> Regards,
>> Alexander
>>
>
>
> --
> Regards,
> Alexander

--
Regards,
Alexander