Hi Alexander
Many thanks. I think the key was that I needed to import the max function. It
turns out you do not need to use col:
df.select(max("foo")).show()
To get the actual value of the max you still need to write more code than I
would expect. I wonder if there is an easier way to work with Rows?
In [19]:
from pyspark.sql.functions import max
maxRow = idDF2.select(max("col[id]")).collect()
maxValue = maxRow[0].asDict()['max(col[id])']  # renamed so it does not shadow the imported max
maxValue
Out[19]:
713912692155621376
From: Alexander Krasnukhin <[email protected]>
Date: Monday, March 28, 2016 at 5:55 PM
To: Andrew Davidson <[email protected]>
Cc: "user @spark" <[email protected]>
Subject: Re: looking for an easy to to find the max value of a column in a
data frame
> e.g. select max value for column "foo":
>
> from pyspark.sql.functions import max, col
> df.select(max(col("foo"))).show()
>
> On Tue, Mar 29, 2016 at 2:15 AM, Andy Davidson <[email protected]>
> wrote:
>> I am using pyspark 1.6.1 and python3.
>>
>>
>> Given:
>>
>> idDF2 = idDF.select(idDF.id, idDF.col.id)
>> idDF2.printSchema()
>> idDF2.show()
>> root
>> |-- id: string (nullable = true)
>> |-- col[id]: long (nullable = true)
>>
>> +----------+----------+
>> | id| col[id]|
>> +----------+----------+
>> |1008930924| 534494917|
>> |1008930924| 442237496|
>> |1008930924| 98069752|
>> |1008930924|2790311425|
>> |1008930924|3300869821|
>> +----------+----------+
>>
>>
>> I have to do a lot of work to get the max value
>>
>> rows = idDF2.select("col[id]").describe().collect()
>> hack = [s for s in rows if s.summary == 'max']
>> print(hack)
>> print(hack[0].summary)
>> print(type(hack[0]))
>> print(hack[0].asDict()['col[id]'])
>> maxStr = hack[0].asDict()['col[id]']
>> ttt = int(maxStr)
>> numDimensions = 1 + ttt
>> print(numDimensions)
>>
>> Is there an easier way?
>>
>> Kind regards
>>
>> Andy
>
>
>
> --
> Regards,
> Alexander