Hi all, I am trying to understand how row_number is applied. In the code below, does Spark first load the data into a DataFrame and then apply the row_number function, or is the function applied while reading from Hive?
from pyspark.sql import HiveContext

# sc is an existing SparkContext
hiveContext = HiveContext(sc)
hiveContext.sql("SELECT column1, column2, column3, ROW_NUMBER() OVER (ORDER BY columnname) AS RowNum FROM tablename")

Appreciate any guidance.