I'm posting again, as the tables were not showing up in the emails.

I have a DataFrame with a few dimensions, for example:

+---+---+---+-----+
|  i|  j|  k|total|
+---+---+---+-----+
|  3|  1|  1|    3|
|  3|  1|  2|    6|
|  3|  1|  3|    9|
|  3|  1|  4|   12|
|  3|  1|  5|   15|
|  3|  1|  6|   18|
|  3|  1|  7|   21|
|  3|  1|  8|   24|
|  3|  1|  9|   27|
|  3|  2|  1|    6|
|  3|  2|  2|   12|
|  3|  2|  3|   18|
|  3|  2|  4|   24|
|  3|  2|  5|   30|
|  3|  2|  6|   36|
|  3|  2|  7|   42|
|  3|  2|  8|   48|
|  3|  2|  9|   54|
|  3|  3|  1|    9|
|  3|  3|  2|   18|
+---+---+---+-----+

I want to build a cube on i, j, k and get a rank for each row based on total
(ranked within its grouping), so that when doing

df.filter('i === 3 && 'j === 1).show

I will get:

+---+---+----+-----+----+
|  i|  j|   k|total|rank|
+---+---+----+-----+----+
|  3|  1|null|  135|   1|
|  3|  1|   0|    0|  10|
|  3|  1|   1|    3|   9|
|  3|  1|   2|    6|   8|
|  3|  1|   3|    9|   7|
|  3|  1|   4|   12|   6|
|  3|  1|   5|   15|   5|
|  3|  1|   6|   18|   4|
|  3|  1|   7|   21|   3|
|  3|  1|   8|   24|   2|
|  3|  1|   9|   27|   1|
+---+---+----+-----+----+



So basically, for every grouping combination I need a separate dense rank
list: (i,j,k), (i,j), (i,k), (i), (j,k), (j), (k).
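The list above has seven combinations, but a cube over three columns produces 2^3 = 8 grouping sets; the eighth is the empty set, i.e. the grand-total row with i, j and k all null. To get one rank list per combination, each cube row has to be tagged with the grouping set it came from. The stdlib-only Scala sketch below enumerates the sets together with the value Spark documents for `grouping_id()`, namely (grouping(c1) << (n-1)) + ... + grouping(cn), where grouping(c) is 1 when column c is rolled up. `CubeGroupingSets` and `groupingSets` are illustrative names, not Spark API.

```scala
object CubeGroupingSets {
  /** Every grouping set a cube over `cols` produces, paired with its grouping id.
    * Bit (n - 1 - idx) of the id is set when the column at position idx is rolled up. */
  def groupingSets(cols: Seq[String]): Seq[(Seq[String], Int)] = {
    val n = cols.length
    (0 until (1 << n)).map { gid =>
      // Keep the columns whose bit is 0, i.e. the ones NOT rolled up in this set.
      val kept = cols.zipWithIndex.collect {
        case (c, idx) if ((gid >> (n - 1 - idx)) & 1) == 0 => c
      }
      (kept, gid)
    }
  }

  def main(args: Array[String]): Unit =
    groupingSets(Seq("i", "j", "k")).foreach { case (set, gid) =>
      println(s"gid=$gid -> (${set.mkString(",")})")
    }
}
```

Running `main` prints gid 0 for (i,j,k), then (i,j), (i,k), (i), (j,k), (j), (k), and finally gid 7 for (), which is the same order as the list above plus the grand total. Partitioning a rank by this id gives exactly one rank list per grouping combination.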

Any ideas?

(In this example, total = i*j*k.)
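One possible approach, sketched under the assumption of the Spark 2.x (or later) Scala DataFrame API: `cube` the data, tag each output row with `grouping_id`, then compute `dense_rank` over a window partitioned by that id. Using the id rather than testing for nulls keeps a genuine null in the data from being mistaken for a rolled-up dimension. Only total = i*j*k comes from the post; the data ranges (i = 3, j in 1..3, k in 1..9) and the names `CubeRankSketch`, `rankedCube` and `rank_ij` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, dense_rank, grouping_id, sum}

object CubeRankSketch {
  /** Cubes df over i, j, k and adds dense ranks by total, highest total first. */
  def rankedCube(df: DataFrame): DataFrame = {
    // One output row per grouping set and key; gid identifies the grouping set.
    val cubed = df
      .cube(col("i"), col("j"), col("k"))
      .agg(sum(col("total")).as("total"), grouping_id("i", "j", "k").as("gid"))

    // A separate dense-rank list for each grouping set.
    val perSet = Window.partitionBy(col("gid")).orderBy(col("total").desc)
    // Variant: additionally restart the ranking for every (i, j) pair.
    val perSetAndIJ = Window
      .partitionBy(col("gid"), col("i"), col("j"))
      .orderBy(col("total").desc)

    cubed
      .withColumn("rank", dense_rank().over(perSet))
      .withColumn("rank_ij", dense_rank().over(perSetAndIJ))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("cube-rank").getOrCreate()
    import spark.implicits._

    // Hypothetical data: i = 3, j in 1..3, k in 1..9, total = i * j * k
    val df = (for (j <- 1 to 3; k <- 1 to 9) yield (3, j, k, 3 * j * k))
      .toDF("i", "j", "k", "total")

    rankedCube(df)
      .filter($"i" === 3 && $"j" === 1)
      .orderBy($"gid".desc, $"k")
      .show()

    spark.stop()
  }
}
```

The `rank` column is the literal requirement (one dense-rank list per grouping set); with this data the (3, 1) subtotal of 135 ranks 3rd among the (i, j) subtotals. The `rank_ij` column also partitions by i and j, which reproduces the k = 1..9 rows of the expected output: k = 9 ranks 1 and k = 1 ranks 9 inside (3, 1), and the (3, 1) subtotal ranks 1 on its own. Which partitioning is wanted depends on whether ranks should restart for each parent group.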



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ranks-and-cubes-tp27338p27339.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
