Fellow Sparkers, I'm trying to "flatten" my view of data within a DataFrame, and am having difficulties doing so. The DataFrame contains product information, which includes multiple levels of categories (primary, secondary, etc).
*Example Data (Raw):* *Name Level Category* Baked Code Food 1 Baked Code Seafood 2 Baked Code Fish 3 Hockey Stick Sports 1 Hockey Stick Hockey 2 Hockey Stick Equipment 3 *Desired Data:* *Name Category1 Category2 Category3* Baked Code Food Seafood Fish Hockey Stick Sports Hockey Equipment *Approach:* After parsing the "raw" information into two separate DataFrames (called *products *and *categories*) and registering them as a Spark SQL tables, I was attempting to perform the following query to flatten this all into the "desired data" (depicted above). products.registerTempTable("products") categories.registerTempTable("categories") val productList = sqlContext.sql( " SELECT p.Name, " + " c1.Description AS Category1, " + " c2.Description AS Category2, " + " c3.Description AS Category3 " + " FROM products AS p " + " JOIN categories AS c1 " + " ON c1.Name = p.Name AND c1.Level = '1' " " JOIN categories AS c2 " + " ON c2.Name = p.Name AND c2.Level = '2' " " JOIN categories AS c3 " + " ON c3.Name = p.Name AND c3.Level = '3' " *Issue:* I get an error when running my query above, because I am not able to JOIN the *categories* table more than once. Has anybody dealt with this type of use case before, and if so how did you achieve the desired behavior? Thank you in advance for your thoughts. Kevin