you can use 10 passes over the same dataset and build the data
On Fri, Nov 3, 2017 at 9:48 PM, Jean Georges Perrin <jper...@lumeris.com> wrote: > Write a UDF? > > On Oct 31, 2017, at 11:48, Aakash Basu <aakash.spark....@gmail.com> wrote: > > Hey all, > > Any help in the below please? > > Thanks, > Aakash. > > > ---------- Forwarded message ---------- > From: Aakash Basu <aakash.spark....@gmail.com> > Date: Tue, Oct 31, 2017 at 9:17 PM > Subject: Regarding column partitioning IDs and names as per hierarchical > level SparkSQL > To: user <user@spark.apache.org> > > > Hi all, > > I have to generate a table with Spark-SQL with the following columns - > > > Level One Id: VARCHAR(20) NULL > Level One Name: VARCHAR( 50) NOT NULL > Level Two Id: VARCHAR( 20) NULL > Level Two Name: VARCHAR(50) NULL > Level Thr ee Id: VARCHAR(20) NULL > Level Thr ee Name: VARCHAR(50) NULL > Level Four Id: VARCHAR(20) NULL > Level Four Name: VARCHAR( 50) NULL > Level Five Id: VARCHAR(20) NULL > Level Five Name: VARCHAR(50) NULL > Level Six Id: VARCHAR(20) NULL > Level Six Name: VARCHAR(50) NULL > Level Seven Id: VARCHAR( 20) NULL > Level Seven Name: VARCHAR(50) NULL > Level Eight Id: VARCHAR( 20) NULL > Level Eight Name: VARCHAR(50) NULL > Level Nine Id: VARCHAR(20) NULL > Level Nine Name: VARCHAR( 50) NULL > Level Ten Id: VARCHAR(20) NULL > Level Ten Name: VARCHAR(50) NULL > > My input source has these columns - > > > ID Description ParentID > 10 Great-Grandfather > 1010 Grandfather 10 > 101010 1. Father A 1010 > 101011 2. Father B 1010 > 101012 4. Father C 1010 > 101013 5. Father D 1010 > 101015 3. Father E 1010 > 101018 Father F 1010 > 101019 6. Father G 1010 > 101020 Father H 1010 > 101021 Father I 1010 > 101022 2A. Father J 1010 > 10101010 2. Father K 101010 > Like the above, I have ID till 20 digits, which means, I have 10 levels. > > I want to populate the ID and name itself along with all the parents till > the root for any particular level, which I am unable to create a concrete > logic for. > > Am using this way to fetch respecting levels and populate them in the > respective columns but not their parents - > > Present Logic -> > > FinalJoin_DF = spark.sql("select " > + "case when length(a.id)/2 = '1' then a.id > else ' ' end as level_one_id, " > + "case when length(a.id)/2 = '1' then a.desc else ' ' end as > level_one_name, " > + "case when length(a.id)/2 = '2' then a.id else ' ' end as > level_two_id, " > + "case when length(a.id)/2 = '2' then a.desc else ' ' end as > level_two_name, " > + "case when length(a.id)/2 = '3' then a.id > else ' ' end as level_three_id, " > + "case when length(a.id)/2 = '3' then a.desc > else ' ' end as level_three_name, " > + "case when length(a.id)/2 = '4' then a.id > else ' ' end as level_four_id, " > + "case when length(a.id)/2 = '4' then a.desc > else ' ' end as level_four_name, " > + "case when length(a.id)/2 = '5' then a.id > else ' ' end as level_five_id, " > + "case when length(a.id)/2 = '5' then a.desc > else ' ' end as level_five_name, " > + "case when length(a.id)/2 = '6' then a.id > else ' ' end as level_six_id, " > + "case when length(a.id)/2 = '6' then a.desc else ' ' end as > level_six_name, " > + "case when length(a.id)/2 = '7' then a.id else ' ' end as > level_seven_id, " > + "case when length(a.id)/2 = '7' then a.desc > else ' ' end as level_seven_name, " > + "case when length(a.id)/2 = '8' then a.id > else ' ' end as level_eight_id, " > + "case when length(a.id)/2 = '8' then a.desc else ' ' end as > level_eight_name, " > + "case when length(a.id)/2 = '9' then a.id > else ' ' end as level_nine_id, " > + "case when length(a.id)/2 = '9' then a.desc else ' ' end as > level_nine_name, " > + "case when length(a.id)/2 = '10' then a.id else ' ' end as > level_ten_id, " > + "case when length(a.id)/2 = '10' then a.desc > else ' ' end as level_ten_name " > + "from CategoryTempTable a") > > > Can someone help me in also populating all the parents levels in the > respective level ID and level name, please? > > > Thanks, > Aakash. > > > -- Best Regards, Ayan Guha