Per the SQL plan, this is where it is failing: "Attribute(s) with the same name appear in the operation: fnlwgt_bucketed. Please check if the right attribute(s) are used."
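To make the clash easier to spot, here is a minimal sketch (my own illustration, not anything Spark provides) that scans the analyzer's error text for attribute names bound to more than one expression ID. The `message` string is an abridged excerpt of the error quoted below, and `conflicting_attributes` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Abridged excerpt of the AnalysisException text from this thread.
message = (
    "Resolved attribute(s) fnlwgt_bucketed#152530 missing from "
    "occupation#17,high_income#25,fnlwgt#13,age_bucketed#152432,"
    "fnlwgt_bucketed#99009 in operator !Project [id_num#10, age#11, "
    "age_bucketed#152432, fnlwgt_bucketed#152530]"
)

# Attributes are printed as name#id; names in this plan may contain hyphens.
ATTRIBUTE = re.compile(r"([A-Za-z_][\w-]*)#(\d+)")


def conflicting_attributes(plan_text):
    """Return {name: sorted ids} for names that carry more than one id."""
    ids_by_name = defaultdict(set)
    for name, expr_id in ATTRIBUTE.findall(plan_text):
        ids_by_name[name].add(int(expr_id))
    return {
        name: sorted(ids)
        for name, ids in ids_by_name.items()
        if len(ids) > 1
    }


if __name__ == "__main__":
    for name, ids in conflicting_attributes(message).items():
        print(name, ids)
```

On this excerpt the only name reported is fnlwgt_bucketed, with IDs 99009 and 152530: the operator asks for #152530, while its input only carries #99009.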
On Sat, May 26, 2018 at 6:16 PM, Aakash Basu <[email protected]> wrote:
> Hi,
>
> This query is based on one step further from the query in this link
> <https://stackoverflow.com/questions/50530679/spark-2-3-asynceventqueue-error-and-warning>.
> In this scenario, I add 1 or 2 more columns to be processed, Spark throws
> an ERROR by printing the physical plan of queries.
>
> It says, *Resolved attribute(s) fnlwgt_bucketed#152530 missing* which is
> untrue, as if I run the same code on less than 3 columns where this is one
> column, it works like a charm, so I can clearly assume it is not a bug in
> my query or code.
>
> Is it then a out of memory error? As I think, internally, since there are
> many registered tables on memory, they're getting deleted due to overflow
> of data and getting deleted, this is totally my assumption. Any insight on
> this? Did anyone of you face any issue like this?
>
> py4j.protocol.Py4JJavaError: An error occurred while calling o21.sql.:
> org.apache.spark.sql.AnalysisException: Resolved attribute(s)
> fnlwgt_bucketed#152530 missing from
> occupation#17,high_income#25,fnlwgt#13,education#14,marital-status#16,relationship#18,workclass#12,sex#20,id_num#10,native_country#24,race#19,education-num#15,hours-per-week#23,age_bucketed#152432,capital-loss#22,age#11,capital-gain#21,fnlwgt_bucketed#99009
> in operator !Project [id_num#10, age#11, workclass#12, fnlwgt#13,
> education#14, education-num#15, marital-status#16, occupation#17,
> relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22,
> hours-per-week#23, native_country#24, high_income#25, age_bucketed#152432,
> fnlwgt_bucketed#152530, if (isnull(cast(hours-per-week#23 as double))) null
> else if (isnull(cast(hours-per-week#23 as double))) null else if
> (isnull(cast(hours-per-week#23 as double))) null else if
> (isnull(cast(hours-per-week#23 as double))) null else if
> (isnull(cast(hours-per-week#23 as double))) null else if
> (isnull(cast(hours-per-week#23 as double))) null else if
> (isnull(cast(hours-per-week#23 as double))) null else if
> (isnull(cast(hours-per-week#23 as double))) null else if
> (isnull(cast(hours-per-week#23 as double))) null else if
> (isnull(cast(hours-per-week#23 as double))) null else if
> (isnull(cast(hours-per-week#23 as double))) null else if
> (isnull(cast(hours-per-week#23 as double))) null else if
> (isnull(cast(hours-per-week#23 as double))) null else
> UDF:bucketizer_0(cast(hours-per-week#23 as double)) AS
> hours-per-week_bucketed#152299]. Attribute(s) with the same name appear in
> the operation: fnlwgt_bucketed. Please check if the right attribute(s) are
> used.;;
> Project [id_num#10, age#11, workclass#12, fnlwgt#13, education#14,
> education-num#15, marital-status#16, occupation#17, relationship#18, race#19,
> sex#20, capital-gain#21, capital-loss#22, hours-per-week#23,
> native_country#24, high_income#25, age_bucketed#48257, fnlwgt_bucketed#99009,
> hours-per-week_bucketed#152299, age_bucketed_WoE#152431, WoE#152524 AS
> fnlwgt_bucketed_WoE#152529]
> +- Join Inner, (fnlwgt_bucketed#99009 = fnlwgt_bucketed#152530)
> :- SubqueryAlias bucketed
> : +- SubqueryAlias a
> : +- Project [id_num#10, age#11, workclass#12, fnlwgt#13,
> education#14, education-num#15, marital-status#16, occupation#17,
> relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22,
> hours-per-week#23, native_country#24, high_income#25, age_bucketed#48257,
> fnlwgt_bucketed#99009, hours-per-week_bucketed#152299, WoE#152426 AS
> age_bucketed_WoE#152431]
> : +- Join Inner, (age_bucketed#48257 = age_bucketed#152432)
> : :- SubqueryAlias bucketed
> : : +- SubqueryAlias a
> : : +- Project [id_num#10, age#11, workclass#12, fnlwgt#13,
> education#14, education-num#15, marital-status#16, occupation#17,
> relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22,
> hours-per-week#23, native_country#24, high_income#25, age_bucketed#48257,
> fnlwgt_bucketed#99009, if (isnull(cast(hours-per-week#23 as double))) null
> else if (isnull(cast(hours-per-week#23 as double))) null else if
> (isnull(cast(hours-per-week#23 as double))) null else
> UDF:bucketizer_0(cast(hours-per-week#23 as double)) AS
> hours-per-week_bucketed#152299]
> : : +- Project [id_num#10, age#11, workclass#12,
> fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17,
> relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22,
> hours-per-week#23, native_country#24, high_income#25, age_bucketed#48257, if
> (isnull(cast(fnlwgt#13 as double))) null else if (isnull(cast(fnlwgt#13 as
> double))) null else if (isnull(cast(fnlwgt#13 as double))) null else
> UDF:bucketizer_0(cast(fnlwgt#13 as double)) AS fnlwgt_bucketed#99009]
> : : +- Project [id_num#10, age#11, workclass#12,
> fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17,
> relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22,
> hours-per-week#23, native_country#24, high_income#25, if (isnull(cast(age#11
> as double))) null else if (isnull(cast(age#11 as double))) null else if
> (isnull(cast(age#11 as double))) null else UDF:bucketizer_0(cast(age#11 as
> double)) AS age_bucketed#48257]
> : : +-
> Relation[id_num#10,age#11,workclass#12,fnlwgt#13,education#14,education-num#15,marital-status#16,occupation#17,relationship#18,race#19,sex#20,capital-gain#21,capital-loss#22,hours-per-week#23,native_country#24,high_income#25]
> csv
> : +- SubqueryAlias woe_table
>
>
> Whichever column I keep in the second position in the column list being
> queried in loop. is throwing this error. Is it my laptop's memory issue?
>
> Thanks,
> Aakash.
>
