According to the SQL plan, this is where it fails:

Attribute(s) with the same name appear in the operation:
fnlwgt_bucketed. Please check if the right attribute(s) are used.;
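
One workaround often suggested for this message is to give the join key on the lookup side a different name before joining, so that the two fnlwgt_bucketed attributes no longer share a name. Below is a minimal sketch of that idea, not a verified fix for this dataset. It only builds the SQL text; the table names bucketed and woe_table and the WoE column are taken from the plan, while the helper name build_woe_join_sql, the alias b, and the _key suffix are hypothetical and the original query may look different.

```python
def build_woe_join_sql(col, left="bucketed", right="woe_table"):
    """Build a join query in which the lookup-side key is renamed.

    `col` is the bucketed column, e.g. "fnlwgt_bucketed".
    The "_key" suffix and the alias "b" are illustrative assumptions.
    Backticks are used because some column names contain hyphens.
    """
    key = f"{col}_key"
    return (
        f"SELECT a.*, b.WoE AS `{col}_WoE` "
        f"FROM {left} a "
        f"JOIN (SELECT `{col}` AS `{key}`, WoE FROM {right}) b "
        f"ON a.`{col}` = b.`{key}`"
    )


# Print the query for each bucketed column named in the plan.
for name in ["age_bucketed", "fnlwgt_bucketed", "hours-per-week_bucketed"]:
    print(build_woe_join_sql(name))
```

Each generated string could then be passed to spark.sql(...) inside the loop, with the result registered again as bucketed if that is how the loop is currently written.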



On Sat, May 26, 2018 at 6:16 PM, Aakash Basu <[email protected]>
wrote:

> Hi,
>
> This question goes one step further than the one in this link
> <https://stackoverflow.com/questions/50530679/spark-2-3-asynceventqueue-error-and-warning>.
> In this scenario, when I add 1 or 2 more columns to be processed, Spark
> throws an ERROR and prints the physical plan of the queries.
>
> It says, *Resolved attribute(s) fnlwgt_bucketed#152530 missing*, which is
> untrue: if I run the same code on fewer than 3 columns, with this being one
> of them, it works like a charm, so I assume it is not a bug in my query or
> code.
>
> Is it then an out-of-memory error? My guess is that, since there are many
> tables registered in memory, some of them are being dropped when the data
> overflows. This is purely my assumption. Any insight on this? Has anyone
> faced an issue like this?
>
> py4j.protocol.Py4JJavaError: An error occurred while calling o21.sql.: 
> org.apache.spark.sql.AnalysisException: Resolved attribute(s) 
> fnlwgt_bucketed#152530 missing from 
> occupation#17,high_income#25,fnlwgt#13,education#14,marital-status#16,relationship#18,workclass#12,sex#20,id_num#10,native_country#24,race#19,education-num#15,hours-per-week#23,age_bucketed#152432,capital-loss#22,age#11,capital-gain#21,fnlwgt_bucketed#99009
>  in operator !Project [id_num#10, age#11, workclass#12, fnlwgt#13, 
> education#14, education-num#15, marital-status#16, occupation#17, 
> relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, 
> hours-per-week#23, native_country#24, high_income#25, age_bucketed#152432, 
> fnlwgt_bucketed#152530, if (isnull(cast(hours-per-week#23 as double))) null 
> else if (isnull(cast(hours-per-week#23 as double))) null else if 
> (isnull(cast(hours-per-week#23 as double))) null else if 
> (isnull(cast(hours-per-week#23 as double))) null else if 
> (isnull(cast(hours-per-week#23 as double))) null else if 
> (isnull(cast(hours-per-week#23 as double))) null else if 
> (isnull(cast(hours-per-week#23 as double))) null else if 
> (isnull(cast(hours-per-week#23 as double))) null else if 
> (isnull(cast(hours-per-week#23 as double))) null else if 
> (isnull(cast(hours-per-week#23 as double))) null else if 
> (isnull(cast(hours-per-week#23 as double))) null else if 
> (isnull(cast(hours-per-week#23 as double))) null else if 
> (isnull(cast(hours-per-week#23 as double))) null else 
> UDF:bucketizer_0(cast(hours-per-week#23 as double)) AS 
> hours-per-week_bucketed#152299]. Attribute(s) with the same name appear in 
> the operation: fnlwgt_bucketed. Please check if the right attribute(s) are 
> used.;;Project [id_num#10, age#11, workclass#12, fnlwgt#13, education#14, 
> education-num#15, marital-status#16, occupation#17, relationship#18, race#19, 
> sex#20, capital-gain#21, capital-loss#22, hours-per-week#23, 
> native_country#24, high_income#25, age_bucketed#48257, fnlwgt_bucketed#99009, 
> hours-per-week_bucketed#152299, age_bucketed_WoE#152431, WoE#152524 AS 
> fnlwgt_bucketed_WoE#152529]+- Join Inner, (fnlwgt_bucketed#99009 = 
> fnlwgt_bucketed#152530)
>    :- SubqueryAlias bucketed
>    :  +- SubqueryAlias a
>    :     +- Project [id_num#10, age#11, workclass#12, fnlwgt#13, 
> education#14, education-num#15, marital-status#16, occupation#17, 
> relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, 
> hours-per-week#23, native_country#24, high_income#25, age_bucketed#48257, 
> fnlwgt_bucketed#99009, hours-per-week_bucketed#152299, WoE#152426 AS 
> age_bucketed_WoE#152431]
>    :        +- Join Inner, (age_bucketed#48257 = age_bucketed#152432)
>    :           :- SubqueryAlias bucketed
>    :           :  +- SubqueryAlias a
>    :           :     +- Project [id_num#10, age#11, workclass#12, fnlwgt#13, 
> education#14, education-num#15, marital-status#16, occupation#17, 
> relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, 
> hours-per-week#23, native_country#24, high_income#25, age_bucketed#48257, 
> fnlwgt_bucketed#99009, if (isnull(cast(hours-per-week#23 as double))) null 
> else if (isnull(cast(hours-per-week#23 as double))) null else if 
> (isnull(cast(hours-per-week#23 as double))) null else 
> UDF:bucketizer_0(cast(hours-per-week#23 as double)) AS 
> hours-per-week_bucketed#152299]
>    :           :        +- Project [id_num#10, age#11, workclass#12, 
> fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17, 
> relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, 
> hours-per-week#23, native_country#24, high_income#25, age_bucketed#48257, if 
> (isnull(cast(fnlwgt#13 as double))) null else if (isnull(cast(fnlwgt#13 as 
> double))) null else if (isnull(cast(fnlwgt#13 as double))) null else 
> UDF:bucketizer_0(cast(fnlwgt#13 as double)) AS fnlwgt_bucketed#99009]
>    :           :           +- Project [id_num#10, age#11, workclass#12, 
> fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17, 
> relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, 
> hours-per-week#23, native_country#24, high_income#25, if (isnull(cast(age#11 
> as double))) null else if (isnull(cast(age#11 as double))) null else if 
> (isnull(cast(age#11 as double))) null else UDF:bucketizer_0(cast(age#11 as 
> double)) AS age_bucketed#48257]
>    :           :              +- 
> Relation[id_num#10,age#11,workclass#12,fnlwgt#13,education#14,education-num#15,marital-status#16,occupation#17,relationship#18,race#19,sex#20,capital-gain#21,capital-loss#22,hours-per-week#23,native_country#24,high_income#25]
>  csv
>    :           +- SubqueryAlias woe_table
>
>
> Whichever column I keep in the second position of the column list being
> queried in the loop throws this error. Is it a memory issue on my laptop?
>
> Thanks,
> Aakash.
>
