Re: [Spark SQL] error in performing dataset union with complex data type (struct, list)

2018-06-04 Thread Pranav Agrawal
Yes, the issue is with the array type only; I have confirmed that.
I exploded the array into structs but am still getting the same error:


Exception in thread "main" org.apache.spark.sql.AnalysisException: Union
can only be performed on tables with the compatible column types.
struct
<>
struct
at the 21th column of the second table;;


Re: [Spark SQL] error in performing dataset union with complex data type (struct, list)

2018-06-04 Thread Jorge Machado
Have you tried to narrow down the problem so that we can be 100% sure that it
lies in the array types? Just exclude them for the sake of testing.
If we know 100% that it is the array columns, try to explode those columns into
simple types.

Jorge Machado







Re: [Spark SQL] error in performing dataset union with complex data type (struct, list)

2018-06-04 Thread Pranav Agrawal
I am ordering the columns before doing the union, so I think that should not
be an issue:

    String[] columns_original_order = baseDs.columns();
    String[] columns = baseDs.columns();
    Arrays.sort(columns);
    baseDs = baseDs.selectExpr(columns);
    incDsForPartition = incDsForPartition.selectExpr(columns);

    if (baseDs.count() > 0) {
        return baseDs.union(incDsForPartition).selectExpr(columns_original_order);
    } else {
        return incDsForPartition.selectExpr(columns_original_order);
    }



Re: [Spark SQL] error in performing dataset union with complex data type (struct, list)

2018-06-04 Thread Jorge Machado
Try the same union with a DataFrame without the array types. There could be
something strange there, such as column ordering.

Jorge Machado






Re: [Spark SQL] error in performing dataset union with complex data type (struct, list)

2018-06-04 Thread Pranav Agrawal
The schema is exactly the same; I am not sure why it is failing, though.

root
 |-- booking_id: integer (nullable = true)
 |-- booking_rooms_room_category_id: integer (nullable = true)
 |-- booking_rooms_room_id: integer (nullable = true)
 |-- booking_source: integer (nullable = true)
 |-- booking_status: integer (nullable = true)
 |-- cancellation_reason: integer (nullable = true)
 |-- checkin: string (nullable = true)
 |-- checkout: string (nullable = true)
 |-- city_id: integer (nullable = true)
 |-- cluster_id: integer (nullable = true)
 |-- company_id: integer (nullable = true)
 |-- created_at: string (nullable = true)
 |-- discount: integer (nullable = true)
 |-- feedback_created_at: string (nullable = true)
 |-- feedback_id: integer (nullable = true)
 |-- hotel_id: integer (nullable = true)
 |-- hub_id: integer (nullable = true)
 |-- month: integer (nullable = true)
 |-- no_show_reason: integer (nullable = true)
 |-- oyo_rooms: integer (nullable = true)
 |-- selling_amount: integer (nullable = true)
 |-- shifting: array (nullable = true)
 ||-- element: struct (containsNull = true)
 |||-- id: integer (nullable = true)
 |||-- booking_id: integer (nullable = true)
 |||-- shifting_status: integer (nullable = true)
 |||-- shifting_reason: integer (nullable = true)
 |||-- shifting_metadata: integer (nullable = true)
 |-- suggest_oyo: integer (nullable = true)
 |-- tickets: array (nullable = true)
 ||-- element: struct (containsNull = true)
 |||-- ticket_source: integer (nullable = true)
 |||-- ticket_status: string (nullable = true)
 |||-- ticket_instance_source: integer (nullable = true)
 |||-- ticket_category: string (nullable = true)
 |-- updated_at: timestamp (nullable = true)
 |-- year: integer (nullable = true)
 |-- zone_id: integer (nullable = true)

root
 |-- booking_id: integer (nullable = true)
 |-- booking_rooms_room_category_id: integer (nullable = true)
 |-- booking_rooms_room_id: integer (nullable = true)
 |-- booking_source: integer (nullable = true)
 |-- booking_status: integer (nullable = true)
 |-- cancellation_reason: integer (nullable = true)
 |-- checkin: string (nullable = true)
 |-- checkout: string (nullable = true)
 |-- city_id: integer (nullable = true)
 |-- cluster_id: integer (nullable = true)
 |-- company_id: integer (nullable = true)
 |-- created_at: string (nullable = true)
 |-- discount: integer (nullable = true)
 |-- feedback_created_at: string (nullable = true)
 |-- feedback_id: integer (nullable = true)
 |-- hotel_id: integer (nullable = true)
 |-- hub_id: integer (nullable = true)
 |-- month: integer (nullable = true)
 |-- no_show_reason: integer (nullable = true)
 |-- oyo_rooms: integer (nullable = true)
 |-- selling_amount: integer (nullable = true)
 |-- shifting: array (nullable = true)
 ||-- element: struct (containsNull = true)
 |||-- id: integer (nullable = true)
 |||-- booking_id: integer (nullable = true)
 |||-- shifting_status: integer (nullable = true)
 |||-- shifting_reason: integer (nullable = true)
 |||-- shifting_metadata: integer (nullable = true)
 |-- suggest_oyo: integer (nullable = true)
 |-- tickets: array (nullable = true)
 ||-- element: struct (containsNull = true)
 |||-- ticket_source: integer (nullable = true)
 |||-- ticket_status: string (nullable = true)
 |||-- ticket_instance_source: integer (nullable = true)
 |||-- ticket_category: string (nullable = true)
 |-- updated_at: timestamp (nullable = false)
 |-- year: integer (nullable = true)
 |-- zone_id: integer (nullable = true)


Re: [Spark SQL] error in performing dataset union with complex data type (struct, list)

2018-06-03 Thread Alessandro Solimando
Hi Pranav,
I don't have an answer to your issue, but what I generally do in these cases
is first simplify the problem to a point where it is easier to check
what's going on, and then add the "pieces" back one by one until I spot the
error.

In your case, I suggest the following:

1) project the dataset to the problematic column only (column 21 from your
log)
2) use the explode function to get one element of the array per row
3) flatten the struct

At each step, use printSchema() to double-check that the types are what you
expect and that they are the same for both datasets.
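
Comparing two long printSchema() outputs by eye is error-prone, so here is a minimal sketch of doing the comparison mechanically. It assumes the two schema trees are available as plain strings (for example captured from the logs, or from schema().treeString()); the class name SchemaTreeDiff and the method diff are made up for illustration and are not part of Spark. The sample lines in main are taken from the updated_at entries of the two schemas posted in this thread; whether that difference has anything to do with the union failure is not established here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Spark API): line-by-line diff of two schema trees.
final class SchemaTreeDiff {

    // Returns one entry per line that differs between the two schema strings.
    static List<String> diff(String left, String right) {
        String[] a = left.split("\\R");
        String[] b = right.split("\\R");
        List<String> out = new ArrayList<>();
        int n = Math.max(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Ignore trailing whitespace only; leading indentation encodes nesting.
            String l = i < a.length ? a[i].replaceAll("\\s+$", "") : "<missing>";
            String r = i < b.length ? b[i].replaceAll("\\s+$", "") : "<missing>";
            if (!l.equals(r)) {
                out.add("line " + (i + 1) + ": " + l + " <> " + r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Excerpts of the two schemas from this thread (updated_at differs in nullability).
        String base = String.join("\n",
            "root",
            " |-- updated_at: timestamp (nullable = true)",
            " |-- year: integer (nullable = true)");
        String inc = String.join("\n",
            "root",
            " |-- updated_at: timestamp (nullable = false)",
            " |-- year: integer (nullable = true)");
        diff(base, inc).forEach(System.out::println);
    }
}
```

An empty result means the two trees are textually identical, including nullable and containsNull flags. A positional diff like this is only meaningful if both datasets list their columns in the same order.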

Best regards,
Alessandro

On 2 June 2018 at 19:48, Pranav Agrawal  wrote:

> I can't get around this error when performing a union of two datasets
> (ds1.union(ds2)) that have complex data types (struct, list):
>
> 18/06/02 15:12:00 INFO ApplicationMaster: Final app status: FAILED,
> exitCode: 15, (reason: User class threw exception:
> org.apache.spark.sql.AnalysisException: Union can only be performed on
> tables with the compatible column types.
> array>
> <>
> array>
> at the 21th column of the second table;;
>
> As far as I can tell, they are the same. What am I doing wrong? Any help or
> workaround is appreciated!
>
> spark version: 2.2.1
>
> Thanks,
> Pranav
>