What version of Spark are you using?

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com



On 3 June 2016 at 17:51, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> OK, what is the new column called? You are basically adding a new column
> to an already existing table.
>
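> If it really is a brand-new column on the existing table, the usual route
> would be a DDL change first. A minimal sketch, assuming a HiveContext and
> treating the column name and type purely as placeholders:
>
>     // hypothetical column; substitute the real name and type
>     sqlContext.sql("ALTER TABLE ds.amo_bi_events ADD COLUMNS (new_col string)")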
>
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 3 June 2016 at 17:04, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> The table already exists.
>>
>>  CREATE EXTERNAL TABLE `amo_bi_events`(
>>    `event_type` string COMMENT '',
>>    `timestamp` string COMMENT '',
>>    `event_valid` int COMMENT '',
>>    `event_subtype` string COMMENT '',
>>    `user_ip` string COMMENT '',
>>    `user_id` string COMMENT '',
>>    `cookie_status` string COMMENT '',
>>    `profile_status` string COMMENT '',
>>    `user_status` string COMMENT '',
>>    `previous_timestamp` string COMMENT '',
>>    `user_agent` string COMMENT '',
>>    `referer` string COMMENT '',
>>    `uri` string COMMENT '',
>>    `request_elapsed` bigint COMMENT '',
>>    `browser_languages` string COMMENT '',
>>    `acamp_id` int COMMENT '',
>>    `creative_id` int COMMENT '',
>>    `location_id` int COMMENT '',
>>    `pcamp_id` int COMMENT '',
>>    `pdomain_id` int COMMENT '',
>>    `country` string COMMENT '',
>>    `region` string COMMENT '',
>>    `dma` int COMMENT '',
>>    `city` string COMMENT '',
>>    `zip` string COMMENT '',
>>    `isp` string COMMENT '',
>>    `line_speed` string COMMENT '',
>>    `gender` string COMMENT '',
>>    `year_of_birth` int COMMENT '',
>>    `behaviors_read` string COMMENT '',
>>    `behaviors_written` string COMMENT '',
>>    `key_value_pairs` string COMMENT '',
>>    `acamp_candidates` int COMMENT '',
>>    `tag_format` string COMMENT '',
>>    `optimizer_name` string COMMENT '',
>>    `optimizer_version` string COMMENT '',
>>    `optimizer_ip` string COMMENT '',
>>    `pixel_id` int COMMENT '',
>>    `video_id` string COMMENT '',
>>    `video_network_id` int COMMENT '',
>>    `video_time_watched` bigint COMMENT '',
>>    `video_percentage_watched` int COMMENT '',
>>    `conversion_valid_sale` int COMMENT '',
>>    `conversion_sale_amount` float COMMENT '',
>>    `conversion_commission_amount` float COMMENT '',
>>    `conversion_step` int COMMENT '',
>>    `conversion_currency` string COMMENT '',
>>    `conversion_attribution` int COMMENT '',
>>    `conversion_offer_id` string COMMENT '',
>>    `custom_info` string COMMENT '',
>>    `frequency` int COMMENT '',
>>    `recency_seconds` int COMMENT '',
>>    `cost` float COMMENT '',
>>    `revenue` float COMMENT '',
>>    `optimizer_acamp_id` int COMMENT '',
>>    `optimizer_creative_id` int COMMENT '',
>>    `optimizer_ecpm` float COMMENT '',
>>    `event_id` string COMMENT '',
>>    `impression_id` string COMMENT '',
>>    `diagnostic_data` string COMMENT '',
>>    `user_profile_mapping_source` string COMMENT '',
>>    `latitude` float COMMENT '',
>>    `longitude` float COMMENT '',
>>    `area_code` int COMMENT '',
>>    `gmt_offset` string COMMENT '',
>>    `in_dst` string COMMENT '',
>>    `proxy_type` string COMMENT '',
>>    `mobile_carrier` string COMMENT '',
>>    `pop` string COMMENT '',
>>    `hostname` string COMMENT '',
>>    `profile_ttl` string COMMENT '',
>>    `timestamp_iso` string COMMENT '',
>>    `reference_id` string COMMENT '',
>>    `identity_organization` string COMMENT '',
>>    `identity_method` string COMMENT '',
>>    `mappable_id` string COMMENT '',
>>    `profile_expires` string COMMENT '',
>>    `video_player_iframed` int COMMENT '',
>>    `video_player_in_view` int COMMENT '',
>>    `video_player_width` int COMMENT '',
>>    `video_player_height` int COMMENT '',
>>    `host_domain` string COMMENT '',
>>    `browser_type` string COMMENT '',
>>    `browser_device_cat` string COMMENT '',
>>    `browser_family` string COMMENT '',
>>    `browser_name` string COMMENT '',
>>    `browser_version` string COMMENT '',
>>    `browser_major_version` string COMMENT '',
>>    `browser_minor_version` string COMMENT '',
>>    `os_family` string COMMENT '',
>>    `os_name` string COMMENT '',
>>    `os_version` string COMMENT '',
>>    `os_major_version` string COMMENT '',
>>    `os_minor_version` string COMMENT '')
>>  PARTITIONED BY (`dt` timestamp)
>>  STORED AS PARQUET;
>>
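>> For what it's worth, a plain Hive dynamic-partition load into this table
>> would look roughly like the sketch below; amo_bi_events_staging is a
>> made-up staging table whose columns match this layout, with dt last:
>>
>>   // enable dynamic partitioning before the insert
>>   sqlContext.sql("SET hive.exec.dynamic.partition=true")
>>   sqlContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
>>   // position-based: the last SELECT column populates the dt partition
>>   sqlContext.sql("INSERT INTO TABLE amo_bi_events PARTITION (dt) SELECT * FROM amo_bi_events_staging")
>>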
>> Thanks,
>> Ben
>>
>>
>> On Jun 3, 2016, at 8:47 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>> Hang on, are you saving this as a new table?
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 3 June 2016 at 14:13, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> Does anyone know how to save data in a DataFrame to a table that is
>>> partitioned on a derived column, built by reformatting an existing column?
>>>
>>>     val partitionedDf = df.withColumn("dt",
>>>       concat(substring($"timestamp", 1, 10), lit(" "),
>>>         substring($"timestamp", 12, 2), lit(":00")))
>>>
>>>     sqlContext.setConf("hive.exec.dynamic.partition", "true")
>>>     sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
>>>     partitionedDf.write
>>>       .mode(SaveMode.Append)
>>>       .partitionBy("dt")
>>>       .saveAsTable("ds.amo_bi_events")
>>>
>>> I am getting an ArrayIndexOutOfBoundsException. There are 83 columns in
>>> the destination table, but after adding the derived column the error
>>> refers to 84. I assumed that the column used for the partition would not
>>> be counted.
>>>
>>> Can someone please help?
>>>
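>>> Since the table already exists, would insertInto be a better fit than
>>> saveAsTable? A sketch of what I mean, assuming the DataFrame columns sit
>>> in the same order as the table with dt last (the two setConf lines above
>>> still apply):
>>>
>>>     partitionedDf.write
>>>       .mode(SaveMode.Append)
>>>       .insertInto("ds.amo_bi_events") // position-based; dt must come last
>>>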
>>> Thanks,
>>> Ben
>>>
>>>
>>
>>
>
