use already existing UDFs to split or transform your values the way you want
On Fri, Jun 14, 2013 at 12:09 PM, Hamza Asad <hamza.asa...@gmail.com> wrote: > OIC. I got it. Thanx alot nitin :). One more thing i want to ask related > this issue, if old table contains event_date in format "2012-06-24 > 06:04:11.9" then how can i partition it according to date part only? As > partition column does not accepts to_date(event_date) form. > > > On Thu, Jun 13, 2013 at 5:07 PM, Nitin Pawar <nitinpawar...@gmail.com>wrote: > >> If the input column value is NULL or empty string, the row will be put into >> a special partition, whose name is controlled by the hive parameter >> hive.exec.default.dynamic.partition.name. The default value is >> `__HIVE_DEFAULT_PARTITION__`. Basically this partition will contain all >> "bad" rows whose value are not valid partition names. >> >> so basically you do following things >> >> when you create a partitioned table, your partitioned column is normally at >> the end of the table, so when you are inserting data into this partitioned >> table, I would recommend using the column names in place select * from >> >> so your insert query should look like >> >> set hive.exec.dynamic.partition=true; >> >> set hive.exec.dynamic.partition.mode=nonstrict; >> >> >> insert overwrite table new_table partition(event_date) select col1, col2 >> .... coln, event_date from old_table; >> >> >> >> On Thu, Jun 13, 2013 at 5:24 PM, Hamza Asad <hamza.asa...@gmail.com>wrote: >> >>> when i browse it in browser, all the data is in * >>> event_date=__HIVE_DEFAULT_PARTITION__<http://10.0.0.14:50075/browseDirectory.jsp?dir=%2Fvar%2Flog%2Fpring%2Fhive%2Fwarehouse%2Fnydus.db%2Fnew_rc_partition_cluster_table%2Fevent_date%3D__HIVE_DEFAULT_PARTITION__&namenodeInfoPort=50070> >>> *, rest of the files does not contains data >>> >>> >>> On Thu, Jun 13, 2013 at 4:52 PM, Nitin Pawar <nitinpawar...@gmail.com>wrote: >>> >>>> what do you mean when you say "it wont split correctly" ? >>>> >>>> >>>> On Thu, Jun 13, 2013 at 5:19 PM, Hamza Asad <hamza.asa...@gmail.com>wrote: >>>> >>>>> what if i have data of more then 500 days then how can i create >>>>> partition on date column by specifying each and every date? (i knw that >>>>> does not happens in dynamic partition but on dynamic partition, it wont >>>>> splits correctly). >>>>> >>>>> >>>>> On Thu, Jun 13, 2013 at 4:12 PM, Nitin Pawar >>>>> <nitinpawar...@gmail.com>wrote: >>>>> >>>>>> you can partition existing table unless the hdfs data is laid out in >>>>>> partitioned fashion. >>>>>> your best bet is create a new partitioned table >>>>>> enable dynamic paritionining >>>>>> read from old table and write into new table >>>>>> >>>>>> you can verify the new partitions by using command "show partitions" >>>>>> >>>>>> >>>>>> On Thu, Jun 13, 2013 at 4:40 PM, Hamza Asad >>>>>> <hamza.asa...@gmail.com>wrote: >>>>>> >>>>>>> now i created partition table like >>>>>>> *CREATE TABLE new_rc_partition_cluster_table( >>>>>>> >>>>>>> id int, >>>>>>> event_id int, >>>>>>> user_id BIGINT, >>>>>>> >>>>>>> intval_1 int , >>>>>>> intval_2 int, >>>>>>> intval_3 int, >>>>>>> intval_4 int, >>>>>>> intval_5 int, >>>>>>> intval_6 int, >>>>>>> intval_7 int, >>>>>>> intval_8 int, >>>>>>> intval_9 int, >>>>>>> intval_10 int, >>>>>>> intval_11 int, >>>>>>> intval_12 int, >>>>>>> intval_13 int, >>>>>>> intval_14 int, >>>>>>> intval_15 int, >>>>>>> intval_16 int, >>>>>>> intval_17 int, >>>>>>> intval_18 int, >>>>>>> intval_19 int, >>>>>>> intval_20 int, >>>>>>> intval_21 int, >>>>>>> intval_22 int, >>>>>>> intval_23 int, >>>>>>> intval_24 int, >>>>>>> intval_25 int, >>>>>>> intval_26 int) >>>>>>> PARTITIONED BY (event_date string) >>>>>>> >>>>>>> CLUSTERED BY(id) INTO 256 BUCKETS >>>>>>> ROW FORMAT DELIMITED >>>>>>> FIELDS TERMINATED BY ',' >>>>>>> STORED AS RCFile; >>>>>>> * >>>>>>> >>>>>>> rest of the commands are same. But this time INSERT OVERWRITE query >>>>>>> executed and took time but when i queries from that table, it results >>>>>>> none >>>>>>> as it does not contains data. why is this so? and also please tell me >>>>>>> how >>>>>>> can i partition my existing table dynamicaly on date so that data splits >>>>>>> equally without mentioning date explicitly? >>>>>>> >>>>>>> >>>>>>> On Wed, Jun 12, 2013 at 6:50 PM, Nitin Pawar < >>>>>>> nitinpawar...@gmail.com> wrote: >>>>>>> >>>>>>>> you did not create partitioned table. You just created a bucketed >>>>>>>> table. >>>>>>>> >>>>>>>> refer to partitioned table created >>>>>>>> something like >>>>>>>> partitioned by (event_date string) >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jun 12, 2013 at 7:17 PM, Hamza Asad <hamza.asa...@gmail.com >>>>>>>> > wrote: >>>>>>>> >>>>>>>>> i have created table after enabling dynamic partition. i >>>>>>>>> partitioned it on date but it is not splitting data datewise. Below >>>>>>>>> is the >>>>>>>>> query of table created and data insert >>>>>>>>> CREATE TABLE rc_partition_cluster_table( >>>>>>>>> id int, >>>>>>>>> event_id int, >>>>>>>>> user_id BIGINT, >>>>>>>>> event_date string, >>>>>>>>> intval_1 int ) >>>>>>>>> CLUSTERED BY(id) INTO 256 BUCKETS >>>>>>>>> ROW FORMAT DELIMITED >>>>>>>>> FIELDS TERMINATED BY ',' >>>>>>>>> STORED AS RCFile; >>>>>>>>> >>>>>>>>> set hive.exec.dynamic.partition=true; >>>>>>>>> set hive.exec.dynamic.partition.mode=nonstrict; >>>>>>>>> set hive.exec.max.dynamic.partitions=1000; >>>>>>>>> set hive.exec.max.dynamic.partitions.pernode=1000; >>>>>>>>> >>>>>>>>> INSERT OVERWRITE TABLE rc_partition_cluster_table Partition >>>>>>>>> (event_date) >>>>>>>>> SELECT * FROM events_details; >>>>>>>>> >>>>>>>>> why it is not working fine? >>>>>>>>> >>>>>>>>> -- >>>>>>>>> *Muhammad Hamza Asad* >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Nitin Pawar >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> *Muhammad Hamza Asad* >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Nitin Pawar >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> *Muhammad Hamza Asad* >>>>> >>>> >>>> >>>> >>>> -- >>>> Nitin Pawar >>>> >>> >>> >>> >>> -- >>> *Muhammad Hamza Asad* >>> >> >> >> >> -- >> Nitin Pawar >> > > > > -- > *Muhammad Hamza Asad* > -- Nitin Pawar