can you provide whats your data and what you want it to look like ?
On Fri, Jun 14, 2013 at 12:31 PM, Hamza Asad <hamza.asa...@gmail.com> wrote: > which UDF? it does not take to_date(event_date) column > > > On Fri, Jun 14, 2013 at 11:54 AM, Nitin Pawar <nitinpawar...@gmail.com>wrote: > >> use already existing UDFs to split or transform your values the way you >> want >> >> >> On Fri, Jun 14, 2013 at 12:09 PM, Hamza Asad <hamza.asa...@gmail.com>wrote: >> >>> OIC. I got it. Thanx alot nitin :). One more thing i want to ask related >>> this issue, if old table contains event_date in format "2012-06-24 >>> 06:04:11.9" then how can i partition it according to date part only? As >>> partition column does not accepts to_date(event_date) form. >>> >>> >>> On Thu, Jun 13, 2013 at 5:07 PM, Nitin Pawar <nitinpawar...@gmail.com>wrote: >>> >>>> If the input column value is NULL or empty string, the row will be put >>>> into a special partition, whose name is controlled by the hive parameter >>>> hive.exec.default.dynamic.partition.name. The default value is >>>> `__HIVE_DEFAULT_PARTITION__`. Basically this partition will contain all >>>> "bad" rows whose value are not valid partition names. >>>> >>>> so basically you do following things >>>> >>>> when you create a partitioned table, your partitioned column is normally >>>> at the end of the table, so when you are inserting data into this >>>> partitioned table, I would recommend using the column names in place >>>> select * from >>>> >>>> so your insert query should look like >>>> >>>> set hive.exec.dynamic.partition=true; >>>> >>>> >>>> >>>> set hive.exec.dynamic.partition.mode=nonstrict; >>>> >>>> >>>> >>>> >>>> insert overwrite table new_table partition(event_date) select col1, col2 >>>> .... coln, event_date from old_table; >>>> >>>> >>>> >>>> On Thu, Jun 13, 2013 at 5:24 PM, Hamza Asad <hamza.asa...@gmail.com>wrote: >>>> >>>>> when i browse it in browser, all the data is in * >>>>> event_date=__HIVE_DEFAULT_PARTITION__<http://10.0.0.14:50075/browseDirectory.jsp?dir=%2Fvar%2Flog%2Fpring%2Fhive%2Fwarehouse%2Fnydus.db%2Fnew_rc_partition_cluster_table%2Fevent_date%3D__HIVE_DEFAULT_PARTITION__&namenodeInfoPort=50070> >>>>> *, rest of the files does not contains data >>>>> >>>>> >>>>> On Thu, Jun 13, 2013 at 4:52 PM, Nitin Pawar >>>>> <nitinpawar...@gmail.com>wrote: >>>>> >>>>>> what do you mean when you say "it wont split correctly" ? >>>>>> >>>>>> >>>>>> On Thu, Jun 13, 2013 at 5:19 PM, Hamza Asad >>>>>> <hamza.asa...@gmail.com>wrote: >>>>>> >>>>>>> what if i have data of more then 500 days then how can i create >>>>>>> partition on date column by specifying each and every date? (i knw that >>>>>>> does not happens in dynamic partition but on dynamic partition, it wont >>>>>>> splits correctly). >>>>>>> >>>>>>> >>>>>>> On Thu, Jun 13, 2013 at 4:12 PM, Nitin Pawar < >>>>>>> nitinpawar...@gmail.com> wrote: >>>>>>> >>>>>>>> you can partition existing table unless the hdfs data is laid out >>>>>>>> in partitioned fashion. >>>>>>>> your best bet is create a new partitioned table >>>>>>>> enable dynamic paritionining >>>>>>>> read from old table and write into new table >>>>>>>> >>>>>>>> you can verify the new partitions by using command "show partitions" >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jun 13, 2013 at 4:40 PM, Hamza Asad <hamza.asa...@gmail.com >>>>>>>> > wrote: >>>>>>>> >>>>>>>>> now i created partition table like >>>>>>>>> *CREATE TABLE new_rc_partition_cluster_table( >>>>>>>>> >>>>>>>>> id int, >>>>>>>>> event_id int, >>>>>>>>> user_id BIGINT, >>>>>>>>> >>>>>>>>> intval_1 int , >>>>>>>>> intval_2 int, >>>>>>>>> intval_3 int, >>>>>>>>> intval_4 int, >>>>>>>>> intval_5 int, >>>>>>>>> intval_6 int, >>>>>>>>> intval_7 int, >>>>>>>>> intval_8 int, >>>>>>>>> intval_9 int, >>>>>>>>> intval_10 int, >>>>>>>>> intval_11 int, >>>>>>>>> intval_12 int, >>>>>>>>> intval_13 int, >>>>>>>>> intval_14 int, >>>>>>>>> intval_15 int, >>>>>>>>> intval_16 int, >>>>>>>>> intval_17 int, >>>>>>>>> intval_18 int, >>>>>>>>> intval_19 int, >>>>>>>>> intval_20 int, >>>>>>>>> intval_21 int, >>>>>>>>> intval_22 int, >>>>>>>>> intval_23 int, >>>>>>>>> intval_24 int, >>>>>>>>> intval_25 int, >>>>>>>>> intval_26 int) >>>>>>>>> PARTITIONED BY (event_date string) >>>>>>>>> >>>>>>>>> CLUSTERED BY(id) INTO 256 BUCKETS >>>>>>>>> ROW FORMAT DELIMITED >>>>>>>>> FIELDS TERMINATED BY ',' >>>>>>>>> STORED AS RCFile; >>>>>>>>> * >>>>>>>>> >>>>>>>>> rest of the commands are same. But this time INSERT OVERWRITE >>>>>>>>> query executed and took time but when i queries from that table, it >>>>>>>>> results >>>>>>>>> none as it does not contains data. why is this so? and also please >>>>>>>>> tell me >>>>>>>>> how can i partition my existing table dynamicaly on date so that data >>>>>>>>> splits equally without mentioning date explicitly? >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jun 12, 2013 at 6:50 PM, Nitin Pawar < >>>>>>>>> nitinpawar...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> you did not create partitioned table. You just created a bucketed >>>>>>>>>> table. >>>>>>>>>> >>>>>>>>>> refer to partitioned table created >>>>>>>>>> something like >>>>>>>>>> partitioned by (event_date string) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Jun 12, 2013 at 7:17 PM, Hamza Asad < >>>>>>>>>> hamza.asa...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> i have created table after enabling dynamic partition. i >>>>>>>>>>> partitioned it on date but it is not splitting data datewise. Below >>>>>>>>>>> is the >>>>>>>>>>> query of table created and data insert >>>>>>>>>>> CREATE TABLE rc_partition_cluster_table( >>>>>>>>>>> id int, >>>>>>>>>>> event_id int, >>>>>>>>>>> user_id BIGINT, >>>>>>>>>>> event_date string, >>>>>>>>>>> intval_1 int ) >>>>>>>>>>> CLUSTERED BY(id) INTO 256 BUCKETS >>>>>>>>>>> ROW FORMAT DELIMITED >>>>>>>>>>> FIELDS TERMINATED BY ',' >>>>>>>>>>> STORED AS RCFile; >>>>>>>>>>> >>>>>>>>>>> set hive.exec.dynamic.partition=true; >>>>>>>>>>> set hive.exec.dynamic.partition.mode=nonstrict; >>>>>>>>>>> set hive.exec.max.dynamic.partitions=1000; >>>>>>>>>>> set hive.exec.max.dynamic.partitions.pernode=1000; >>>>>>>>>>> >>>>>>>>>>> INSERT OVERWRITE TABLE rc_partition_cluster_table Partition >>>>>>>>>>> (event_date) >>>>>>>>>>> SELECT * FROM events_details; >>>>>>>>>>> >>>>>>>>>>> why it is not working fine? >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> *Muhammad Hamza Asad* >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Nitin Pawar >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> *Muhammad Hamza Asad* >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Nitin Pawar >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> *Muhammad Hamza Asad* >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Nitin Pawar >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> *Muhammad Hamza Asad* >>>>> >>>> >>>> >>>> >>>> -- >>>> Nitin Pawar >>>> >>> >>> >>> >>> -- >>> *Muhammad Hamza Asad* >>> >> >> >> >> -- >> Nitin Pawar >> > > > > -- > *Muhammad Hamza Asad* > -- Nitin Pawar