sample row of my data is *591269735,1,1022,2012-06-24 11:08:10.9,null,2,null,null,null,null,null,null,null,null,12,null,null,2,null,null,null,null,4,1,null,null,null,null,null,null *
and i want to partition it according to date i.e *2012-06-24 *skiping the hh:mm:ss.ff part On Fri, Jun 14, 2013 at 12:27 PM, Nitin Pawar <nitinpawar...@gmail.com>wrote: > can you provide whats your data and what you want it to look like ? > > > On Fri, Jun 14, 2013 at 12:31 PM, Hamza Asad <hamza.asa...@gmail.com>wrote: > >> which UDF? it does not take to_date(event_date) column >> >> >> On Fri, Jun 14, 2013 at 11:54 AM, Nitin Pawar <nitinpawar...@gmail.com>wrote: >> >>> use already existing UDFs to split or transform your values the way you >>> want >>> >>> >>> On Fri, Jun 14, 2013 at 12:09 PM, Hamza Asad <hamza.asa...@gmail.com>wrote: >>> >>>> OIC. I got it. Thanx alot nitin :). One more thing i want to ask >>>> related this issue, if old table contains event_date in format "2012-06-24 >>>> 06:04:11.9" then how can i partition it according to date part only? As >>>> partition column does not accepts to_date(event_date) form. >>>> >>>> >>>> On Thu, Jun 13, 2013 at 5:07 PM, Nitin Pawar >>>> <nitinpawar...@gmail.com>wrote: >>>> >>>>> If the input column value is NULL or empty string, the row will be put >>>>> into a special partition, whose name is controlled by the hive parameter >>>>> hive.exec.default.dynamic.partition.name. The default value is >>>>> `__HIVE_DEFAULT_PARTITION__`. Basically this partition will contain all >>>>> "bad" rows whose value are not valid partition names. >>>>> >>>>> so basically you do following things >>>>> >>>>> when you create a partitioned table, your partitioned column is normally >>>>> at the end of the table, so when you are inserting data into this >>>>> partitioned table, I would recommend using the column names in place >>>>> select * from >>>>> >>>>> so your insert query should look like >>>>> >>>>> set hive.exec.dynamic.partition=true; >>>>> >>>>> >>>>> >>>>> >>>>> set hive.exec.dynamic.partition.mode=nonstrict; >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> insert overwrite table new_table partition(event_date) select col1, col2 >>>>> .... coln, event_date from old_table; >>>>> >>>>> >>>>> >>>>> On Thu, Jun 13, 2013 at 5:24 PM, Hamza Asad <hamza.asa...@gmail.com>wrote: >>>>> >>>>>> when i browse it in browser, all the data is in * >>>>>> event_date=__HIVE_DEFAULT_PARTITION__<http://10.0.0.14:50075/browseDirectory.jsp?dir=%2Fvar%2Flog%2Fpring%2Fhive%2Fwarehouse%2Fnydus.db%2Fnew_rc_partition_cluster_table%2Fevent_date%3D__HIVE_DEFAULT_PARTITION__&namenodeInfoPort=50070> >>>>>> *, rest of the files does not contains data >>>>>> >>>>>> >>>>>> On Thu, Jun 13, 2013 at 4:52 PM, Nitin Pawar <nitinpawar...@gmail.com >>>>>> > wrote: >>>>>> >>>>>>> what do you mean when you say "it wont split correctly" ? >>>>>>> >>>>>>> >>>>>>> On Thu, Jun 13, 2013 at 5:19 PM, Hamza Asad >>>>>>> <hamza.asa...@gmail.com>wrote: >>>>>>> >>>>>>>> what if i have data of more then 500 days then how can i create >>>>>>>> partition on date column by specifying each and every date? (i knw that >>>>>>>> does not happens in dynamic partition but on dynamic partition, it wont >>>>>>>> splits correctly). >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jun 13, 2013 at 4:12 PM, Nitin Pawar < >>>>>>>> nitinpawar...@gmail.com> wrote: >>>>>>>> >>>>>>>>> you can partition existing table unless the hdfs data is laid out >>>>>>>>> in partitioned fashion. >>>>>>>>> your best bet is create a new partitioned table >>>>>>>>> enable dynamic paritionining >>>>>>>>> read from old table and write into new table >>>>>>>>> >>>>>>>>> you can verify the new partitions by using command "show >>>>>>>>> partitions" >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jun 13, 2013 at 4:40 PM, Hamza Asad < >>>>>>>>> hamza.asa...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> now i created partition table like >>>>>>>>>> *CREATE TABLE new_rc_partition_cluster_table( >>>>>>>>>> >>>>>>>>>> id int, >>>>>>>>>> event_id int, >>>>>>>>>> user_id BIGINT, >>>>>>>>>> >>>>>>>>>> intval_1 int , >>>>>>>>>> intval_2 int, >>>>>>>>>> intval_3 int, >>>>>>>>>> intval_4 int, >>>>>>>>>> intval_5 int, >>>>>>>>>> intval_6 int, >>>>>>>>>> intval_7 int, >>>>>>>>>> intval_8 int, >>>>>>>>>> intval_9 int, >>>>>>>>>> intval_10 int, >>>>>>>>>> intval_11 int, >>>>>>>>>> intval_12 int, >>>>>>>>>> intval_13 int, >>>>>>>>>> intval_14 int, >>>>>>>>>> intval_15 int, >>>>>>>>>> intval_16 int, >>>>>>>>>> intval_17 int, >>>>>>>>>> intval_18 int, >>>>>>>>>> intval_19 int, >>>>>>>>>> intval_20 int, >>>>>>>>>> intval_21 int, >>>>>>>>>> intval_22 int, >>>>>>>>>> intval_23 int, >>>>>>>>>> intval_24 int, >>>>>>>>>> intval_25 int, >>>>>>>>>> intval_26 int) >>>>>>>>>> PARTITIONED BY (event_date string) >>>>>>>>>> >>>>>>>>>> CLUSTERED BY(id) INTO 256 BUCKETS >>>>>>>>>> ROW FORMAT DELIMITED >>>>>>>>>> FIELDS TERMINATED BY ',' >>>>>>>>>> STORED AS RCFile; >>>>>>>>>> * >>>>>>>>>> >>>>>>>>>> rest of the commands are same. But this time INSERT OVERWRITE >>>>>>>>>> query executed and took time but when i queries from that table, it >>>>>>>>>> results >>>>>>>>>> none as it does not contains data. why is this so? and also please >>>>>>>>>> tell me >>>>>>>>>> how can i partition my existing table dynamicaly on date so that data >>>>>>>>>> splits equally without mentioning date explicitly? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Jun 12, 2013 at 6:50 PM, Nitin Pawar < >>>>>>>>>> nitinpawar...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> you did not create partitioned table. You just created a >>>>>>>>>>> bucketed table. >>>>>>>>>>> >>>>>>>>>>> refer to partitioned table created >>>>>>>>>>> something like >>>>>>>>>>> partitioned by (event_date string) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Jun 12, 2013 at 7:17 PM, Hamza Asad < >>>>>>>>>>> hamza.asa...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> i have created table after enabling dynamic partition. i >>>>>>>>>>>> partitioned it on date but it is not splitting data datewise. >>>>>>>>>>>> Below is the >>>>>>>>>>>> query of table created and data insert >>>>>>>>>>>> CREATE TABLE rc_partition_cluster_table( >>>>>>>>>>>> id int, >>>>>>>>>>>> event_id int, >>>>>>>>>>>> user_id BIGINT, >>>>>>>>>>>> event_date string, >>>>>>>>>>>> intval_1 int ) >>>>>>>>>>>> CLUSTERED BY(id) INTO 256 BUCKETS >>>>>>>>>>>> ROW FORMAT DELIMITED >>>>>>>>>>>> FIELDS TERMINATED BY ',' >>>>>>>>>>>> STORED AS RCFile; >>>>>>>>>>>> >>>>>>>>>>>> set hive.exec.dynamic.partition=true; >>>>>>>>>>>> set hive.exec.dynamic.partition.mode=nonstrict; >>>>>>>>>>>> set hive.exec.max.dynamic.partitions=1000; >>>>>>>>>>>> set hive.exec.max.dynamic.partitions.pernode=1000; >>>>>>>>>>>> >>>>>>>>>>>> INSERT OVERWRITE TABLE rc_partition_cluster_table Partition >>>>>>>>>>>> (event_date) >>>>>>>>>>>> SELECT * FROM events_details; >>>>>>>>>>>> >>>>>>>>>>>> why it is not working fine? >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> *Muhammad Hamza Asad* >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Nitin Pawar >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> *Muhammad Hamza Asad* >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Nitin Pawar >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> *Muhammad Hamza Asad* >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nitin Pawar >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> *Muhammad Hamza Asad* >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Nitin Pawar >>>>> >>>> >>>> >>>> >>>> -- >>>> *Muhammad Hamza Asad* >>>> >>> >>> >>> >>> -- >>> Nitin Pawar >>> >> >> >> >> -- >> *Muhammad Hamza Asad* >> > > > > -- > Nitin Pawar > -- *Muhammad Hamza Asad*