Hi Sangeetha, Hive uses SerDe (Serializer/Deserializer) for reading data from and writing to HDFS. You have many options for choosing the SerDe for your table. For example, if your file contains tab delimited fields, you could use the default SerDe (by not specifying any SerDe) and specify the delimiter by using FIELDS TERMINATED BY '\t' in your create table statement.
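As a minimal sketch of the default-SerDe route, assuming a tab-delimited file with three fields (the table name and column names here are made up for illustration, not taken from your data):

```sql
-- Hypothetical table and column names, for illustration only.
-- No SerDe is named, so Hive falls back to its default SerDe
-- and splits each line on the tab character.
CREATE TABLE tab_delimited_logs (
  ts      STRING,
  level   STRING,
  message STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
```

This only helps when every field is separated by the same single character, which is not the case for bracketed log4j lines.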
If you prefer, you could use the Regex SerDe (albeit with some performance overhead) using something like:

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = ".*time:([^,]*)", "output.format.string" = "time:%1$s")

in your create table statement. As you get more familiar with Hive, you might find the need to write your own UDF for parsing the data.

Here is the link to the Hive wiki for Create Table: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create%2FDropTable
Here is the link for UDFs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Welcome and good luck!
Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation
www: oanda.com www: fxtrade.com
e: [email protected]

----- Original Message -----
From: "sangeetha k" <[email protected]>
To: [email protected]
Sent: Tuesday, December 6, 2011 4:26:03 AM
Subject: Re: log4j format logs in Hive table

Hi,

Thanks for the response. Yes, you got my question. An example of my log message line is below:

[2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0] [net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource] [Organization: Travelocity] [Client: AA] [Location of device: DFW] [User: 550393] [user_role: ] [CorelationId: 248] [Component: Crossplane] [Server: server01] [Request: seats=5] [Response: yes] [Status: pass] - Entering Method = getKey()

How do I specify the delimiter when describing the table?

Thanks,
Sangeetha

From: alo alt <[email protected]>
To: [email protected]; sangeetha k <[email protected]>
Sent: Tuesday, December 6, 2011 2:01 PM
Subject: Re: log4j format logs in Hive table

Hi,

I hope I understood your question correctly - did you describe your table?
Like "create TABLE YOURTABLE (row1 STRING, row2 STRING, row3 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY 'YOUR TERMINATOR' STORED AS TEXTFILE;"

row* = a name of your choosing; for the data types, see the documentation. Afterwards, import via "insert (overwrite) table YOURTABLE".

- alex

On Tue, Dec 6, 2011 at 8:56 AM, sangeetha k < [email protected] > wrote:

Hi,

I am new to Hive. I am using a Flume agent to collect log4j logs and send them to HDFS. Now I want to load the log4j-format logs from HDFS into Hive tables. Each of the attributes in the log statements, such as timestamp, level, classname, etc., should be loaded into separate columns in the Hive tables. I tried creating a table in Hive and loaded the entire log into one column, but I don't know how to load the above-mentioned data into separate columns.

Please send me your suggestions, and any links or tutorials on this.

Thanks,
Sangeetha

--
Alexander Lorenz
http://mapredit.blogspot.com
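
For the sample log line quoted earlier in this thread, the Regex SerDe approach might look roughly like the sketch below. It is only an illustration under several assumptions: the table name, column names, and HDFS location are hypothetical, the bracketed context fields ([Organization: ...] through [Status: ...]) are kept together in one column rather than split further, and every log line is assumed to follow the same layout as the example. Backslashes are doubled because the pattern sits inside a Hive string literal.

```sql
-- Depending on the installation, the hive-contrib jar may first need
-- to be registered with ADD JAR; the path is site-specific.

-- Hypothetical names and location, for illustration only.
-- The contrib RegexSerDe expects every column to be STRING and
-- needs one column per capturing group in input.regex.
CREATE EXTERNAL TABLE log4j_events (
  ts          STRING,  -- 2011-10-17 16:30:57,281
  level       STRING,  -- INFO (leading padding skipped by \s*)
  thread      STRING,  -- 33157362@qtp-28456974-0
  class_name  STRING,  -- fully qualified class name
  context     STRING,  -- remaining [key: value] blocks, unsplit
  message     STRING   -- text after the " - " separator
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "\\[([^\\]]*)\\] \\[\\s*([^\\]]*)\\] \\[([^\\]]*)\\] \\[([^\\]]*)\\] (\\[[^\\]]*\\](?: \\[[^\\]]*\\])*) - (.*)"
)
STORED AS TEXTFILE
LOCATION '/hypothetical/path/to/flume/log4j';
```

The pattern has to match the whole line; lines that do not match come back with NULL in every column, so a quick SELECT on a few rows is a useful check. The "output.format.string" property is left out here because this sketch only reads existing files; it would be needed if you also write to the table through this SerDe. Splitting the context column into individual fields (Organization, Client, and so on) can be done by adding more capturing groups, or later with a UDF.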
