Hi Sangeetha,
Hive uses a SerDe (Serializer/Deserializer) to turn the data it reads from HDFS
into table columns, and table columns back into data when writing to HDFS. You
have several SerDes to choose from for your table.
For example, if your file contains tab-delimited fields, you could use the
default SerDe (by not specifying any SerDe) and specify the delimiter with
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
in your CREATE TABLE statement.
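A minimal sketch of such a table follows. It assumes a hypothetical file with three tab-separated fields per line, and the table and column names are only illustrative. The sample log4j line quoted further down has no single character separating every field, so a delimited table alone won't split that line cleanly.

```sql
-- Hypothetical table: one record per line, fields separated by tabs.
-- There is no SERDE clause, so Hive falls back to its default (LazySimpleSerDe).
CREATE TABLE app_log_tsv (
  log_time  STRING,
  log_level STRING,
  message   STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
```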

If you prefer, you could use the RegexSerDe instead (albeit with some
performance overhead), with something like:

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES  (  
"input.regex" = ".*time:([^,]*)",  
"output.format.string" = "time:%1$s")

in your create table statement.
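Applied to the sample line in your message, a sketch could look like the one below. It assumes each log entry sits on one physical line in the file, since the email wraps it. The table name, the column names and the JAR path are placeholders.

The contrib RegexSerDe has a few constraints:

* It only accepts STRING columns.
* The regex has to match the whole line. A line that doesn't match should come back with all columns NULL rather than raising an error.
* As far as I know, "output.format.string" is only needed if you write into the table, so it is left out here.

```sql
-- Placeholder path: point this at the hive-contrib JAR of your Hive install.
ADD JAR /path/to/hive-contrib.jar;

-- Hypothetical table: capture group N of input.regex fills column N.
-- Groups: timestamp, level (leading space skipped by \\s*), thread,
-- logger class, the bracketed key/value pairs, and the text after " - ".
CREATE TABLE log4j_logs (
  log_time   STRING,
  log_level  STRING,
  thread     STRING,
  logger     STRING,
  ctx_fields STRING,
  message    STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "\\[([^\\]]*)\\] \\[\\s*([^\\]]*)\\] \\[([^\\]]*)\\] \\[([^\\]]*)\\] (.*) - (.*)"
)
STORED AS TEXTFILE;
```

In this sketch the bracketed key/value pairs ([Organization: ...] through [Status: ...]) stay together in ctx_fields. They could be split into their own columns by adding more capture groups in the same way.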

As you get more familiar with Hive, you may find you need to write your own
UDF (user-defined function) to parse the data.
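Before writing a custom UDF, the built-in regexp_extract(subject, pattern, index) function may be enough. It is listed on the UDF page linked below. The sketch assumes a few hypothetical tables:

* log4j_raw has a single STRING column, line, which holds the whole log entry (similar to the one-column table you described).
* log4j_parsed has five STRING columns in the order selected below.

```sql
-- Hypothetical tables: log4j_raw(line STRING), one log entry per row, and
-- log4j_parsed(log_time, log_level, client, server, status), all STRING.
-- regexp_extract returns the given capture group of the first match,
-- or an empty string when the pattern is not found.
INSERT OVERWRITE TABLE log4j_parsed
SELECT
  regexp_extract(line, '^\\[([^\\]]*)\\]', 1),
  regexp_extract(line, '^\\[[^\\]]*\\] \\[\\s*([^\\]]*)\\]', 1),
  regexp_extract(line, '\\[Client: ([^\\]]*)\\]', 1),
  regexp_extract(line, '\\[Server: ([^\\]]*)\\]', 1),
  regexp_extract(line, '\\[Status: ([^\\]]*)\\]', 1)
FROM log4j_raw;
```

Keying each pattern on the field label (for example "Client:") rather than on its position keeps the query working if the order of the bracketed fields changes.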

Here is the link to the Hive wiki for Create Table:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create%2FDropTable

Here is the link for UDFs:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF


Welcome and good luck!
Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation 

www: oanda.com www: fxtrade.com 


----- Original Message -----
From: "sangeetha k" <[email protected]>
To: [email protected]
Sent: Tuesday, December 6, 2011 4:26:03 AM
Subject: Re: log4j format logs in Hive table



Hi, 

Thanks for the response. 
Yes, you understood my question correctly. 

Here is an example line from my log: 

[2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0] 
[net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource] 
[Organization: Travelocity] [Client: AA] [Location of device: DFW] [User: 
550393] [user_role: ] [CorelationId: 248] [Component: Crossplane] [Server: 
server01] [Request: seats=5] [Response: yes] [Status: pass] - Entering Method = 
getKey() 

How do I specify the delimiter when defining the table? 

Thanks, 
Sangeetha 




From: alo alt <[email protected]> 
To: [email protected]; sangeetha k <[email protected]> 
Sent: Tuesday, December 6, 2011 2:01 PM 
Subject: Re: log4j format logs in Hive table 


Hi, 


I hope I understood your question correctly: did you define your table? Something 
like "CREATE TABLE YOURTABLE (row1 STRING, row2 STRING, row3 STRING) ROW FORMAT 
DELIMITED FIELDS TERMINATED BY 'YOUR TERMINATOR' STORED AS TEXTFILE;" 


row* = column names of your choice; for the data types, see the documentation. 


Afterwards, import the data via "insert (overwrite) table YOURTABLE". 


- alex 




On Tue, Dec 6, 2011 at 8:56 AM, sangeetha k < [email protected] > wrote: 





Hi, 

I am new to Hive. 

I am using a Flume agent to collect log4j logs and send them to HDFS. 
Now I want to load the log4j-format logs from HDFS into Hive tables. 
Each of the attributes in the log statements, such as timestamp, level and 
class name, should be loaded into a separate column in the Hive tables. 

I tried creating a table in Hive and loaded the entire log into one column, but 
I don't know how to load the attributes mentioned above into separate columns. 

Please send me any suggestions, links or tutorials on this. 

Thanks, 
Sangeetha 



-- 

Alexander Lorenz 
http://mapredit.blogspot.com 

