I am trying to ingest unstructured data into Hive so that it can be queried. I am following the steps in Tutorial Exercise 3, but I am having some problems: the table I created has no data in it. Here is a sample of the unstructured data:
560)211-5250 437)810-5830 04:35 21 May 2014 17:26:39
356)539-2237 889)650-7326 30:29 26 Feb 2014 11:56:08

The data is tab-delimited. Here are the steps I am following:

1. Load the data into HDFS.
   a. Make the destination folder:
      sudo -u hdfs hadoop fs -mkdir /user/cloudera/vector/callRecords
   b. Copy the data into the destination folder:
      sudo -u hdfs hadoop fs -copyFromLocal ~/Desktop/CDRecords.txt /user/cloudera/vector/callRecords/

2. Create the Hive table from the command line:

   CREATE EXTERNAL TABLE intermediate_call_records (
     callFrom STRING,
     callTo STRING,
     callDuration STRING,
     date STRING,
     timeOfCall STRING)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
   WITH SERDEPROPERTIES (
     "input.regex" = "([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\n",
     "output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
   )
   LOCATION '/user/cloudera/vector/callRecords';

David Novogrodsky
http://www.linkedin.com/in/davidnovogrodsky
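A possible local check, sketched under assumptions: the five-group input.regex can only produce data if each line has exactly five tab-separated fields and the pattern matches a whole line as Hive sees it. The shell script below is a hypothetical diagnostic, not part of the tutorial. It feeds the two sample records (assuming a tab between each of the five columns) to awk, counts the fields per line, and tests the regex from the CREATE TABLE statement with and without its trailing \n. It assumes that awk's extended regular expressions behave like Java's java.util.regex for this simple pattern, and that Hadoop's text input hands each line to the SerDe with its newline already removed.

```shell
#!/bin/sh
# Hypothetical diagnostic (not from the tutorial): compare the sample call
# records against the table's input.regex before looking at Hive itself.

# The two records from the post, assuming a tab between each of the five
# columns. To test the real file, replace the body with:
#   cat ~/Desktop/CDRecords.txt
sample_data() {
  printf '560)211-5250\t437)810-5830\t04:35\t21 May 2014\t17:26:39\n'
  printf '356)539-2237\t889)650-7326\t30:29\t26 Feb 2014\t11:56:08\n'
}

# Distinct numbers of tab-separated fields per line; the table expects 5.
field_counts=$(sample_data | awk 'BEGIN { FS = "\t" } { print NF }' | sort -u)
printf 'fields per line: %s\n' "$field_counts"

# awk (default RS), like Hadoop's line reader, sees each line without its
# trailing newline. The ^...$ anchors mimic a whole-line match; RegexSerDe is
# assumed here to use Matcher.matches(), which must consume the entire line.

# The regex exactly as written in the CREATE TABLE, including the trailing \n.
with_newline=$(sample_data | awk '/^([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\n$/ { n++ } END { print n + 0 }')
printf 'lines matching regex as written (with trailing newline): %s\n' "$with_newline"

# The same regex with the trailing \n removed.
without_newline=$(sample_data | awk '/^([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)$/ { n++ } END { print n + 0 }')
printf 'lines matching regex without trailing newline: %s\n' "$without_newline"
```

To check the real data, replace the body of the hypothetical sample_data function with cat ~/Desktop/CDRecords.txt, or with hadoop fs -cat /user/cloudera/vector/callRecords/CDRecords.txt for the copy in HDFS. If every line has 5 fields but only the version without the trailing \n matches, that part of input.regex would be worth a closer look.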
