Hey guys,
I am using Flume to sink data directly into my Hive table. However, there
seems to be some schema inconsistency, and I am not sure how to
troubleshoot it.

I created a Hive table 'targeting'. It uses the SequenceFile format with
Snappy compression and is partitioned by 'epoch'. After the table was
created, I could see a folder called 'targeting' under my database folder:
/hive/cwang49.db/targeting
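
For reference, a table matching this description could be declared roughly
as below. This is only a sketch: the single 'line' column is a placeholder,
and the BIGINT type for 'epoch' is an assumption, since the real column list
is not included in this message.

```sql
-- Sketch only: 'line' is a hypothetical column, not the real schema.
-- The database name cwang49 is taken from the folder path above.
USE cwang49;

CREATE TABLE targeting (
  line STRING              -- placeholder column
)
PARTITIONED BY (epoch BIGINT)   -- partition column type is an assumption
STORED AS SEQUENCEFILE;

-- Snappy is not part of the DDL; these session settings only affect
-- data that Hive itself writes into the table.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
```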

I then use Flume to write my log data into this folder directly. The Flume
configuration is:
sinks.HDFS.type = hdfs
sinks.HDFS.hdfs.path = maprfs:///hive/cwang49.db/targeting/epoch=%{epoch}
sinks.HDFS.hdfs.fileType = SequenceFile
sinks.HDFS.hdfs.codeC = snappy

When I run the Flume node, I can see the folder epoch=123445 being created,
and there are files under it as well. However, when I run a Hive query
against the table, it returns no rows.

I think this might be caused by some schema discrepancy. Do I still need to
load the partition metadata into Hive before I can see the partition? (I
recall doing this for an external table.) How can I troubleshoot this?

Thanks a bunch!
Chen
