I solved my own problem. For anyone who's curious: subclassing an InputFormat lets you override the listStatus method, which returns the list of files for Hive (or MapReduce in general) to process. All I had to do was subclass org.apache.hadoop.mapred.TextInputFormat and override listStatus to filter out directories, and voilà, Hive ignored them. Here's the Java code that I used:

    // Package matches the class name used in the CREATE TABLE statement below.
    package com.example.mapreduce.input;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class TextFileInputFormatIgnoreSubDir extends TextInputFormat {

        // Return only plain files from the input paths; directories are dropped
        // so they never reach split computation.
        @Override
        protected FileStatus[] listStatus(JobConf job) throws IOException {
            FileStatus[] files = super.listStatus(job);
            List<FileStatus> newFiles = new ArrayList<FileStatus>();
            for (FileStatus file : files) {
                if (!file.isDir()) {
                    newFiles.add(file);
                }
            }
            return newFiles.toArray(new FileStatus[newFiles.size()]);
        }
    }

And the HiveQL code I used to define the table:

    CREATE EXTERNAL TABLE users (id STRING, user_name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS
      INPUTFORMAT 'com.example.mapreduce.input.TextFileInputFormatIgnoreSubDir'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION '/data/test/users';

Hope this saves someone else the trouble of figuring it out...

-Dave

On Thu, Aug 18, 2011 at 3:53 PM, Dave <drive...@gmail.com> wrote:

> Hi,
>
> I have a partitioned external table in Hive, and in the partition
> directories there are other subdirectories that are not related to the table
> itself. Hive seems to want to scan those directories, as I am getting an
> error message when trying to do a SELECT on the table:
>
> Failed with exception java.io.IOException:java.io.IOException: Not a file:
> hdfs://path/to/partition/path/to/subdir
>
> Also, it seems to ignore directories prefixed by an underscore
> (_directory).
>
> I am using hive 0.7.1 on Hadoop 0.20.2.
>
> Is there a way to force Hive to ignore all subdirectories in external
> tables and only look at files?
>
> Thanks in advance,
> -Dave
>