Here is a code sketch to get you started.

Step 1: Create a builder:
  ReadEntity.Builder builder = new ReadEntity.Builder();
  String database = ...
  builder.withDatabase(database);
  String table = ...
  builder.withTable(table);
  String filter = ...
  if (filter != null) {
    builder.withFilter(filter);
  }
  String region = ...
  if (region != null) {
    builder.withRegion(region);
  }

Step 2: Get the initial reader context:

  // make sure that you have the hive.metastore.uris property in the config
  Map<String, String> config = ...
  ReadEntity entity = builder.build();
  ReaderContext readerContext =
      DataTransferFactory.getHCatReader(entity, config).prepareRead();

Step 3: Get the input splits and the Hadoop Configuration:

  List<InputSplit> splits = readerContext.getSplits();
  Configuration conf = readerContext.getConf();

Step 4: Get the records:

a) for each input split, get a reader:

  HCatReader hcatReader = DataTransferFactory.getHCatReader(inputSplit, conf);
  Iterator<HCatRecord> records = hcatReader.read();

b) iterate over the records for that reader.

On Mon, Jun 16, 2014 at 9:57 AM, Brian Jeltema <brian.jelt...@digitalenvoy.net> wrote:

> regarding:
>
> 3. To read the HCat records....
>
> It depends on how you'd like to read the records ... will you be reading
> ALL the records remotely from the client app, or will you get input splits
> and read the records on mappers...???
>
> The code will be different (somewhat)... let me know...
>
> in this case I’d be reading all of the records remotely from the client app
>
> TIA
> Brian
>
> On Jun 13, 2014, at 9:51 AM, Dmitry Vasilenko <dvasi...@gmail.com> wrote:
>
> I am not sure about the javadocs... ;-]
> I have spent the last three years integrating with HCat, and to make it
> work I had to go through the code...
>
> So here are some samples that can be helpful to start with. If you are
> using Hive 0.12.0, I would not bother with the new APIs... I had to create
> some shim classes for HCat to make my code version-independent, but I
> cannot share that.
>
> So
>
> 1. To enumerate tables ... just use the Hive client ... this seems to be
> version independent:
>
>   // the conf should contain the "hive.metastore.uris" property that
>   // points to your Hive Metastore thrift server
>   hiveMetastoreClient = new HiveMetaStoreClient(conf);
>   // this will get you all the databases
>   List<String> databases = hiveMetastoreClient.getAllDatabases();
>   // this will get you all the tables for the given database
>   List<String> tables = hiveMetastoreClient.getAllTables(database);
>
> 2. To get the table schema... I assume that you are after the HCat schema:
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.mapreduce.InputSplit;
>   import org.apache.hadoop.mapreduce.Job;
>   import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
>   import org.apache.hcatalog.data.schema.HCatSchemaUtils;
>   import org.apache.hcatalog.mapreduce.HCatInputFormat;
>   import org.apache.hcatalog.mapreduce.HCatSplit;
>   import org.apache.hcatalog.mapreduce.InputJobInfo;
>
>   Job job = new Job(config);
>   job.setJarByClass(XXXXXX.class); // this will be your class
>   job.setInputFormatClass(HCatInputFormat.class);
>   job.setOutputFormatClass(TextOutputFormat.class);
>   InputJobInfo inputJobInfo = InputJobInfo.create("my_data_base",
>       "my_table", "partition filter");
>   HCatInputFormat.setInput(job, inputJobInfo);
>   HCatSchema s = HCatInputFormat.getTableSchema(job);
>
> 3. To read the HCat records....
>
> It depends on how you'd like to read the records ... will you be reading
> ALL the records remotely from the client app, or will you get input splits
> and read the records on mappers...???
>
> The code will be different (somewhat)... let me know...
>
> On Fri, Jun 13, 2014 at 8:25 AM, Brian Jeltema
> <brian.jelt...@digitalenvoy.net> wrote:
>
>> Version 0.12.0.
>>
>> I’d like to obtain the table’s schema, scan a table partition, and use
>> the schema to parse the rows.
>>
>> I can probably figure this out by looking at the HCatalog source. My
>> concern was that the HCatalog packages in the Hive distributions are
>> excluded from the JavaDoc, which implies that the API is not public.
>> Is there a reason for this?
>>
>> Brian
>>
>> On Jun 13, 2014, at 9:10 AM, Dmitry Vasilenko <dvasi...@gmail.com> wrote:
>>
>> You should be able to access this information. The exact API depends on
>> the version of Hive/HCat. As you know, the earlier HCat API is being
>> deprecated and will be removed in Hive 0.14.0. I can provide you with a
>> code sample if you tell me what you are trying to do and what version of
>> Hive you are using.
>>
>> On Fri, Jun 13, 2014 at 7:33 AM, Brian Jeltema
>> <brian.jelt...@digitalenvoy.net> wrote:
>>
>>> I’m experimenting with HCatalog, and would like to be able to access
>>> tables and their schemas from a Java application (not Hive/Pig/MapReduce).
>>> However, the API seems to be hidden, which leads me to believe that this
>>> is not a supported use case. Is HCatalog use limited to one of the
>>> supported frameworks?
>>>
>>> TIA
>>>
>>> Brian
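
For anyone landing on this thread later: here is a minimal, self-contained sketch that stitches samples 1 and 2 from the June 13 message together. It assumes the Hive 0.12-era org.apache.hcatalog package names and a reachable metastore; the thrift URI, database, and table names are placeholders to replace with your own:

  import org.apache.hadoop.hive.conf.HiveConf;
  import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hcatalog.data.schema.HCatSchema;
  import org.apache.hcatalog.mapreduce.HCatInputFormat;
  import org.apache.hcatalog.mapreduce.InputJobInfo;

  public class ListTablesAndSchema {
    public static void main(String[] args) throws Exception {
      HiveConf conf = new HiveConf();
      // Placeholder URI: point this at your Hive Metastore thrift server.
      conf.set("hive.metastore.uris", "thrift://metastore-host:9083");

      // Sample 1: enumerate every database and table via the Hive client.
      HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
      for (String database : client.getAllDatabases()) {
        for (String table : client.getAllTables(database)) {
          System.out.println(database + "." + table);
        }
      }
      client.close();

      // Sample 2: fetch the HCat schema for one table; a null third
      // argument means no partition filter.
      Job job = new Job(conf);
      HCatInputFormat.setInput(job, InputJobInfo.create("default", "my_table", null));
      HCatSchema schema = HCatInputFormat.getTableSchema(job);
      System.out.println(schema.getFieldNames());
    }
  }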
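
And the same for steps 1 through 4 at the top of the thread: a sketch of the all-records-on-the-client read path that Brian asked about. It assumes the same 0.12-era data transfer API (ReadEntity.Builder, DataTransferFactory, and a ReaderContext that exposes getSplits() and getConf()); the metastore URI and table name are again placeholders:

  import java.util.HashMap;
  import java.util.Iterator;
  import java.util.Map;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.InputSplit;
  import org.apache.hcatalog.data.HCatRecord;
  import org.apache.hcatalog.data.transfer.DataTransferFactory;
  import org.apache.hcatalog.data.transfer.HCatReader;
  import org.apache.hcatalog.data.transfer.ReadEntity;
  import org.apache.hcatalog.data.transfer.ReaderContext;

  public class RemoteHCatRead {
    public static void main(String[] args) throws Exception {
      // Step 1: describe what to read (no filter here, so the whole table).
      ReadEntity entity = new ReadEntity.Builder()
          .withDatabase("default")
          .withTable("my_table")
          .build();

      // Step 2: the master-side reader negotiates splits with the metastore.
      Map<String, String> config = new HashMap<String, String>();
      config.put("hive.metastore.uris", "thrift://metastore-host:9083"); // placeholder
      ReaderContext readerContext =
          DataTransferFactory.getHCatReader(entity, config).prepareRead();

      // Steps 3 and 4: get the splits and read each one. Here a single
      // client process walks every split, which is the "read ALL the
      // records remotely" case from the thread.
      Configuration conf = readerContext.getConf();
      for (InputSplit split : readerContext.getSplits()) {
        HCatReader reader = DataTransferFactory.getHCatReader(split, conf);
        Iterator<HCatRecord> records = reader.read();
        while (records.hasNext()) {
          System.out.println(records.next());
        }
      }
    }
  }

In the distributed variant, you would instead ship the ReaderContext to the workers (it is serializable for that purpose) and have each worker read only its own split.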