Here is the code sketch to get you started:

Step 1: Create a builder

ReadEntity.Builder builder = new ReadEntity.Builder();
String database = ...
builder.withDatabase(database);
String table = ...
builder.withTable(table);
String filter = ...  // optional partition filter
if (filter != null) {
    builder.withFilter(filter);
}
String region = ...  // optional, only if you target a specific region
if (region != null) {
    builder.withRegion(region);
}

Step 2: Get initial reader context

Map<String, String> config = ...
// make sure the config contains the "hive.metastore.uris" property
// pointing at your Hive metastore Thrift server
ReadEntity entity = builder.build();
ReaderContext readerContext =
    DataTransferFactory.getHCatReader(entity, config).prepareRead();

Step 3: Get input splits and Hadoop Configuration

List<InputSplit> splits = readerContext.getSplits();
Configuration conf = readerContext.getConf(); // Hadoop Configuration to hand to each split reader

Step 4: Get records

a) for each input split in splits, get a reader:

HCatReader hcatReader = DataTransferFactory.getHCatReader(inputSplit, conf);

Iterator<HCatRecord> records = hcatReader.read();

b) iterate over the records for that reader; a combined sketch of (a) and (b) follows below.
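
Putting (a) and (b) together, a minimal client-side loop could look like the
sketch below (assuming the splits and conf variables from Step 3; error
handling is omitted, and read() may throw HCatException):

for (InputSplit split : splits) {
    HCatReader reader = DataTransferFactory.getHCatReader(split, conf);
    Iterator<HCatRecord> records = reader.read();
    while (records.hasNext()) {
        HCatRecord record = records.next();
        // HCatRecord is positional: record.get(0) returns the first column
        System.out.println(record);
    }
}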

On Mon, Jun 16, 2014 at 9:57 AM, Brian Jeltema <
brian.jelt...@digitalenvoy.net> wrote:

> regarding:
>
> 3. To read the HCat records....
>
> It depends on how you'd like to read the records... will you be reading
> ALL the records remotely from the client app,
> or will you get input splits and read the records on mappers?
>
> The code will be different (somewhat)... let me know...
>
>
> in this case I’d be reading all of the records remotely from the client app
>
> TIA
> Brian
>
> On Jun 13, 2014, at 9:51 AM, Dmitry Vasilenko <dvasi...@gmail.com> wrote:
>
> I am not sure about the Javadocs... ;-]
> I have spent the last three years integrating with HCat, and to make it
> work I had to go through the code...
>
> So here are some samples that can be helpful to start with. If you are
> using Hive 0.12.0, I would not bother with the new APIs... I had to create
> some shim classes for HCat to make my code version-independent, but I
> cannot share that.
>
> So
>
> 1. To enumerate tables... just use the Hive metastore client... this
> seems to be version-independent
>
>    // the conf should contain the "hive.metastore.uris" property that
>    // points to your Hive metastore Thrift server
>    HiveConf conf = ...
>    HiveMetaStoreClient hiveMetastoreClient = new HiveMetaStoreClient(conf);
>    // this will get you all the databases
>    List<String> databases = hiveMetastoreClient.getAllDatabases();
>    // this will get you all the tables for the given database
>    List<String> tables = hiveMetastoreClient.getAllTables(database);
>
> 2. To get the table schema... I assume that you are after the HCat schema
>
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
> import org.apache.hcatalog.data.schema.HCatSchema;
> import org.apache.hcatalog.mapreduce.HCatInputFormat;
> import org.apache.hcatalog.mapreduce.InputJobInfo;
>
>   Job job = new Job(config); // config must have "hive.metastore.uris" set
>   job.setJarByClass(XXXXXX.class); // this will be your class
>   job.setInputFormatClass(HCatInputFormat.class);
>   job.setOutputFormatClass(TextOutputFormat.class);
>   InputJobInfo inputJobInfo = InputJobInfo.create("my_data_base",
>       "my_table", "partition filter");
>   HCatInputFormat.setInput(job, inputJobInfo);
>   HCatSchema s = HCatInputFormat.getTableSchema(job);
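>
> Once you have the schema, a minimal sketch for walking its fields could
> look like this (assuming the HCatSchema s from above; HCatFieldSchema
> comes from org.apache.hcatalog.data.schema):
>
> import org.apache.hcatalog.data.schema.HCatFieldSchema;
>
> for (HCatFieldSchema field : s.getFields()) {
>     // print each column name with its HCat type, e.g. "id : int"
>     System.out.println(field.getName() + " : " + field.getTypeString());
> }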
>
>
> 3. To read the HCat records....
>
> It depends on how you'd like to read the records... will you be reading
> ALL the records remotely from the client app,
> or will you get input splits and read the records on mappers?
>
> The code will be different (somewhat)... let me know...
>
> On Fri, Jun 13, 2014 at 8:25 AM, Brian Jeltema <
> brian.jelt...@digitalenvoy.net> wrote:
>
>> Version 0.12.0.
>>
>> I’d like to obtain the table’s schema, scan a table partition, and use
>> the schema to parse the rows.
>>
>> I can probably figure this out by looking at the HCatalog source. My
>> concern was that the HCatalog packages in the Hive distributions are
>> excluded from the JavaDoc, which implies that the API is not public. Is
>> there a reason for this?
>>
>> Brian
>>
>> On Jun 13, 2014, at 9:10 AM, Dmitry Vasilenko <dvasi...@gmail.com> wrote:
>>
>> You should be able to access this information. The exact API depends on
>> the version of Hive/HCat. As you know, the earlier HCat API is being
>> deprecated and will be removed in Hive 0.14.0. I can provide you with a
>> code sample if you tell me what you are trying to do and what version of
>> Hive you are using.
>>
>>
>> On Fri, Jun 13, 2014 at 7:33 AM, Brian Jeltema <
>> brian.jelt...@digitalenvoy.net> wrote:
>>
>>> I’m experimenting with HCatalog, and would like to be able to access
>>> tables and their schema from a Java application (not Hive/Pig/MapReduce).
>>> However, the API seems to be hidden, which leads me to believe that this
>>> is not a supported use case. Is HCatalog use limited to one of the
>>> supported frameworks?
>>>
>>> TIA
>>>
>>> Brian