Your table definition looks fine, and no, you shouldn't declare the
recIdField in the table itself.
Without seeing your writing code it's hard to know why you're hitting
this, but here is some info that may be of use. Hive itself uses a
pseudo column to store the recId info when it reads an ACID row, so
that the identifier is available when the row is written back for an
update or delete. I'm guessing you don't have this pseudo column set
up correctly. You can take a look at FileSinkOperator (search for ACID
or UPDATE) and OrcInputFormat.getRecordReader to get an idea of how
this works.
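To make that concrete, here is a very rough sketch (untested; the
struct field names, placeholder column and transaction id handling are
my assumptions, not necessarily the exact internals) of a writer
ObjectInspector that carries the record identifier as a struct in
field 0, which recordIdColumn(0) then points at:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

// The record identifier is a struct of (originalTransaction, bucket, rowId);
// when a row is read back from an ACID table the reader populates it, and it
// must be handed back unchanged when you update or delete that row.
ObjectInspector recIdOI = ObjectInspectorFactory.getStandardStructObjectInspector(
    Arrays.asList("originalTransaction", "bucket", "rowId"),
    Arrays.<ObjectInspector>asList(
        PrimitiveObjectInspectorFactory.javaLongObjectInspector,
        PrimitiveObjectInspectorFactory.javaIntObjectInspector,
        PrimitiveObjectInspectorFactory.javaLongObjectInspector));

// Row = (recId, data columns...); "msg" is a placeholder data column.
StructObjectInspector rowOI = ObjectInspectorFactory.getStandardStructObjectInspector(
    Arrays.asList("recId", "msg"),
    Arrays.asList(recIdOI,
        (ObjectInspector) PrimitiveObjectInspectorFactory.javaStringObjectInspector));

long txnId = 1L; // illustrative; in practice obtained from the metastore

AcidOutputFormat.Options options = new AcidOutputFormat.Options(new Configuration())
    .inspector(rowOI)
    .recordIdColumn(0) // position of the recId struct within each row
    .bucket(0)
    .minimumTransactionId(txnId)
    .maximumTransactionId(txnId);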
Alan.
Elliot West <tea...@gmail.com>
March 20, 2015 at 14:50
Hi,
I'm trying to use the insert, update and delete methods on
OrcRecordUpdater to programmatically mutate an ORC-based Hive table
(1.0.0). I've got inserts working correctly, but I'm running into a
problem with deletes and updates: I get an NPE which I've traced back
to what seems like a missing recIdField(?).
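Roughly, what my writing code is doing looks like this (a simplified
sketch: conf, rowOI, txnId, partitionPath and the row objects are all
set up elsewhere, and the names here are illustrative):

RecordUpdater updater = new OrcOutputFormat().getRecordUpdater(
    partitionPath,
    new AcidOutputFormat.Options(conf)
        .inspector(rowOI)
        .bucket(0)
        .minimumTransactionId(txnId)
        .maximumTransactionId(txnId));

updater.insert(txnId, row);         // works
updater.update(txnId, updatedRow);  // NPE
updater.delete(txnId, deletedRow);  // NPE
updater.close(false);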
I've tried specifying a location for the field using
AcidOutputFormat.Options.recordIdColumn(0), but this fails due to an
ObjectInspector mismatch (a sketch of that attempt follows after the
table code below). I'm not sure whether I should be creating this
field as part of my table definition or not. Currently I'm
constructing the table with some code based on that found in the
storm-hive project:
Table tbl = new Table();
tbl.setDbName(databaseName);
tbl.setTableName(tableName);
tbl.setTableType(TableType.MANAGED_TABLE.toString());

StorageDescriptor sd = new StorageDescriptor();
sd.setCols(getTableColumns(colNames, colTypes));
sd.setNumBuckets(1);
sd.setLocation(dbLocation + Path.SEPARATOR + tableName);
if (partNames != null && partNames.length != 0) {
  tbl.setPartitionKeys(getPartitionKeys(partNames));
}
tbl.setSd(sd);

sd.setBucketCols(new ArrayList<String>(2));
sd.setSerdeInfo(new SerDeInfo());
sd.getSerdeInfo().setName(tbl.getTableName());
sd.getSerdeInfo().setParameters(new HashMap<String, String>());
sd.getSerdeInfo().getParameters().put(serdeConstants.SERIALIZATION_FORMAT, "1");
// Not sure if this does anything?
sd.getSerdeInfo().getParameters().put("transactional", Boolean.TRUE.toString());
sd.getSerdeInfo().setSerializationLib(OrcSerde.class.getName());
sd.setInputFormat(OrcInputFormat.class.getName());
sd.setOutputFormat(OrcOutputFormat.class.getName());

Map<String, String> tableParams = new HashMap<String, String>();
// Not sure if this does anything?
tableParams.put("transactional", Boolean.TRUE.toString());
tbl.setParameters(tableParams);
client.createTable(tbl);

try {
  if (partVals != null && partVals.size() > 0) {
    addPartition(client, tbl, partVals);
  }
} catch (AlreadyExistsException e) {
  // Partition already exists; ignore.
}
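For completeness, the recordIdColumn attempt mentioned above looked
something like this (again a sketch; "id" and "msg" are placeholder
columns):

// The inspector describes only the data columns, so there is no field
// carrying a record identifier struct; presumably that's the source of
// the ObjectInspector mismatch.
StructObjectInspector rowOI = ObjectInspectorFactory.getStandardStructObjectInspector(
    Arrays.asList("id", "msg"),
    Arrays.<ObjectInspector>asList(
        PrimitiveObjectInspectorFactory.javaLongObjectInspector,
        PrimitiveObjectInspectorFactory.javaStringObjectInspector));

AcidOutputFormat.Options options = new AcidOutputFormat.Options(conf)
    .inspector(rowOI)
    .recordIdColumn(0); // fails downstream: field 0 is a long, not a recId struct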
I don't really know enough about Hive and ORCFile internals to work
out where I'm going wrong, so any help would be appreciated.
Thanks - Elliot.