Crunch Gurus,
I need some advice. I have experience writing ORC files in Crunch, and I can
successfully read them in Crunch and print them out.
But when I try to process them with a DoFn, I get the error below. What should
I do?
Exception in thread "Thread-5" java.lang.NoSuchFieldError: HIVE_ORC_SPLIT_STRATEGY
Here’s my code:
logger.info("Generating Hadoop Configuration...");
Configuration crunchConf = getConf();

logger.info("Establishing OrcFile Target for Final Output...");
OrcFileTarget target = new OrcFileTarget(new Path(outputPath));

// Establish Pipeline
logger.info("Generating Crunch Map-Reduce Pipeline...");
Pipeline pipeline = new MRPipeline(DataQualityDriver.class, crunchConf);

// Establish OrcFileSource (emulates a Java class) linked to HDFS Path
logger.info("Generating Orc File Source around given HDFS path...");
OrcFileSource<Verint1978Record> orcsource = new OrcFileSource<Verint1978Record>(
    new Path(inputPath), Orcs.reflects(Verint1978Record.class));

// Ingest the Orc File into a PCollection
logger.info("Generating PCollection of Verint1978Record from Data...");
PCollection<Verint1978Record> data = pipeline.read(orcsource);

for (Verint1978Record record : data.materialize()) {
    System.out.println(record.getAllColumns());
}

// This all works fine until THIS point.
// Can't run these files through a DoFn or write them out without getting the
// above error.
// This DoFn simply reads the previous PCollection and prints it back out as a
// string (just to test the DoFn).
PCollection<String> newData = data.parallelDo(
    DataQualityDoFns.DoFn_ProduceSameRecords(), Writables.strings());
for (String record : newData.materialize()) {
    System.out.println(record);
}

PipelineResult result = pipeline.done();
DoFn (super lazy):
static DoFn<Verint1978Record, String> DoFn_ProduceSameRecords() {
    return new DoFn<Verint1978Record, String>() {
        @Override
        public void process(Verint1978Record input, Emitter<String> emitter) {
            emitter.emit(input.getLct_nbr() + "" + input.getVid_caa_id() + ""
                + input.getHrs_nbr() + "" + input.getMte_nbr() + ""
                + input.getAcl_idc() + "" + input.getSec_dur() + ""
                + input.getSec_to_pcs() + "" + input.getSec_pcd() + ""
                + input.getUse_for_rpr_idc() + "" + input.getGrp_cnt() + ""
                + input.getSng_cnt() + "" + input.getUpd_dt() + ""
                + input.getUpd_id() + "" + input.getCal_dt());
        }
    };
}
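A note on the error above: java.lang.NoSuchFieldError is generally thrown when code compiled against one version of a class runs against a different version that lacks the field. A mismatch between jars on the classpath is therefore a plausible cause here, though I have not confirmed it. The sketch below is a small, hypothetical helper (FieldProbe is not part of Crunch or Hive) that reports where a class is loaded from and whether it has a given public field. The default class name, org.apache.hadoop.hive.conf.HiveConf$ConfVars, is an assumption about where HIVE_ORC_SPLIT_STRATEGY is defined.

```java
import java.security.CodeSource;

public class FieldProbe {

    /** Returns true if the class can be loaded and exposes a public field with this name. */
    public static boolean hasField(String className, String fieldName) {
        try {
            // Load without running static initializers; we only want metadata.
            Class<?> cls = Class.forName(className, false, FieldProbe.class.getClassLoader());
            cls.getField(fieldName); // enum constants are public static fields
            return true;
        } catch (ClassNotFoundException | NoSuchFieldException | LinkageError e) {
            return false;
        }
    }

    /** Returns the jar or directory the class was loaded from, or a placeholder string. */
    public static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className, false, FieldProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; pass a class name and field name as arguments to override.
        String cls = args.length > 0 ? args[0] : "org.apache.hadoop.hive.conf.HiveConf$ConfVars";
        String field = args.length > 1 ? args[1] : "HIVE_ORC_SPLIT_STRATEGY";
        System.out.println(cls + " loaded from: " + locationOf(cls));
        System.out.println("has field " + field + ": " + hasField(cls, field));
    }
}
```

This only inspects the JVM it runs in, so it would need to run with the same classpath as the driver to say anything useful. If the class is found but the field is missing, the printed location shows which jar supplied the class.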
---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------