I'm not entirely sure myself, but it looks like something the underlying OrcInputFormat expects to be defined in Hive. Judging from the source linked here, it corresponds to the hive.exec.orc.split.strategy property in HiveConf: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hive/hive-common/1.2.0/org/apache/hadoop/hive/conf/HiveConf.java

    "hive.exec.orc.split.strategy", "HYBRID", new StringSet("HYBRID", "BI", "ETL"),
        "This is not a user level config. BI strategy is used when the requirement is to spend less time in split generation" +
        " as opposed to query execution (split generation does not read or cache file footers)." +
        " ETL strategy is used when spending little more time in split generation is acceptable" +
        " (split generation reads and caches file footers). HYBRID chooses between the above strategies" +
        " based on heuristics."),

On Wed, Feb 3, 2016 at 1:59 PM, Robinson, Landon - Landon <[email protected]> wrote:

> Crunch Gurus,
>
> Need some advice. I have experience writing Orc files in Crunch, and I can
> successfully read them in Crunch and print them out.
> But when I attempt to process them with a DoFn, I get this error. What
> should I do?
>
> Exception in thread "Thread-5" java.lang.NoSuchFieldError:
> HIVE_ORC_SPLIT_STRATEGY
>
> Here’s my code:
>
> logger.info("Generating Hadoop Configuration...");
> Configuration crunchConf = getConf();
> logger.info("Establishing OrcFile Target for Final Output...");
> OrcFileTarget target = new OrcFileTarget(new Path(outputPath));
> //Establish Pipeline
> logger.info("Generating Crunch Map-Reduce Pipeline...");
> Pipeline pipeline = new MRPipeline(DataQualityDriver.class,crunchConf);
>
> //Establish OrcFileSource (emulates a Java class) linked to HDFS Path
> logger.info("Generating Orc File Source around given HDFS path...");
>
> OrcFileSource<Verint1978Record> orcsource = new
>     OrcFileSource<Verint1978Record>(new Path(inputPath),
>         Orcs.reflects(Verint1978Record.class));
>
> // Ingest the Orc File into a PCollection
> logger.info("Generating PCollection of Verint1978Record from Data...");
> PCollection<Verint1978Record> data = pipeline.read(orcsource);
> //
>
> for (Verint1978Record record : data.materialize()){
>     System.out.println(record.getAllColumns());
> }
>
> //this all works fine until THIS point
>
> // can’t run these files through a DOFN or write them out without
> // getting above error
>
> //this dofn simply reads the prev PCollection and prints it back out as
> // a string (just to test the DOFN)
>
> PCollection<String> newData =
>     data.parallelDo(DataQualityDoFns.DoFn_ProduceSameRecords(),
>         Writables.strings());
> for (String record : newData.materialize()){
>     System.out.println(record);
> }
>
> PipelineResult result = pipeline.done();
>
>
> DoFN (super lazy):
>
> static DoFn<Verint1978Record, String> DoFn_ProduceSameRecords(){
>     return new DoFn<Verint1978Record, String>() {
>         @Override
>         public void process(Verint1978Record input, Emitter<String> emitter) {
>
>             emitter.emit(input.getLct_nbr() + "" + input.getVid_caa_id()+ ""
>                 + input.getHrs_nbr()+ "" + input.getMte_nbr()+ "" + input.getAcl_idc()+ "" +
>                 input.getSec_dur()+ "" + input.getSec_to_pcs()+ "" + input.getSec_pcd()+ "" +
>                 input.getUse_for_rpr_idc()+ "" + input.getGrp_cnt()+ "" + input.getSng_cnt()+
>                 "" + input.getUpd_dt()+ "" + input.getUpd_id()+ "" + input.getCal_dt());
>
>         }
>     };
> }
>
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data & Hadoop Engineer
> IT Business Intelligence, Lowe’s Companies Inc.
> ---------------------------------------------------------------------------
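
One way to narrow this down: a NoSuchFieldError is a linkage error, raised when already-compiled code refers to a field that the class actually loaded at runtime does not have. That usually points at mismatched jar versions on the classpath, though I have not confirmed that is what is happening in this job. Below is a minimal diagnostic sketch, not anything taken from Crunch or Hive. The class name ClasspathCheck and its two helpers are made up for illustration, and the assumption is that the field in question lives on the HiveConf.ConfVars enum from the hive-common jar linked above. It reports which jar the class was loaded from and whether the field is present.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Crunch or Hive): checks what the runtime
// classpath really provides for a given class and field name.
final class ClasspathCheck {

    // True if the named class can be loaded and exposes a public field
    // (enum constants count) with the given name.
    static boolean hasField(String className, String fieldName) {
        try {
            Class<?> cls = Class.forName(className, false,
                    ClasspathCheck.class.getClassLoader());
            cls.getField(fieldName);
            return true;
        } catch (ClassNotFoundException | NoSuchFieldException | LinkageError e) {
            return false;
        }
    }

    // Describes where the named class was loaded from (typically a jar URL).
    static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className, false,
                    ClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)"
                               : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Assumption: the missing field is an enum constant on HiveConf.ConfVars.
        String confVars = "org.apache.hadoop.hive.conf.HiveConf$ConfVars";
        System.out.println("ConfVars loaded from: " + locationOf(confVars));
        System.out.println("has HIVE_ORC_SPLIT_STRATEGY: "
                + hasField(confVars, "HIVE_ORC_SPLIT_STRATEGY"));
    }
}
```

Run it with the same classpath the job uses. If the field is reported missing, the printed location should show which jar is supplying the older HiveConf. Keep in mind the map tasks may see a different classpath than the client, which might be why materialize() alone works while the DoFn fails.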
