I'm not certain, but it looks like a constant that the underlying
OrcInputFormat expects Hive's HiveConf to define. Judging by the source
linked below, it corresponds to the hive.exec.orc.split.strategy property:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hive/hive-common/1.2.0/org/apache/hadoop/hive/conf/HiveConf.java

    "hive.exec.orc.split.strategy", "HYBRID", new StringSet("HYBRID", "BI", "ETL"),
        "This is not a user level config. BI strategy is used when the requirement is to spend less time in split generation" +
        " as opposed to query execution (split generation does not read or cache file footers)." +
        " ETL strategy is used when spending little more time in split generation is acceptable" +
        " (split generation reads and caches file footers). HYBRID chooses between the above strategies" +
        " based on heuristics."),
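
A NoSuchFieldError generally means the class found at run time lacks a field that the calling code was compiled against. So one thing that may be worth checking is which jar HiveConf is actually loaded from on the job's classpath, and whether that copy defines HIVE_ORC_SPLIT_STRATEGY. Below is a minimal sketch of such a check using plain reflection. It assumes the constant lives in HiveConf's nested ConfVars enum, as in the 1.2.0 source linked above. The class and method names (ConfVarsCheck, hasField, loadedFrom) are made up for illustration and are not part of Crunch or Hive.

```java
import java.security.CodeSource;

public class ConfVarsCheck {

    /** Returns true if the class can be loaded and declares a public field with this name. */
    public static boolean hasField(String className, String fieldName) {
        try {
            // 'false' skips static initialization, so only the class itself must be loadable.
            Class<?> cls = Class.forName(className, false, ConfVarsCheck.class.getClassLoader());
            cls.getField(fieldName); // enum constants are public static fields
            return true;
        } catch (ClassNotFoundException | NoSuchFieldException e) {
            return false;
        }
    }

    /** Returns the location (typically a jar URL) the class was loaded from. */
    public static String loadedFrom(String className) {
        try {
            Class<?> cls = Class.forName(className, false, ConfVarsCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(class not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Assumed binary name of the nested enum, based on the HiveConf source linked above.
        String confVars = "org.apache.hadoop.hive.conf.HiveConf$ConfVars";
        System.out.println("ConfVars loaded from: " + loadedFrom(confVars));
        System.out.println("HIVE_ORC_SPLIT_STRATEGY present: "
                + hasField(confVars, "HIVE_ORC_SPLIT_STRATEGY"));
    }
}
```

Run it with the same classpath as the failing job. If the field is reported as absent, the jar it prints is probably an older Hive than the one the ORC input format was built against, though that is a guess on my part rather than something confirmed here.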


On Wed, Feb 3, 2016 at 1:59 PM, Robinson, Landon - Landon <
[email protected]> wrote:

> Crunch Gurus,
>
> Need some advice. I have experience writing Orc files in Crunch, and I can
> successfully read them in Crunch and print them out.
> But when I attempt to process them with a DoFn, I get this error. What
> should I do?
>
> Exception in thread "Thread-5" java.lang.NoSuchFieldError: HIVE_ORC_SPLIT_STRATEGY
>
> Here’s my code:
>
>         logger.info("Generating Hadoop Configuration...");
>         Configuration crunchConf = getConf();
>         logger.info("Establishing OrcFile Target for Final Output...");
>         OrcFileTarget target = new OrcFileTarget(new Path(outputPath));
>         //Establish Pipeline
>         logger.info("Generating Crunch Map-Reduce Pipeline...");
>         Pipeline pipeline = new MRPipeline(DataQualityDriver.class, crunchConf);
>
>         //Establish OrcFileSource (emulates a Java class) linked to HDFS Path
>         logger.info("Generating Orc File Source around given HDFS path...");
>
>         OrcFileSource<Verint1978Record> orcsource = new OrcFileSource<Verint1978Record>(
>                 new Path(inputPath), Orcs.reflects(Verint1978Record.class));
>
>         // Ingest the Orc File into a PCollection
>         logger.info("Generating PCollection of Verint1978Record from Data...");
>         PCollection<Verint1978Record> data = pipeline.read(orcsource);
> //
>
>       for (Verint1978Record record : data.materialize()){
>               System.out.println(record.getAllColumns());
>       }
>
>       // This all works fine until THIS point.
>
>       // Can't run these files through a DoFn or write them out without
>       // getting the above error.
>
>       // This DoFn simply reads the previous PCollection and prints it back out
>       // as a string (just to test the DoFn).
>
>         PCollection<String> newData = data.parallelDo(
>                 DataQualityDoFns.DoFn_ProduceSameRecords(), Writables.strings());
>         for (String record : newData.materialize()) {
>             System.out.println(record);
>         }
>
> PipelineResult result = pipeline.done();
>
>
> DoFN (super lazy):
>
> static DoFn<Verint1978Record, String> DoFn_ProduceSameRecords(){
>     return new DoFn<Verint1978Record, String>() {
>         @Override
>         public void process(Verint1978Record input, Emitter<String> emitter) {
>
>             emitter.emit(input.getLct_nbr() + "" + input.getVid_caa_id() + ""
>                     + input.getHrs_nbr() + "" + input.getMte_nbr() + ""
>                     + input.getAcl_idc() + "" + input.getSec_dur() + ""
>                     + input.getSec_to_pcs() + "" + input.getSec_pcd() + ""
>                     + input.getUse_for_rpr_idc() + "" + input.getGrp_cnt() + ""
>                     + input.getSng_cnt() + "" + input.getUpd_dt() + ""
>                     + input.getUpd_id() + "" + input.getCal_dt());
>
>         }
>     };
> }
>
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data & Hadoop Engineer
> IT Business Intelligence, Lowe’s Companies Inc.
> ---------------------------------------------------------------------------
