I am sure there is a way to do it using the HS2 thrift APIs, but I've never done it myself.
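
In the meantime, one rough workaround: if you already know the table's location (for example from the Location field that DESCRIBE FORMATTED prints), the partition directories can usually be derived from it. The sketch below assumes Hive's default layout, where each partition lives under the table location in key=value subdirectories. It will not match partitions that were given a custom location or values that Hive escapes, so the metastore remains the authoritative source. The class name, method name, example location and the use of cal_dt as a partition column are all made up for illustration; none of this is Hive or Crunch API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Sketch only: derives the directory of a partition from the table location,
 * ASSUMING Hive's default layout (<table location>/<key>=<value>/...).
 * HivePartitionPaths and partitionPath are hypothetical names, not Hive or Crunch API.
 */
final class HivePartitionPaths {

    private HivePartitionPaths() {}

    /** partitionSpec must list the partition columns in the table's partition order. */
    static String partitionPath(String tableLocation, Map<String, String> partitionSpec) {
        // Strip a single trailing slash so the joined path does not contain "//".
        String base = tableLocation.endsWith("/")
                ? tableLocation.substring(0, tableLocation.length() - 1)
                : tableLocation;
        StringJoiner path = new StringJoiner("/");
        path.add(base);
        for (Map.Entry<String, String> column : partitionSpec.entrySet()) {
            // No escaping is applied here; Hive escapes some special characters in values.
            path.add(column.getKey() + "=" + column.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // Hypothetical table location and partition column, for illustration only.
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("cal_dt", "2016-01-29");
        System.out.println(partitionPath("hdfs://namenode/warehouse/verint", spec));
    }
}
```

The resulting string could then be wrapped in a Path and handed to OrcFileSource the same way as the hard-coded inputPath in the quoted code below.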
On Fri, Jan 29, 2016 at 10:16 AM, Robinson, Landon - Landon <[email protected]> wrote:
> On this same note, I still have a similar problem to solve.
> I can point Crunch at an HDFS location and it will ingest/read the Orc
> file just fine.
>
> But is there a way (maybe leveraging HCat/Hive APIs) to get the file
> locations dynamically from Hive? Can I ask HCat/Hive about a table and its
> partitions, and have it tell me the file locations on HDFS (which I can then
> pass to Crunch to consume the files into the pipeline)?
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data & Hadoop Engineer
> IT Business Intelligence, Lowe’s Companies Inc.
> ---------------------------------------------------------------------------
>
> From: <Robinson>, LCI <[email protected]>
> Date: Friday, January 29, 2016 at 10:41 AM
> To: LCI <[email protected]>, Apache Crunch Mailing List <[email protected]>, David Ortiz <[email protected]>
> Subject: Re: Reading Hive Tables into PCollection
>
> *Solved:*
>
> Turns out you can declare the field like this:
>
>     private HiveChar acl_idc;
>
> That class is org.apache.hadoop.hive.common.type.HiveChar.
>
> Sorry for all the emails, but I hope the findings help someone else!
>
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data & Hadoop Engineer
> IT Business Intelligence, Lowe’s Companies Inc.
> ---------------------------------------------------------------------------
>
> From: <Robinson>, LCI <[email protected]>
> Date: Friday, January 29, 2016 at 10:36 AM
> To: Apache Crunch Mailing List <[email protected]>, LCI <[email protected]>, David Ortiz <[email protected]>
> Subject: Re: Reading Hive Tables into PCollection
>
> Additionally, we tried declaring those char fields as strings, but we get
> the error below. The real issue is getting the Orc ‘char’ to cast to
> something we can use in the Orc structure.
>
> Exception in thread "main" org.apache.crunch.CrunchRuntimeException: Error while reading local file: file:/tmp/crunch-test/000000_0
>     at org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:110)
>     at org.apache.crunch.io.CompositePathIterable$2.next(CompositePathIterable.java:99)
>     at com.google.common.collect.Iterators$5.next(Iterators.java:607)
>     at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:266)
>     at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:223)
>     at org.apache.crunch.impl.mem.collect.MemCollection.<init>(MemCollection.java:79)
>     at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:165)
>     at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:156)
>     at com.lowes.bigdata.closerate.verint.DataQualityDriverTest.run(DataQualityDriverTest.java:57)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at com.lowes.bigdata.closerate.verint.DataQualityDriverTest.main(DataQualityDriverTest.java:36)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
> *Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to org.apache.hadoop.io.Text*
>     at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:46)
>     at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:26)
>     at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:169)
>     at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:222)
>     at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:190)
>     at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:168)
>     at org.apache.crunch.fn.CompositeMapFn.map(CompositeMapFn.java:63)
>     at org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:108)
>     ... 15 more
>
> *Verint1978Record*
>
> public class Verint1978Record {
>
>     private String lct_nbr;
>     private String vid_caa_id;
>     private Integer hrs_nbr;
>     private Integer mte_nbr;
>     private String acl_idc;
>     private Integer sec_dur;
>     private Integer sec_to_pcs;
>     private Integer sec_pcd;
>     private String use_for_rpr_idc;
>     private Integer grp_cnt;
>     private Integer sng_cnt;
>     private String upd_dt;
>     private String upd_id;
>     private String cal_dt;
>
> }
>
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data & Hadoop Engineer
> IT Business Intelligence, Lowe’s Companies Inc.
> ---------------------------------------------------------------------------
>
> From: <Robinson>, LCI <[email protected]>
> Reply-To: Apache Crunch Mailing List <[email protected]>
> Date: Friday, January 29, 2016 at 10:33 AM
> To: David Ortiz <[email protected]>, Apache Crunch Mailing List <[email protected]>
> Subject: Re: Reading Hive Tables into PCollection
>
> Right, we’ve been trying this with little luck, largely because I get the
> error:
>
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcStruct
>
> *Code:*
>
> OrcFileSource<Verint1978Record> source = new OrcFileSource<Verint1978Record>(new Path(inputPath), Orcs.reflects(Verint1978Record.class));
> PCollection<Verint1978Record> persons = pipeline.read(source);
>
> *Verint1978Record*
>
> public class Verint1978Record {
>
>     private String lct_nbr;
>     private String vid_caa_id;
>     private Integer hrs_nbr;
>     private Integer mte_nbr;
>     private Character acl_idc;
>     private Integer sec_dur;
>     private Integer sec_to_pcs;
>     private Integer sec_pcd;
>     private Character use_for_rpr_idc;
>     private Integer grp_cnt;
>     private Integer sng_cnt;
>     private String upd_dt;
>     private String upd_id;
>     private String cal_dt;
>
> }
>
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data & Hadoop Engineer
> IT Business Intelligence, Lowe’s Companies Inc.
> ---------------------------------------------------------------------------
>
> From: David Ortiz <[email protected]>
> Date: Friday, January 29, 2016 at 10:19 AM
> To: LCI <[email protected]>, Apache Crunch Mailing List <[email protected]>
> Subject: Re: Reading Hive Tables into PCollection
>
> http://hortonworks.com/blog/using-orcfile-cascading-apache-crunch/
>
> Here's the Java excerpt from that article for reading into an Avro class
> (I'm assuming).
>
> [code language=”Java”]
> // Read an ORCFile using reflection-based serialization (slowest):
> OrcFileSource<Person> source = new OrcFileSource<Person>(new Path(inputPath), \
> Orcs.reflection(Person.class));
> PCollection<Person> persons = pipeline.read(source);
>
> On Fri, Jan 29, 2016 at 10:17 AM Robinson, Landon - Landon <[email protected]> wrote:
>
>> Orc format.
>>
>> ---------------------------------------------------------------------------
>> Landon Robinson
>> Big Data & Hadoop Engineer
>> IT Business Intelligence, Lowe’s Companies Inc.
>>
>> ---------------------------------------------------------------------------
>>
>> From: David Ortiz <[email protected]>
>> Reply-To: Apache Crunch Mailing List <[email protected]>
>> Date: Thursday, January 28, 2016 at 1:22 PM
>> To: Apache Crunch Mailing List <[email protected]>
>> Subject: Re: Reading Hive Tables into PCollection
>>
>> What format are they stored as?
>>
>> On Thu, Jan 28, 2016 at 1:20 PM Robinson, Landon - Landon <[email protected]> wrote:
>>
>>> Crunch Gurus,
>>>
>>> What is the Crunch-convenient or recommended way to read the contents of
>>> a Hive table into a PCollection?
>>> Thanks!
>>> Best,
>>> Landon
>>>
>>> ---------------------------------------------------------------------------
>>> Landon Robinson
>>> Big Data & Hadoop Engineer
>>>
>>> ---------------------------------------------------------------------------
>>> NOTICE: All information in and attached to the e-mails below may be
>>> proprietary, confidential, privileged and otherwise protected from improper
>>> or erroneous disclosure. If you are not the sender's intended recipient,
>>> you are not authorized to intercept, read, print, retain, copy, forward, or
>>> disseminate this message. If you have erroneously received this
>>> communication, please notify the sender immediately by phone
>>> (704-758-1000) or by e-mail and destroy all copies of this message
>>> electronic, paper, or otherwise.
>>>
>>> *By transmitting documents via this email: Users, Customers, Suppliers
>>> and Vendors collectively acknowledge and agree the transmittal of
>>> information via email is voluntary, is offered as a convenience, and is not
>>> a secured method of communication; Not to transmit any payment information
>>> E.G. credit card, debit card, checking account, wire transfer information,
>>> passwords, or sensitive and personal information E.G. Driver's license,
>>> DOB, social security, or any other information the user wishes to remain
>>> confidential; To transmit only non-confidential information such as plans,
>>> pictures and drawings and to assume all risk and liability for and
>>> indemnify Lowe's from any claims, losses or damages that may arise from the
>>> transmittal of documents or including non-confidential information in the
>>> body of an email transmittal. Thank you.
>>> *
