Looking at the MapFileOutputFormat API, not that I can tell.
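The convention most Hadoop output formats apply when scanning a directory is simply "skip anything whose name starts with an underscore or a dot." As a rough sketch of that check (plain Java so it stands alone; the class and method names here are mine, not part of the Hadoop API):

```java
public class HiddenPathFilter {
    // Same convention Hadoop's hidden-file filtering uses:
    // skip _SUCCESS, _logs, .part-*.crc files, etc.
    public static boolean isDataPath(String name) {
        return !name.startsWith("_") && !name.startsWith(".");
    }

    public static void main(String[] args) {
        String[] names = {"part-00000", "_SUCCESS", ".part-00000.crc", "part-00001"};
        for (String n : names) {
            System.out.println(n + " -> " + (isDataPath(n) ? "open" : "skip"));
        }
    }
}
```

Applied to your case, instead of going through getReaders you could list the children of MAPFILE_LOCATION yourself (e.g. FileSystem.listStatus with a PathFilter built on a check like this) and open one MapFile.Reader per surviving directory.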
On Mon, Sep 9, 2013 at 11:18 AM, Hansen,Chuck <[email protected]> wrote:

> Thanks for the quick reply Josh. Is there a way I could use a
> PathFilter when creating the MapFile.Reader[] array?
>
>     MapFile.Reader[] readers = MapFileOutputFormat.getReaders(
>         new Path(MAPFILE_LOCATION), conf);
>
> --
> Chuck Hansen
> Software Engineer, Record Dev
> [email protected] | 816-201-9629
> Cerner Corporation | www.cerner.com
>
> From: Josh Wills <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Monday, September 9, 2013 12:44 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: Writing MapFile through Crunch, issue reading through Hadoop
>
> Tough to assign blame here -- writing a _SUCCESS bit is usually a good
> thing, and most Hadoop file formats are smart about filtering out files
> that start with "_" or ".", or allowing you to specify an instance of
> PathFilter that can be used to ignore hidden files.
>
> One way around this would be to add an option to Targets that would
> disable writing the _SUCCESS flag, which would be part of a more general
> change to allow per-Source and per-Target configuration options. For
> example, you could specify that some outputs of an MR job were compressed
> using gzip, and others were compressed using Snappy, instead of having a
> single compression strategy for everything.
>
> On Mon, Sep 9, 2013 at 10:28 AM, Hansen,Chuck <[email protected]> wrote:
>
>> With Crunch versions prior to 0.7.x, there does not appear to be a
>> _SUCCESS file written upon completion; starting with 0.7.x there is. This
>> file (and any others not intended to be read through [1]) appears to cause
>> an issue with [1]. This means writing a MapFile with Crunch and reading it
>> back with [1] works prior to 0.7.x, but starting with 0.7.x, [1] will
>> throw an exception.
>>
>> Is this a bug with Crunch and/or Hadoop?
>>
>> [1] org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat.getReaders
>>
>> Hadoop CDH versions used:
>>
>>     <hadoopCoreVersion>2.0.0-mr1-cdh4.2.1</hadoopCoreVersion>
>>     <hadoop_commonAndHDFSVersion>2.0.0-cdh4.2.1</hadoop_commonAndHDFSVersion>
>>
>> --
>> Chuck Hansen
>> Software Engineer, Record Dev
>> [email protected] | 816-201-9629
>> Cerner Corporation | www.cerner.com
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
