Thanks for the quick reply, Josh. Is there a way I could use a PathFilter when creating the MapFile.Reader[] array?

    MapFile.Reader[] readers =
        MapFileOutputFormat.getReaders(new Path(MAPFILE_LOCATION), conf);

--
Chuck Hansen
Software Engineer, Record Dev
chuck.hansen@cerner.com | 816-201-9629
Cerner Corporation | www.cerner.com

From: Josh Wills <jwills@cloudera.com>
Reply-To: user@crunch.apache.org
Date: Monday, September 9, 2013 12:44 PM
To: user@crunch.apache.org
Subject: Re: Writing MapFile through Crunch, issue reading through Hadoop

Tough to assign blame here -- writing a _SUCCESS flag is usually a good thing, and most Hadoop file formats are smart about filtering out files that start with "_" or ".", or allow you to specify an instance of PathFilter that can be used to ignore hidden files.

One way around this would be to add an option to Targets that would disable writing the _SUCCESS flag, which would be part of a more general change to allow per-Source and per-Target configuration options. For example, you could specify that some outputs of an MR job were compressed using gzip and others using Snappy, instead of having a single compression strategy for everything.

On Mon, Sep 9, 2013 at 10:28 AM, Hansen, Chuck <chuck.hansen@cerner.com> wrote:

With Crunch versions prior to 0.7.x, there does not appear to be a _SUCCESS file written upon completion; starting with 0.7.x, there is. This file (and any others not intended to be read through [1]) appears to cause an issue with [1]. This means that writing a MapFile with Crunch and reading it back with [1] works prior to 0.7.x, but starting with 0.7.x, [1] will throw an exception. Is this a bug in Crunch and/or Hadoop?
[1] org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat.getReaders

Hadoop CDH versions used:

    <hadoopCoreVersion>2.0.0-mr1-cdh4.2.1</hadoopCoreVersion>
    <hadoop_commonAndHDFSVersion>2.0.0-cdh4.2.1</hadoop_commonAndHDFSVersion>

--
Josh Wills
Director of Data Science
Cloudera (www.cloudera.com)
Twitter: @josh_wills
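For Chuck's question about applying a PathFilter when building the MapFile.Reader[] array: MapFileOutputFormat.getReaders in this Hadoop version does not take a filter, but the array can be built by hand, listing the output directory with FileSystem.listStatus(Path, PathFilter) to skip hidden entries such as _SUCCESS. This is a minimal sketch against the CDH4/MR1-era API, not tested against that exact version; it assumes the output directory contains only MapFile part directories plus hidden files.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.io.MapFile;

public class FilteredMapFileReaders {

  // Mirrors the hidden-file convention used by FileInputFormat:
  // ignore anything whose name starts with "_" or ".".
  private static final PathFilter HIDDEN_FILE_FILTER = new PathFilter() {
    @Override
    public boolean accept(Path p) {
      String name = p.getName();
      return !name.startsWith("_") && !name.startsWith(".");
    }
  };

  // Hand-rolled replacement for MapFileOutputFormat.getReaders that skips
  // hidden entries like _SUCCESS before opening each part as a MapFile.
  public static MapFile.Reader[] getReaders(Path dir, Configuration conf)
      throws IOException {
    FileSystem fs = dir.getFileSystem(conf);
    FileStatus[] parts = fs.listStatus(dir, HIDDEN_FILE_FILTER);
    List<MapFile.Reader> readers = new ArrayList<MapFile.Reader>();
    for (FileStatus part : parts) {
      // Each surviving entry should be a part-* directory holding a
      // MapFile (data + index); open a Reader on it.
      readers.add(new MapFile.Reader(fs, part.getPath().toString(), conf));
    }
    return readers.toArray(new MapFile.Reader[readers.size()]);
  }
}
```

Usage would be `MapFile.Reader[] readers = FilteredMapFileReaders.getReaders(new Path(MAPFILE_LOCATION), conf);` in place of the MapFileOutputFormat.getReaders call above.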
