Thanks for the quick reply, Josh. Is there a way I could use a PathFilter when
creating the MapFile.Reader[] array?


MapFile.Reader[] readers =
    MapFileOutputFormat.getReaders(new Path(MAPFILE_LOCATION), conf);
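In case it helps frame the question: since getReaders(Path, Configuration) takes no filter argument, the only workaround I can see is to list the part directories myself with a PathFilter and build the array by hand. An untested sketch against the Hadoop 2.0 / CDH4 API (the filter, class name, and use of the deprecated MapFile.Reader(FileSystem, String, Configuration) constructor are all just illustrative):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.io.MapFile;

public class FilteredMapFileReaders {

  // Ignore "hidden" entries such as _SUCCESS and _logs.
  private static final PathFilter HIDDEN_FILE_FILTER = new PathFilter() {
    @Override
    public boolean accept(Path p) {
      String name = p.getName();
      return !name.startsWith("_") && !name.startsWith(".");
    }
  };

  public static MapFile.Reader[] getReaders(Path dir, Configuration conf)
      throws IOException {
    FileSystem fs = dir.getFileSystem(conf);
    // Unlike MapFileOutputFormat.getReaders, listStatus accepts a PathFilter.
    FileStatus[] parts = fs.listStatus(dir, HIDDEN_FILE_FILTER);
    // Keep the part directories in name (partition) order, matching what
    // getReaders would produce, so partitioner-based lookups still line up.
    Arrays.sort(parts, new Comparator<FileStatus>() {
      @Override
      public int compare(FileStatus a, FileStatus b) {
        return a.getPath().getName().compareTo(b.getPath().getName());
      }
    });
    List<MapFile.Reader> readers = new ArrayList<MapFile.Reader>();
    for (FileStatus part : parts) {
      readers.add(new MapFile.Reader(fs, part.getPath().toString(), conf));
    }
    return readers.toArray(new MapFile.Reader[readers.size()]);
  }
}
```

It would obviously be nicer if getReaders itself took a PathFilter, or if the _SUCCESS file were never written in the first place.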


--
Chuck Hansen
Software Engineer, Record Dev
[email protected] | 816-201-9629
Cerner Corporation | www.cerner.com

From: Josh Wills <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, September 9, 2013 12:44 PM
To: "[email protected]" <[email protected]>
Subject: Re: Writing MapFile through Crunch, issue reading through Hadoop

Tough to assign blame here -- writing a _SUCCESS file is usually a good thing,
and most Hadoop file formats are smart about filtering out files that start
with "_" or ".", or let you specify a PathFilter instance that can be used to
ignore hidden files.

One way around this would be to add an option to Targets that would disable 
writing the _SUCCESS flag, which would be part of a more general change to 
allow per-Source and per-Target configuration options. For example, you could 
specify that some outputs of an MR job were compressed using gzip, and others 
were compressed using Snappy, instead of having a single compression strategy 
for everything.



On Mon, Sep 9, 2013 at 10:28 AM, Hansen,Chuck <[email protected]> wrote:
With Crunch versions prior to 0.7.x, there does not appear to be a _SUCCESS
file written upon completion; starting with 0.7.x there is. This file (and any
others not intended to be read through [1]) appears to cause issues with [1].
This means writing a MapFile with Crunch and reading it back with [1] works
prior to 0.7.x, but starting with 0.7.x, [1] will throw an exception.

Is this a bug with Crunch and/or Hadoop?

[1] org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat.getReaders

Hadoop CDH versions used:

    <hadoopCoreVersion>2.0.0-mr1-cdh4.2.1</hadoopCoreVersion>

    <hadoop_commonAndHDFSVersion>2.0.0-cdh4.2.1</hadoop_commonAndHDFSVersion>

--
Chuck Hansen
Software Engineer, Record Dev
[email protected] | 816-201-9629
Cerner Corporation | www.cerner.com



--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
