Looking at the MapFileOutputFormat API, not that I can tell.
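The convention most Hadoop output formats apply when scanning a directory is simply "skip anything whose name starts with an underscore or a dot." As a rough sketch of that check (plain Java so it stands alone; the class and method names here are mine, not part of the Hadoop API):

```java
public class HiddenPathFilter {
    // Same convention Hadoop's hidden-file filtering uses:
    // skip _SUCCESS, _logs, .part-*.crc files, etc.
    public static boolean isDataPath(String name) {
        return !name.startsWith("_") && !name.startsWith(".");
    }

    public static void main(String[] args) {
        String[] names = {"part-00000", "_SUCCESS", ".part-00000.crc", "part-00001"};
        for (String n : names) {
            System.out.println(n + " -> " + (isDataPath(n) ? "open" : "skip"));
        }
    }
}
```

Applied to your case, instead of going through getReaders you could list the children of MAPFILE_LOCATION yourself (e.g. FileSystem.listStatus with a PathFilter built on a check like this) and open one MapFile.Reader per surviving directory.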
On Mon, Sep 9, 2013 at 11:18 AM, Hansen,Chuck <[email protected]> wrote:

> Thanks for the quick reply Josh. Is there a way I could use a
> PathFilter when creating the MapFile.Reader[] array?
>
>     MapFile.Reader[] readers = MapFileOutputFormat.getReaders(
>         new Path(MAPFILE_LOCATION), conf);
>
> --
> Chuck Hansen
> Software Engineer, Record Dev
> [email protected] | 816-201-9629
> Cerner Corporation | www.cerner.com
>
> From: Josh Wills <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Monday, September 9, 2013 12:44 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: Writing MapFile through Crunch, issue reading through Hadoop
>
> Tough to assign blame here -- writing a _SUCCESS bit is usually a good
> thing, and most Hadoop file formats are smart about filtering out files
> that start with "_" or ".", or allowing you to specify an instance of
> PathFilter that can be used to ignore hidden files.
>
> One way around this would be to add an option to Targets that would
> disable writing the _SUCCESS flag, which would be part of a more general
> change to allow per-Source and per-Target configuration options. For
> example, you could specify that some outputs of an MR job were compressed
> using gzip, and others were compressed using Snappy, instead of having a
> single compression strategy for everything.
>
> On Mon, Sep 9, 2013 at 10:28 AM, Hansen,Chuck <[email protected]> wrote:
>
>> With Crunch versions prior to 0.7.x, there does not appear to be a
>> _SUCCESS file written upon completion; starting with 0.7.x there is. This
>> file (and any others not intended to be read through [1]) appears to cause
>> an issue with [1]. This means writing a MapFile with Crunch and reading it
>> back with [1] works prior to 0.7.x, but starting with 0.7.x, [1] will
>> throw an exception.
>>
>> Is this a bug with Crunch and/or Hadoop?
>>
>> [1] org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat.getReaders
>>
>> Hadoop CDH versions used:
>>
>>     <hadoopCoreVersion>2.0.0-mr1-cdh4.2.1</hadoopCoreVersion>
>>     <hadoop_commonAndHDFSVersion>2.0.0-cdh4.2.1</hadoop_commonAndHDFSVersion>
>>
>> --
>> Chuck Hansen
>> Software Engineer, Record Dev
>> [email protected] | 816-201-9629
>> Cerner Corporation | www.cerner.com
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
