Re: Hadoop 101

Chris Embree Wed, 12 Dec 2012 21:11:43 -0800

Just to be a picker of nits... this topic is more precisely Hadoop
Development 101.  I only mention this because I am a newbie Hadoop admin
and this was over my head. ;)  Admins don't worry as much about key/value
pairs and parsing as we do about where to find the script that starts the
NameNode. ;)



On Wed, Dec 12, 2012 at 11:16 PM, David Parks <[email protected]> wrote:

> Nothing that I'm aware of for text files; I'd just use standard unix utils
> to process it outside of Hadoop.
>
> As to getting a reader from any of the InputFormats, here's the typical
> example you'd follow to get the reader for a sequence file. You could
> extrapolate the example to access whichever reader you're interested in.
>
>
> http://my.safaribooksonline.com/book/databases/hadoop/9780596521974/file-based-data-structures/id3555432
>
>
> -----Original Message-----
> From: Pat Ferrel [mailto:[email protected]]
> Sent: Wednesday, December 12, 2012 11:37 PM
> To: [email protected]
> Subject: Re: Hadoop 101
>
> Yeah I found the TextInputFormat and KeyValueTextInputFormat and I know how
> to parse text--I'm just too lazy. I was hoping there was a Text equivalent
> of a SequenceFile that was hidden somewhere. As I said there is no mapper,
> this is running outside of hadoop M/R. So I at least need a line reader and
> I'm not sure how the InputFormat works outside a mapper. But who cares,
> parsing is simple enough from scratch. All the KeyValueTextInputFormat gives
> me is splitting at the tab, afaict.
>
> Actually this convinces me to look further into getting the values from
> method calls. They aren't quite what I want to begin with.
>
> Thanks for saving me more fruitless searches.
>
> On Dec 11, 2012, at 10:04 PM, David Parks <[email protected]> wrote:
>
> If you use TextInputFormat, you'll get the following key<LongWritable>,
> value<Text> pairs in your mapper:
>
> file_position, your_input
>
> Example:
> 0, "0\t[356:0.3481597,359:0.3481597,358:0.3481597,361:0.3481597,360:0.3481597]"
> 100, "8\t[356:0.34786037,359:0.34786037,358:0.34786037,361:0.34786037,360:0.34786037]"
> 200, "25\t[284:0.34821576,286:0.34821576,287:0.34821576,288:0.34821576,289:0.34821576]"
>
> Then just parse it out in your mapper.
>
>
> -----Original Message-----
> From: Pat Ferrel [mailto:[email protected]]
> Sent: Wednesday, December 12, 2012 7:50 AM
> To: [email protected]
> Subject: Hadoop 101
>
> Stupid question for the day.
>
> I have a file created by a mahout job of the form:
>
> 0	[356:0.3481597,359:0.3481597,358:0.3481597,361:0.3481597,360:0.3481597]
> 8	[356:0.34786037,359:0.34786037,358:0.34786037,361:0.34786037,360:0.34786037]
> 25	[284:0.34821576,286:0.34821576,287:0.34821576,288:0.34821576,289:0.34821576]
> 28	[452:0.34802154,454:0.34802154,453:0.34802154,456:0.34802154,455:0.34802154]
> .
>
> If this were a SequenceFile I could read it and be merrily on my way but
> it's a text file. The classes written are key, value pairs <LongWritable,
> VectorWritable> but the file is tab delimited text.
>
> I was hoping to do something like:
>
> SequenceFile.Reader reader = new SequenceFile.Reader(fs, inputFile, conf);
> Writable userId = new LongWritable();
> VectorWritable recommendations = new VectorWritable();
> while (reader.next(userId, recommendations)) {
>     // do something with each pair
> }
>
> But alas Google fails me. How do you read in key, value pairs from text
> files outside of a map or reduce?
>
>
>
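
For anyone landing on this thread with the same question, here is a minimal
sketch of the "parse it from scratch" approach discussed above, in plain Java
with no Hadoop or Mahout classes. It assumes every line looks like the sample
in the original post: a long key, a tab, then a bracketed list of index:value
pairs. The names TextVectorReader, parseLine and readAll are made up for
illustration, and the result is an ordinary Map rather than a VectorWritable.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop or Mahout.
class TextVectorReader {

    // Parses one line of the assumed form "key<TAB>[index:value,index:value,...]".
    static Map.Entry<Long, Map<Integer, Double>> parseLine(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("No tab separator in line: " + line);
        }
        long key = Long.parseLong(line.substring(0, tab).trim());
        String body = line.substring(tab + 1).trim();
        if (!body.startsWith("[") || !body.endsWith("]")) {
            throw new IllegalArgumentException("Value is not bracketed: " + body);
        }
        body = body.substring(1, body.length() - 1);

        // LinkedHashMap keeps the elements in the order they appear in the file.
        Map<Integer, Double> vector = new LinkedHashMap<>();
        if (!body.isEmpty()) {
            for (String element : body.split(",")) {
                int colon = element.indexOf(':');
                if (colon < 0) {
                    throw new IllegalArgumentException("Bad element: " + element);
                }
                int index = Integer.parseInt(element.substring(0, colon).trim());
                double value = Double.parseDouble(element.substring(colon + 1).trim());
                vector.put(index, value);
            }
        }
        return new SimpleEntry<Long, Map<Integer, Double>>(key, vector);
    }

    // Reads every non-blank line from the source and parses it into a pair.
    static List<Map.Entry<Long, Map<Integer, Double>>> readAll(Reader source) {
        List<Map.Entry<Long, Map<Integer, Double>>> pairs = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                pairs.add(parseLine(line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Two shortened lines in the same shape as the sample from the thread.
        String sample = "0\t[356:0.3481597,359:0.3481597]\n"
                      + "8\t[356:0.34786037,359:0.34786037]\n";
        for (Map.Entry<Long, Map<Integer, Double>> pair : readAll(new StringReader(sample))) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```

readAll takes any java.io.Reader, so a local copy of the file can be read
through a FileReader. If the file lives on HDFS, wrapping the stream returned
by FileSystem.open(path) in an InputStreamReader should work the same way,
though that path is untested here.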
