Yeah, I found TextInputFormat and KeyValueTextInputFormat, and I know how to parse text; I'm just too lazy. I was hoping a text equivalent of SequenceFile was hidden somewhere. As I said, there is no mapper; this runs outside of Hadoop M/R. So I need at least a line reader, and I'm not sure how an InputFormat works outside a mapper. But who cares, parsing from scratch is simple enough. All KeyValueTextInputFormat gives me is splitting at the tab, afaict.
Actually this convinces me to look further into getting the values from method calls. They aren't quite what I want to begin with. Thanks for saving me more fruitless searches.

On Dec 11, 2012, at 10:04 PM, David Parks <[email protected]> wrote:

You use TextInputFormat, you'll get the following key<LongWritable>, value<Text> pairs in your mapper:

    file_position, your_input

Example:

    0, "0\t[356:0.3481597,359:0.3481597,358:0.3481597,361:0.3481597,360:0.3481597]"
    100, "8\t[356:0.34786037,359:0.34786037,358:0.34786037,361:0.34786037,360:0.34786037]"
    200, "25\t[284:0.34821576,286:0.34821576,287:0.34821576,288:0.34821576,289:0.34821576]"

Then just parse it out in your mapper.

-----Original Message-----
From: Pat Ferrel [mailto:[email protected]]
Sent: Wednesday, December 12, 2012 7:50 AM
To: [email protected]
Subject: Hadoop 101

Stupid question for the day.

I have a file created by a mahout job of the form:

    0 [356:0.3481597,359:0.3481597,358:0.3481597,361:0.3481597,360:0.3481597]
    8 [356:0.34786037,359:0.34786037,358:0.34786037,361:0.34786037,360:0.34786037]
    25 [284:0.34821576,286:0.34821576,287:0.34821576,288:0.34821576,289:0.34821576]
    28 [452:0.34802154,454:0.34802154,453:0.34802154,456:0.34802154,455:0.34802154]
    .

If this were a SequenceFile I could read it and be merrily on my way but it's a text file. The classes written are key, value pairs <LongWritable, VectorWritable> but the file is tab delimited text.

I was hoping to do something like:

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, inputFile, conf);
    Writable userId = new LongWritable();
    VectorWritable recommendations = new VectorWritable();
    while (reader.next(userId, recommendations)) {
        // do something with each pair
    }

But alas Google fails me. How do you read in key, values pairs from text files outside of a map or reduce?
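For what it's worth, the "parse it from scratch" route mentioned in this thread can be sketched in plain Java with no Hadoop classes at all. This is only a minimal sketch under stated assumptions: each line is `userId<TAB>[index:weight,...]` as in the sample above, and the class name TextVectorReader and its methods parseVector and readAll are hypothetical names made up for this illustration, not part of Hadoop or Mahout. It returns plain Java maps rather than LongWritable/VectorWritable.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not a Hadoop or Mahout class.
public class TextVectorReader {

    /** Parses "[356:0.34,359:0.34]" into an index -> weight map (insertion order kept). */
    static Map<Integer, Double> parseVector(String text) {
        Map<Integer, Double> vector = new LinkedHashMap<>();
        String body = text.trim();
        // Strip the surrounding brackets if both are present.
        if (body.length() >= 2 && body.startsWith("[") && body.endsWith("]")) {
            body = body.substring(1, body.length() - 1);
        }
        if (body.isEmpty()) {
            return vector;
        }
        for (String element : body.split(",")) {
            String[] parts = element.split(":", 2);
            vector.put(Integer.parseInt(parts[0].trim()),
                       Double.parseDouble(parts[1].trim()));
        }
        return vector;
    }

    /** Reads every "key<TAB>vector" line from the source; lines without a tab are skipped. */
    static Map<Long, Map<Integer, Double>> readAll(Reader source) throws IOException {
        Map<Long, Map<Integer, Double>> rows = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // blank or malformed line (assumption: safe to ignore)
                }
                long userId = Long.parseLong(line.substring(0, tab).trim());
                rows.put(userId, parseVector(line.substring(tab + 1)));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Two shortened sample lines in the same shape as the file in this thread.
        String sample = "0\t[356:0.3481597,359:0.3481597]\n"
                      + "8\t[356:0.34786037,359:0.34786037]\n";
        Map<Long, Map<Integer, Double>> rows = readAll(new StringReader(sample));
        rows.forEach((userId, vector) -> System.out.println(userId + " -> " + vector));
    }
}
```

Because readAll takes any java.io.Reader, the same loop should also work on a file in HDFS by wrapping the stream returned by FileSystem.open(path) in an InputStreamReader; that wiring is not shown or tested here.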
