So I have a file where each line represents an edge in the graph and has two values separated by a tab. Both values are vertex IDs (source and sink). I want to parse this file as a dictionary in a Spark RDD. So my question is: how do I get these values in the form of a dictionary in an RDD?

sample file:
1	2
1	5
2	3
expected output: RDD (<1,2>, <1,5>, <2,3>)

Thanks
Ravikant

On Thu, Jun 25, 2015 at 2:59 PM, anshu shukla <anshushuk...@gmail.com> wrote:
> Can you be more specific, or can you provide a sample file?
>
> On Thu, Jun 25, 2015 at 11:00 AM, Ravikant Dindokar <ravikant.i...@gmail.com> wrote:
>> Hi Spark user,
>>
>> I am new to Spark, so forgive me for asking a basic question. I'm trying
>> to import my TSV file into Spark. This file has a key and a value separated by
>> a \t on each line. I want to import this file as a dictionary of key/value pairs
>> in Spark.
>>
>> I came across this code that does the same for a CSV file:
>>
>> import csv
>> import StringIO
>> ...
>> def loadRecord(line):
>>     """Parse a CSV line"""
>>     input = StringIO.StringIO(line)
>>     reader = csv.DictReader(input, fieldnames=["name", "favouriteAnimal"])
>>     return reader.next()
>> input = sc.textFile(inputFile).map(loadRecord)
>>
>> Can you point out the changes required to parse a TSV file?
>>
>> After the following operation:
>>
>> split_lines = lines.map(_.split("\t"))
>>
>> what should I do to read the key/value pairs into a dictionary?
>>
>> Thanks
>> Ravikant

--
Thanks & Regards,
Anshu Shukla
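[Editor's note] A minimal sketch of two ways to parse such a tab-separated edge line, runnable locally without Spark. In PySpark you would apply either function with `sc.textFile(inputFile).map(...)`; note also that `lines.map(_.split("\t"))` in the question is Scala syntax, and the Python equivalent is `lines.map(lambda line: line.split("\t"))`. The helper names `parse_edge` and `load_record` are illustrative, not from any library, and the `DictReader` variant is written in Python 3 style (`io.StringIO`, `next(reader)`), whereas the quoted example used Python 2.

```python
import csv
import io

def parse_edge(line):
    """Split a 'src\\tdst' line into an (int, int) pair."""
    src, dst = line.split("\t")
    return (int(src), int(dst))

def load_record(line):
    """DictReader variant of the quoted CSV example, adapted for TSV:
    pass delimiter="\\t" instead of the comma default."""
    reader = csv.DictReader(io.StringIO(line),
                            fieldnames=["src", "dst"],
                            delimiter="\t")
    return next(reader)

# Tested locally on the sample file's lines; in Spark these list
# comprehensions would be RDD .map() calls instead.
sample = ["1\t2", "1\t5", "2\t3"]
pairs = [parse_edge(l) for l in sample]
records = [load_record(l) for l in sample]
print(pairs)  # [(1, 2), (1, 5), (2, 3)]
```

If you need an actual lookup dictionary on the driver rather than an RDD of pairs, `sc.textFile(inputFile).map(parse_edge).collectAsMap()` would build one, though for a graph edge list (where one source maps to many sinks) an RDD of pairs plus `groupByKey` is usually the better fit.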