So I have a file where each line represents an edge in the graph and has two values separated by a tab. Both values are vertex IDs (source and sink). I want to parse this file as a dictionary in a Spark RDD. So my question is: how do I get these values in the form of a dictionary in an RDD?

sample file:
1	2
1	5
2	3
expected output: RDD (<1,2>, <1,5>, <2,3>)

Thanks
Ravikant

On Thu, Jun 25, 2015 at 2:59 PM, anshu shukla <anshushuk...@gmail.com> wrote:
> Can you be more specific, or can you provide a sample file?
>
> On Thu, Jun 25, 2015 at 11:00 AM, Ravikant Dindokar <ravikant.i...@gmail.com> wrote:
>> Hi Spark user,
>>
>> I am new to Spark, so forgive me for asking a basic question. I'm trying
>> to import my TSV file into Spark. This file has a key and a value separated by
>> a \t on each line. I want to import this file as a dictionary of key/value pairs
>> in Spark.
>>
>> I came across this code that does the same for a CSV file:
>>
>> import csv
>> import StringIO
>> ...
>> def loadRecord(line):
>>     """Parse a CSV line"""
>>     input = StringIO.StringIO(line)
>>     reader = csv.DictReader(input, fieldnames=["name", "favouriteAnimal"])
>>     return reader.next()
>> input = sc.textFile(inputFile).map(loadRecord)
>>
>> Can you point out the changes required to parse a TSV file?
>>
>> After the following operation:
>>
>> split_lines = lines.map(_.split("\t"))
>>
>> what should I do to read the key/value pairs into a dictionary?
>>
>> Thanks
>> Ravikant

--
Thanks & Regards,
Anshu Shukla
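[Editor's note] A minimal sketch of two ways to parse such a tab-separated edge line, runnable locally without Spark. In PySpark you would apply either function with `sc.textFile(inputFile).map(...)`; note also that `lines.map(_.split("\t"))` in the question is Scala syntax, and the Python equivalent is `lines.map(lambda line: line.split("\t"))`. The helper names `parse_edge` and `load_record` are illustrative, not from any library, and the `DictReader` variant is written in Python 3 style (`io.StringIO`, `next(reader)`), whereas the quoted example used Python 2.

```python
import csv
import io

def parse_edge(line):
    """Split a 'src\\tdst' line into an (int, int) pair."""
    src, dst = line.split("\t")
    return (int(src), int(dst))

def load_record(line):
    """DictReader variant of the quoted CSV example, adapted for TSV:
    pass delimiter="\\t" instead of the comma default."""
    reader = csv.DictReader(io.StringIO(line),
                            fieldnames=["src", "dst"],
                            delimiter="\t")
    return next(reader)

# Tested locally on the sample file's lines; in Spark these list
# comprehensions would be RDD .map() calls instead.
sample = ["1\t2", "1\t5", "2\t3"]
pairs = [parse_edge(l) for l in sample]
records = [load_record(l) for l in sample]
print(pairs)  # [(1, 2), (1, 5), (2, 3)]
```

If you need an actual lookup dictionary on the driver rather than an RDD of pairs, `sc.textFile(inputFile).map(parse_edge).collectAsMap()` would build one, though for a graph edge list (where one source maps to many sinks) an RDD of pairs plus `groupByKey` is usually the better fit.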