Re: Dealing with headers in csv file pyspark

2014-02-26 Thread Bryn Keller
In the past I've handled this by filtering out the header line, but it seems to me that it would be useful to have a way of dealing with files that preserves sequence, so that e.g. you could just do mySequentialRDD.drop(1) to get rid of the header. There are other use cases like this that currently …
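
RDDs have no drop() today; a minimal sketch of the same effect with mapPartitionsWithIndex follows (the file path and names are illustrative assumptions, not from the thread):

    from pyspark import SparkContext

    sc = SparkContext(appName="skip-header")
    raw = sc.textFile("data.csv")

    def skip_first_line(part_idx, lines):
        # For a single text file, partition 0 holds the start of the file,
        # so dropping its first element removes the header line.
        it = iter(lines)
        if part_idx == 0:
            next(it, None)
        return it

    no_header = raw.mapPartitionsWithIndex(skip_first_line)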

Re: Dealing with headers in csv file pyspark

2014-02-26 Thread Ewen Cheslack-Postava
You must be parsing each line of the file at some point anyway, so adding a step to filter out the header should work fine. It'll get executed at the same time as your parsing/conversion to ints, so there's no significant overhead aside from the check itself. For standalone programs, there's a …
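
A minimal sketch of that filter fused with the parse step, assuming the header is the literal first line of the file (variable names are illustrative):

    header = raw.first()  # e.g. "id, counts"
    ints = (raw.filter(lambda line: line != header)  # note: also drops any data line identical to the header
               .map(lambda line: line.split(","))
               .map(lambda f: (f[0], int(f[1]))))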

Re: Dealing with headers in csv file pyspark

2014-02-26 Thread Chengi Liu
I am not sure... the suggestion is to open a TB file and remove a line? That doesn't sound that good. I am hacking my way around it by using a filter. Can I put a try/except clause in my lambda function? Maybe I should just try that out. But thanks for the suggestion. Also, can I run scripts against Spark …
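
For what it's worth, a Python lambda cannot contain a try/except (statements aren't allowed in lambdas), so the usual workaround is a named parse function combined with flatMap (a sketch; names are illustrative):

    def parse(line):
        try:
            fields = line.split(",")
            return [(fields[0], int(fields[1]))]
        except (ValueError, IndexError):
            return []  # header or malformed line: emit nothing

    counts = raw.flatMap(parse).reduceByKey(lambda a, b: a + b)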

Re: Dealing with headers in csv file pyspark

2014-02-26 Thread Mayur Rustagi
A bad solution is to run a mapper through the data and null out the header's counts; a good solution is to trim the header beforehand, without Spark. On Feb 26, 2014 9:28 AM, "Chengi Liu" wrote: …
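
One way to trim the header ahead of time without Spark, sketched in plain Python (file names are illustrative; a shell tool like tail -n +2 does the same in one pass):

    with open("data.csv") as src, open("data_noheader.csv", "w") as dst:
        next(src)  # skip the header line
        dst.writelines(src)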

Dealing with headers in csv file pyspark

2014-02-26 Thread Chengi Liu
Hi, how do we deal with headers in a csv file? For example:

    id, counts
    1,2
    1,5
    2,20
    2,25
    ... and so on

And I want to do a frequency count of counts for each id, so the result will be:

    1,7
    2,45
    ... and so on.

My code:

    counts = data.map(lambda x: (x[0], int(x[1]))).reduceByKey(lambda a, b: a + b)

But …
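
For reference, a runnable sketch of the whole job with the header filtered out first (the path and names are illustrative assumptions):

    from pyspark import SparkContext

    sc = SparkContext(appName="freq-count")
    raw = sc.textFile("data.csv")
    header = raw.first()
    counts = (raw.filter(lambda line: line != header)
                 .map(lambda line: line.split(","))
                 .map(lambda x: (x[0], int(x[1])))
                 .reduceByKey(lambda a, b: a + b))
    print(counts.collect())  # e.g. [('1', 7), ('2', 45)]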