You can do all of this on the RDD, without a DataFrame:

    sc.textFile("filename").map(_.split(",")).filter(arr => arr.length == 3 && arr(2).toDouble > 50).collect

collect returns the matching rows to the driver as an Array[Array[String]], which you can then process as you wish. Please also read up on RDDs.
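If some rows might be malformed (a missing column, or a third field that is not a number), toDouble will throw. Below is a minimal sketch of a more defensive variant, written in plain Scala so it can be tried without a cluster. The names FilterThirdColumn, parseRow and aboveThreshold are made up for illustration, and the "id,timestamp,value" layout is assumed from the sample rows in the thread.

```scala
import scala.util.Try

// Hypothetical helper object; names are illustrative only.
object FilterThirdColumn {

  // Parse one "id,timestamp,value" line.
  // Returns None when the line does not have exactly three fields
  // or when the numeric fields fail to parse.
  def parseRow(line: String): Option[(Int, String, Double)] =
    line.split(",").map(_.trim) match {
      case Array(id, ts, value) =>
        Try((id.toInt, ts, value.toDouble)).toOption
      case _ => None
    }

  // Keep only rows whose third column is strictly greater than the threshold.
  // Malformed rows are silently dropped by parseRow.
  def aboveThreshold(lines: Seq[String], threshold: Double): Seq[(Int, String, Double)] =
    lines.flatMap(line => parseRow(line)).filter(_._3 > threshold)

  def main(args: Array[String]): Unit = {
    // Two rows from the thread plus one deliberately malformed row.
    val sample = Seq(
      "74,20160905-133143,98.11218069128827594148",
      "75,20160905-133143,49.52776998815916807742",
      "not,a,valid row,at all"
    )
    aboveThreshold(sample, 50.0).foreach(println)
  }
}
```

In the Spark shell the same parseRow function should slot into the RDD pipeline as sc.textFile("filename").flatMap(line => parseRow(line)).filter(_._3 > 50.0), and adding .map(_._3) would keep only the third column. Treat that as an untested suggestion rather than a verified recipe for your file.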

On 5 Sep 2016 8:51 pm, "Ashok Kumar" <ashok34...@yahoo.com> wrote:

> Thanks everyone.
>
> I am not skilled like you gentlemen
>
> This is what I did
>
> 1) Read the text file
>
> val textFile = sc.textFile("/tmp/myfile.txt")
>
> 2) That produces an RDD of String.
>
> 3) Create a DF after splitting the file into an Array
>
> val df = textFile.map(line => line.split(",")).map(x=>(x(0).toInt,x(1).toString,x(2).toDouble)).toDF
>
> 4) Create a class for column headers
>
> case class Columns(col1: Int, col2: String, col3: Double)
>
> 5) Assign the column headers
>
> val h = df.map(p => Columns(p(0).toString.toInt, p(1).toString, p(2).toString.toDouble))
>
> 6) Only interested in column 3 > 50
>
> h.filter(col("Col3") > 50.0)
>
> 7) Now I just want Col3 only
>
> h.filter(col("Col3") > 50.0).select("col3").show(5)
> +-----------------+
> |             col3|
> +-----------------+
> |95.42536350467836|
> |61.56297588648554|
> |76.73982017179868|
> |68.86218120274728|
> |67.64613810115105|
> +-----------------+
> only showing top 5 rows
>
> Does that make sense. Are there shorter ways gurus? Can I just do all this
> on RDD without DF?
>
> Thanking you
>
>
> On Monday, 5 September 2016, 15:19, ayan guha <guha.a...@gmail.com> wrote:
>
> Then, You need to refer third term in the array, convert it to your
> desired data type and then use filter.
>
>
> On Tue, Sep 6, 2016 at 12:14 AM, Ashok Kumar <ashok34...@yahoo.com> wrote:
>
> Hi,
> I want to filter them for values.
>
> This is what is in array
>
> 74,20160905-133143,98.11218069128827594148
>
> I want to filter anything > 50.0 in the third column
>
> Thanks
>
>
> On Monday, 5 September 2016, 15:07, ayan guha <guha.a...@gmail.com> wrote:
>
> Hi
>
> x.split returns an array. So, after first map, you will get RDD of arrays.
> What is your expected outcome of 2nd map?
>
> On Mon, Sep 5, 2016 at 11:30 PM, Ashok Kumar <ashok34...@yahoo.com.invalid> wrote:
>
> Thank you sir.
>
> This is what I get
>
> scala> textFile.map(x=> x.split(","))
> res52: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[27] at map at <console>:27
>
> How can I work on individual columns. I understand they are strings
>
> scala> textFile.map(x=> x.split(",")).map(x => (x.getString(0))
>      | )
> <console>:27: error: value getString is not a member of Array[String]
>        textFile.map(x=> x.split(",")).map(x => (x.getString(0))
>
> regards
>
>
> On Monday, 5 September 2016, 13:51, Somasundaram Sekar <somasundar.se...@tigeranalytics.com> wrote:
>
> Basic error, you get back an RDD on transformations like map.
> sc.textFile("filename").map(x => x.split(",")
>
> On 5 Sep 2016 6:19 pm, "Ashok Kumar" <ashok34...@yahoo.com.invalid> wrote:
>
> Hi,
>
> I have a text file as below that I read in
>
> 74,20160905-133143,98.11218069128827594148
> 75,20160905-133143,49.52776998815916807742
> 76,20160905-133143,56.08029957123980984556
> 77,20160905-133143,46.63689526544407522777
> 78,20160905-133143,84.88227141164402181551
> 79,20160905-133143,68.72408602520662115000
>
> val textFile = sc.textFile("/tmp/mytextfile.txt")
>
> Now I want to split the rows separated by ","
>
> scala> textFile.map(x=>x.toString).split(",")
> <console>:27: error: value split is not a member of org.apache.spark.rdd.RDD[String]
>        textFile.map(x=>x.toString).split(",")
>
> However, the above throws error?
>
> Any ideas what is wrong or how I can do this if I can avoid converting it
> to String?
>
> Thanking
>
>
> --
> Best Regards,
> Ayan Guha