Re: Splitting columns from a text file

Somasundaram Sekar Mon, 05 Sep 2016 11:55:59 -0700

sc.textFile("filename").map(_.split(",")).filter(arr => arr.length == 3 &&
arr(2).toDouble > 50).collect this will give you a Array[Array[String]] do
as you may wish with it. And please read through abt RDD


On 5 Sep 2016 8:51 pm, "Ashok Kumar" <ashok34...@yahoo.com> wrote:

> Thanks everyone.
>
> I am not skilled like you gentlemen
>
> This is what I did
>
> 1) Read the text file
>
> val textFile = sc.textFile("/tmp/myfile.txt")
>
> 2) That produces an RDD of String.
>
> 3) Create a DF after splitting the file into an Array
>
> val df = textFile.map(line => line.split(",")).map(x=>(x(0).
> toInt,x(1).toString,x(2).toDouble)).toDF
>
> 4) Create a class for column headers
>
>  case class Columns(col1: Int, col2: String, col3: Double)
>
> 5) Assign the column headers
>
> val h = df.map(p => Columns(p(0).toString.toInt, p(1).toString,
> p(2).toString.toDouble))
>
> 6) Only interested in column 3 > 50
>
>  h.filter(col("Col3") > 50.0)
>
> 7) Now I just want Col3 only
>
> h.filter(col("Col3") > 50.0).select("col3").show(5)
> +-----------------+
> |             col3|
> +-----------------+
> |95.42536350467836|
> |61.56297588648554|
> |76.73982017179868|
> |68.86218120274728|
> |67.64613810115105|
> +-----------------+
> only showing top 5 rows
>
> Does that make sense. Are there shorter ways gurus? Can I just do all this
> on RDD without DF?
>
> Thanking you
>
>
>
>
>
>
>
> On Monday, 5 September 2016, 15:19, ayan guha <guha.a...@gmail.com> wrote:
>
>
> Then, You need to refer third term in the array, convert it to your
> desired data type and then use filter.
>
>
> On Tue, Sep 6, 2016 at 12:14 AM, Ashok Kumar <ashok34...@yahoo.com> wrote:
>
> Hi,
> I want to filter them for values.
>
> This is what is in array
>
> 74,20160905-133143,98. 11218069128827594148
>
> I want to filter anything > 50.0 in the third column
>
> Thanks
>
>
>
>
> On Monday, 5 September 2016, 15:07, ayan guha <guha.a...@gmail.com> wrote:
>
>
> Hi
>
> x.split returns an array. So, after first map, you will get RDD of arrays.
> What is your expected outcome of 2nd map?
>
> On Mon, Sep 5, 2016 at 11:30 PM, Ashok Kumar <ashok34...@yahoo.com.invalid
> > wrote:
>
> Thank you sir.
>
> This is what I get
>
> scala> textFile.map(x=> x.split(","))
> res52: org.apache.spark.rdd.RDD[ Array[String]] = MapPartitionsRDD[27] at
> map at <console>:27
>
> How can I work on individual columns. I understand they are strings
>
> scala> textFile.map(x=> x.split(",")).map(x => (x.getString(0))
>      | )
> <console>:27: error: value getString is not a member of Array[String]
>        textFile.map(x=> x.split(",")).map(x => (x.getString(0))
>
> regards
>
>
>
>
> On Monday, 5 September 2016, 13:51, Somasundaram Sekar <somasundar.sekar@
> tigeranalytics.com <somasundar.se...@tigeranalytics.com>> wrote:
>
>
> Basic error, you get back an RDD on transformations like map.
> sc.textFile("filename").map(x => x.split(",")
>
> On 5 Sep 2016 6:19 pm, "Ashok Kumar" <ashok34...@yahoo.com.invalid> wrote:
>
> Hi,
>
> I have a text file as below that I read in
>
> 74,20160905-133143,98. 11218069128827594148
> 75,20160905-133143,49. 52776998815916807742
> 76,20160905-133143,56. 08029957123980984556
> 77,20160905-133143,46. 63689526544407522777
> 78,20160905-133143,84. 88227141164402181551
> 79,20160905-133143,68. 72408602520662115000
>
> val textFile = sc.textFile("/tmp/mytextfile. txt")
>
> Now I want to split the rows separated by ","
>
> scala> textFile.map(x=>x.toString). split(",")
> <console>:27: error: value split is not a member of
> org.apache.spark.rdd.RDD[ String]
>        textFile.map(x=>x.toString). split(",")
>
> However, the above throws error?
>
> Any ideas what is wrong or how I can do this if I can avoid converting it
> to String?
>
> Thanking
>
>
>
>
>
>
> --
> Best Regards,
> Ayan Guha
>
>
>
>
>
> --
> Best Regards,
> Ayan Guha
>
>
>

Re: Splitting columns from a text file

Reply via email to