Connecting SparkR through YARN

2015-11-08 Thread Amit Behera
Hi All, Spark Version = 1.5.1, Hadoop Version = 2.6.0. I set up the cluster on Amazon EC2 machines (1+5). I am able to create a SparkContext object using the *init* method from *RStudio*, but I do not know how to create a SparkContext object in *yarn mode*. I found the link below to run on YARN, but in…
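A minimal sketch of what a yarn-client initialization can look like with the Spark 1.5.x SparkR API; the SPARK_HOME/HADOOP_CONF_DIR paths and executor settings below are hypothetical placeholders, not values from the thread:

    # Sketch: starting SparkR against YARN from RStudio (Spark 1.5.x API).
    # Paths are hypothetical; point them at your own installation.
    Sys.setenv(SPARK_HOME = "/opt/spark-1.5.1")
    Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf")  # lets Spark find the YARN/HDFS addresses
    library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))

    # "yarn-client" keeps the driver inside the RStudio session;
    # executor settings are passed through sparkEnvir.
    sc <- sparkR.init(master = "yarn-client",
                      appName = "SparkR-on-YARN",
                      sparkEnvir = list(spark.executor.memory = "2g",
                                        spark.executor.instances = "5"))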

How can I read a file from HDFS in SparkR from RStudio

2015-10-08 Thread Amit Behera
Hi All, I am very new to SparkR. I am able to run the sample code from the example given in this link: http://www.r-bloggers.com/installing-and-starting-sparkr-locally-on-windows-os-and-rstudio/ Then I am trying to read a file from HDFS in RStudio, but I am unable to read it. Below is my code. *Sy…
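A minimal sketch of one common way to do this with the Spark 1.5.x SparkR DataFrame API; the namenode host/port and file path are hypothetical:

    # Sketch: reading an HDFS file from RStudio via the SparkR DataFrame API.
    # The HDFS URI below is hypothetical; use your namenode's host and port.
    library(SparkR)
    sc <- sparkR.init(master = "local[*]")   # or "yarn-client" as in the thread above
    sqlContext <- sparkRSQL.init(sc)

    # read.df takes a fully qualified HDFS URI plus a data source name
    df <- read.df(sqlContext, "hdfs://namenode:8020/user/amit/data.json", source = "json")
    head(df)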

Re: groupByKey is not working

2015-01-30 Thread Amit Behera
…the import, as Sean mentioned. It includes implicits that IntelliJ will not know about otherwise. 2015-01-30 12:44 GMT-08:00 Amit Behera: > I am sorry Sean. I am developing code in IntelliJ IDEA, so with the above dependencies I…

Re: groupByKey is not working

2015-01-30 Thread Amit Behera
…import org.apache.spark.SparkContext._ On Fri Jan 30 2015 at 3:21:45 PM Amit Behera wrote: > hi all, my sbt file is like this: name := "Spark" version := "1.0" scalaVersion := "2.10.4"…

Re: groupByKey is not working

2015-01-30 Thread Amit Behera
…you *really* need to say what that means. On Fri, Jan 30, 2015 at 8:20 PM, Amit Behera wrote: > hi all, my sbt file is like this: name := "Spark" version := "1.0"…

groupByKey is not working

2015-01-30 Thread Amit Behera
hi all, my sbt file is like this: name := "Spark" version := "1.0" scalaVersion := "2.10.4" libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" libraryDependencies += "net.sf.opencsv" % "opencsv" % "2.3" *code:* object SparkJob { def pLines(lines: Iterator[String]) = {…
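The replies above point at the missing pair-RDD import. A minimal Scala sketch of that fix, with hypothetical data standing in for the thread's input (on Spark 1.x the import is required; Spark 2.x+ no longer needs it):

    // Sketch of the fix discussed in this thread: in Spark 1.x, groupByKey
    // and the other pair-RDD operations come in via implicits, so the
    // second import below is required for the code to compile.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._   // brings PairRDDFunctions into scope

    object SparkJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("Spark").setMaster("local[*]"))
        // hypothetical data standing in for the rows read in the thread
        val pairs = sc.parallelize(Seq(("key1", "value1"), ("key1", "value2"), ("key2", "value1")))
        val grouped = pairs.groupByKey()     // compiles once SparkContext._ is imported
        grouped.collect().foreach { case (k, vs) => println(s"$k:[${vs.mkString(",")}]") }
        sc.stop()
      }
    }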

Re: unable to check whether an item is present in an RDD

2014-12-28 Thread Amit Behera
Hi Sean and Nicholas, thank you very much, the *exists* method works here :) On Sun, Dec 28, 2014 at 2:27 PM, Sean Owen wrote: > Try instead i.exists(_ == target) On Dec 28, 2014 8:46 AM, "Amit Behera" wrote: >> Hi Nicholas, I am getting…
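A minimal Scala sketch of Sean's suggestion, assuming an RDD[Iterable[Int]] as in the thread: scala.Iterable has no contains, but exists with an equality predicate performs the same membership test, and filtering on the workers avoids collecting the RDD first.

    // Sketch: membership test over an RDD[Iterable[Int]] using exists.
    import org.apache.spark.{SparkConf, SparkContext}

    object ExistsCheck {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("ExistsCheck").setMaster("local[*]"))
        // hypothetical data in place of the thread's theItems RDD
        val theItems = sc.parallelize(Seq[Iterable[Int]](Seq(1, 2, 3), Seq(4, 5)))
        val target = 4
        // true if any Iterable in the RDD holds the target; the predicate
        // runs on the executors, so no collect() is needed
        val found = theItems.filter(i => i.exists(_ == target)).count() > 0
        println(found)
        sc.stop()
      }
    }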

Re: unable to check whether an item is present in an RDD

2014-12-28 Thread Amit Behera
…but you can find the element you want by doing something like: rdd.filter(i => i.contains(target)).collect() where target is the Int you are looking for. Nick. On Sun Dec 28 2014 at 3:28:45 AM Amit Behera wrote: > Hi Nicholas…

Re: unable to check whether an item is present in an RDD

2014-12-28 Thread Amit Behera
Hi Sean, I have an RDD like *theItems: org.apache.spark.rdd.RDD[Iterable[Int]]*. I did *val items = theItems.collect* //to get it as an array: items: Array[Iterable[Int]], then *val check = items.contains(item)*. Thanks, Amit. On Sun, Dec 28, 2014 at 1:58 PM, Amit Behera wrote: > Hi Nicho…

Re: unable to check whether an item is present in an RDD

2014-12-28 Thread Amit Behera
…method. On Sun, Dec 28, 2014 at 1:54 AM, Amit Behera wrote: > Hi All, I want to check whether an item is present or not in an RDD of Iterable[Int] using Scala, something like what we do in Java: *list.contains(item)*…

unable to check whether an item is present in an RDD

2014-12-27 Thread Amit Behera
Hi All, I want to check whether an item is present or not in an RDD of Iterable[Int] using Scala, something like what we do in Java: *list.contains(item)*, where the statement returns true if the item is present and false otherwise. Please help me find the solution. Thanks, Amit

Re: unable to group by the 1st column

2014-12-26 Thread Amit Behera
…ones.reduceByKey(new Function2<String, String, String>() { @Override public String call(String i1, String i2) {…

unable to group by the 1st column

2014-12-25 Thread Amit Behera
Hi Users, I am reading a CSV file and my data format is like: key1,value1 key1,value2 key1,value1 key1,value3 key2,value1 key2,value5 key2,value5 key2,value4 key1,value4 key1,value4 key3,value1 key3,value1 key3,value2. Required output: key1:[value1,value2,value1,value3,value4,value4] key2:[val…
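The quoted reply above shows a Java reduceByKey; here is a minimal Scala sketch of the same grouping, assuming strictly two-column "key,value" rows and a hypothetical HDFS path:

    // Sketch: group a two-column CSV by its first column, producing
    // key:[value,...] lines as the thread asks. The path is hypothetical.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // pair-RDD implicits on Spark 1.x

    object GroupByFirstColumn {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("GroupByFirstColumn").setMaster("local[*]"))
        val lines = sc.textFile("hdfs://namenode:8020/user/amit/data.csv")
        val grouped = lines
          .map { line =>
            val Array(k, v) = line.split(",", 2)  // assumes exactly "key,value" rows
            (k, v)
          }
          .groupByKey()
        grouped.collect().foreach { case (k, vs) =>
          println(s"$k:[${vs.mkString(",")}]")    // e.g. key1:[value1,value2,...]
        }
        sc.stop()
      }
    }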