command to get list of all persisted RDDs in spark 2.0 scala shell

2017-06-01 Thread nancy henry
Hi Team, Please let me know how to get a list of all persisted RDDs in the Spark 2.0 shell. Regards, Nancy
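
A minimal sketch, run in the Spark 2.0 shell where sc is the shell's SparkContext; SparkContext.getPersistentRDDs returns a map from RDD id to each RDD currently marked as persisted:

    // print id, name and storage level of every persisted RDD
    sc.getPersistentRDDs.foreach { case (id, rdd) =>
      println(s"id=$id name=${rdd.name} storageLevel=${rdd.getStorageLevel}")
    }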

access error while trying to run distcp from source cluster

2017-05-25 Thread nancy henry
Hi Team, I am trying to copy data from cluster A to cluster B, with the same user for both. I am running the distcp command on source cluster A but I am getting an error: 17/05/25 07:24:08 INFO mapreduce.Job: Running job: job_1492549627402_344485 17/05/25 07:24:17 INFO mapreduce.Job: Job job_1492549627402_344

Hive ::: how to select where conditions dynamically using CASE

2017-04-12 Thread nancy henry
Hi, let's say I have an employee table testtab1:

    empid  empname  joindate    bonus
    1      sirisha  15-06-2016  60
    2      Arun     15-10-2016  20
    3      divya    17-06-2016  80
    4      rahul    16-01-2016  30
    5      kokila   17-02-2016
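
A minimal sketch through Spark's HiveContext, assuming the testtab1 table above; the bonus cutoffs are illustrative. A CASE expression can be evaluated inside the WHERE clause so the condition is chosen per row:

    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    // keep employees whose bonus clears a cutoff that depends on the join month
    hc.sql("""
      SELECT empid, empname, joindate, bonus
      FROM testtab1
      WHERE bonus > CASE
                      WHEN month(from_unixtime(unix_timestamp(joindate, 'dd-MM-yyyy'))) <= 6
                        THEN 50   -- first-half joiners need a higher bonus
                      ELSE 25
                    END
    """).show()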

keep or remove sc.stop() because of RpcEnv already stopped error

2017-03-13 Thread nancy henry
Hi Team, we are getting this error if we put sc.stop() in the application. Can we remove it from the application? But I read that if you don't explicitly stop with sc.stop(), the YARN application will not get registered in the history service. So what to do? WARN Dispatcher: Message RemoteProcessDisconnected droppe
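
A minimal sketch, assuming the situation described: keep sc.stop(), but make it the very last statement so no Spark work runs after the RpcEnv shuts down (the WARN typically appears when something touches the context after stop()), and YARN still registers the app with the history server:

    import org.apache.spark.{SparkConf, SparkContext}

    object App {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("my-app"))
        val total = sc.parallelize(1 to 100).sum() // run all Spark work first
        println(s"total = $total")
        sc.stop() // last call: clean shutdown, app still shows up in the history server
      }
    }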

Re: spark-sql use case beginner question

2017-03-08 Thread nancy henry
hive.execution.engine=spark as a beeline command line parameter, assuming you > are running hive scripts using the beeline command line (which is the suggested > practice for security purposes). > > > > On Thu, Mar 9, 2017 at 2:09 PM, nancy henry > wrote: > >> >> Hi Team, >> >>

spark-sql use case beginner question

2017-03-08 Thread nancy henry
Hi Team, basically we have all data as hive tables, and till now we have been processing it in Hive on MR. Now that we have HiveContext, which can run hive queries on Spark, we are making all these complex hive scripts run using a hivecontext.sql(sc.textFile(hivescript)) kind of approach, i.e. basically running
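
A minimal sketch of the approach described, with a hypothetical script path; note that sc.textFile returns an RDD of lines rather than a String, so the lines must be collected and joined first, and a multi-statement script needs splitting since sql() takes one statement at a time:

    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    val script = sc.textFile("hdfs:///scripts/report.hql").collect().mkString("\n")
    // run each ';'-separated statement in order
    script.split(";").map(_.trim).filter(_.nonEmpty).foreach(hc.sql)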

made spark job throw exception but it still goes to finished/succeeded status in yarn

2017-03-07 Thread nancy henry
Hi Team, I wrote the below code to throw an exception. How do I make the code throw an exception under some condition and send the job to failed status in YARN, but still close the Spark context and release resources? object Demo { def main(args: Array[String]) = { var a = 0; var c = 0;
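
A minimal sketch, assuming the goal stated above: an exception that escapes main() makes the driver exit non-zero, so YARN records the application as FAILED, while the finally block still stops the context and releases resources:

    import org.apache.spark.{SparkConf, SparkContext}

    object Demo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("demo"))
        try {
          val a = 0
          if (a == 0) throw new RuntimeException("failing the job on purpose")
        } finally {
          sc.stop() // resources are released whether or not we threw
        }
        // do not catch the exception: letting it propagate out of main
        // is what turns the YARN final status to FAILED
      }
    }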

care to share latest pom for spark scala applications in eclipse?

2017-02-24 Thread nancy henry
Hi Guys, Please, one of you who is successfully able to build maven packages in the eclipse scala IDE, share your pom.xml.

quick question: best to use cluster mode or client mode for production?

2017-02-23 Thread nancy henry
Hi Team, I have a set of hc.sql("hivequery") kind of scripts which I am running right now in spark-shell. How should I schedule them in production: run spark-shell -i script.scala, or keep them in a jar file built through eclipse and use spark-submit with deploy mode cluster? Which is advisable?

please send me pom.xml for scala 2.10

2017-02-21 Thread nancy henry
Hi, Please send me a copy of pom.xml, as I am getting a "no sources to compile" error. However much I try to set the source directory in pom.xml, it is not recognizing source files from my src/main/scala. So please send me one (that includes hive context and spark core).
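
A minimal sketch of the relevant pom.xml fragment, assuming Scala 2.10 with Spark 1.6 (version numbers are illustrative); the "no sources to compile" symptom is usually fixed by the scala-maven-plugin plus an explicit sourceDirectory pointing at src/main/scala:

    <properties>
      <scala.version>2.10.6</scala.version>
    </properties>
    <dependencies>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.3</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.10</artifactId>
        <version>1.6.3</version>
      </dependency>
    </dependencies>
    <build>
      <sourceDirectory>src/main/scala</sourceDirectory>
      <plugins>
        <plugin>
          <groupId>net.alchim31.maven</groupId>
          <artifactId>scala-maven-plugin</artifactId>
          <version>3.2.2</version>
          <executions>
            <execution>
              <goals>
                <goal>compile</goal>
                <goal>testCompile</goal>
              </goals>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>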

how to give hdfs file path as argument to spark-submit

2017-02-17 Thread nancy henry
Hi All, object Step1 { def main(args: Array[String]) = { val sparkConf = new SparkConf().setAppName("my-app") val sc = new SparkContext(sparkConf) val hiveSqlContext: HiveContext = new org.apache.spark.sql.hive.HiveContext(sc) hiveSqlContext.sql(scala.io.Source.fromFile(args(
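
A minimal sketch, assuming the intent above: scala.io.Source.fromFile only reads from the driver's local filesystem, so for an hdfs:// argument the script can be read through the Hadoop FileSystem API instead (readScript is a hypothetical helper name):

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import scala.io.Source

    def readScript(path: String, hadoopConf: Configuration): String = {
      val fs = FileSystem.get(URI.create(path), hadoopConf) // resolves hdfs:// or file://
      val in = fs.open(new Path(path))
      try Source.fromInputStream(in).mkString finally in.close()
    }

    // usage inside main, with sc already created:
    // val script = readScript(args(0), sc.hadoopConfiguration)
    // hiveSqlContext.sql(script)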

scala.io.Source.fromFile protocol for hadoop

2017-02-16 Thread nancy henry
Hello, hiveSqlContext.sql(scala.io.Source.fromFile(args(0).toString()).mkString).collect() I have a file on my local system, and I am running spark-submit with deploy mode cluster on hadoop, so should args(0) be on the hadoop cluster or local? What should the protocol be: file:/// for hadoop? What is the protoc
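
A minimal sketch of one common route, assuming YARN cluster mode: ship the local file with spark-submit --files, which localizes it into the driver container's working directory, so Source.fromFile can read it by bare name with no file:/// or hdfs:// prefix:

    // after: spark-submit --deploy-mode cluster --files /local/path/script.hql ...
    val script = scala.io.Source.fromFile("script.hql").mkString
    hiveSqlContext.sql(script)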

Re: Lost executor 4 Container killed by YARN for exceeding memory limits.

2017-02-13 Thread nancy henry
GB. > > You may want to consult with your DevOps/Operations/Spark Admin team first. > > > > *From: *Jon Gregg > *Date: *Monday, February 13, 2017 at 8:58 AM > *To: *nancy henry > *Cc: *"user @spark" > *Subject: *Re: Lost executor 4 Container killed by YARN for

Lost executor 4 Container killed by YARN for exceeding memory limits.

2017-02-13 Thread nancy henry
Hi All, I am getting the below error while trying to join 3 tables, which are in ORC format in hive, from 5 10 GB tables through hive context in spark: Container killed by YARN for exceeding memory limits. 11.1 GB of 11 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
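
A minimal sketch of the fix the error message itself suggests, with illustrative sizes: raise the per-executor off-heap overhead (the Spark 1.x/2.x property name is spark.yarn.executor.memoryOverhead, in MB):

    val conf = new org.apache.spark.SparkConf()
      .setAppName("three-table-join")
      .set("spark.executor.memory", "10g")
      .set("spark.yarn.executor.memoryOverhead", "2048") // default is max(384, 0.10 * executorMemory)
    val sc = new org.apache.spark.SparkContext(conf)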

Is it better to use Java, Python, or Scala for Spark with big data sets

2017-02-09 Thread nancy henry
Hi All, Is it better to use Java, Python, or Scala for Spark coding? Mainly my work involves getting file data which is in csv format, and I have to do some rule checking and rule aggregation and put the final filtered data back to oracle so that real-time apps can use it.