Issue with yarn cluster - hangs in accepted state.
I am trying to run the Java class below on a YARN cluster, but the application hangs in the ACCEPTED state and I don't see any error. The class and the submit command are below. Any help is appreciated.

Thanks,
Abhi

bin/spark-submit --class com.mycompany.app.SimpleApp --master yarn-cluster /home/hduser/my-app-1.0.jar

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {
  public static void main(String[] args) {
    String logFile = "/home/hduser/testspark.txt"; // Should be some file on your system
    SparkConf conf = new SparkConf().setAppName("Simple Application");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> logData = sc.textFile(logFile).cache();

    long numAs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("a"); }
    }).count();

    long numBs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("b"); }
    }).count();

    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
    sc.stop();
  }
}
{code}

15/03/03 11:47:40 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:41 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:42 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
[the same ACCEPTED report repeats every second through 11:48:04]
Re: Issue with yarn cluster - hangs in accepted state.
Thanks, it worked.
-Abhi

On Tue, Mar 3, 2015 at 5:15 PM, Tobias Pfeiffer wrote:
> Hi,
>
> On Wed, Mar 4, 2015 at 6:20 AM, Zhan Zhang wrote:
>> Do you have enough resource in your cluster? You can check your resource
>> manager to see the usage.
>
> Yep, I can confirm that this is a very annoying issue. If there is not
> enough memory or VCPUs available, your app will just stay in ACCEPTED state
> until resources are available.
>
> You can have a look at
> https://github.com/jubatus/jubaql-docker/blob/master/hadoop/yarn-site.xml#L35
> to see some settings that might help.
>
> Tobias
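For reference, the properties in play at the linked yarn-site.xml are the NodeManager resource limits. A minimal illustrative fragment is below; the property names are standard YARN settings, but the values here are assumptions and need to be tuned to the actual hardware:

```xml
<!-- yarn-site.xml: resources each NodeManager offers to containers.
     Values below are illustrative only. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>
```

If these limits are smaller than what spark-submit requests for the ApplicationMaster and executors, the application sits in ACCEPTED until resources free up.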
Priority queue in spark
Hi,
Currently all jobs in Spark get submitted through a queue. I have a requirement where a submitted job will generate another set of jobs, each with some priority, which should again be submitted to the Spark cluster based on that priority, i.e. a job with higher priority should be executed first. Is this feasible?

Any help is appreciated.

Thanks,
Abhi
Re: Priority queue in spark
If I understand correctly, the document above creates pools for priority that are static in nature and have to be defined before submitting the job. In my scenario each generated task can have a different priority.

Thanks,
Abhi

On Mon, Mar 16, 2015 at 9:48 PM, twinkle sachdeva <twinkle.sachd...@gmail.com> wrote:
> Hi,
>
> Maybe this is what you are looking for:
> http://spark.apache.org/docs/1.2.0/job-scheduling.html#fair-scheduler-pools
>
> Thanks,
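For context, the fair-scheduler pools from the linked doc are declared in a fairscheduler.xml file like the sketch below (pool names and weights here are illustrative assumptions, not from the thread):

```xml
<!-- fairscheduler.xml: statically defined pools; a higher weight gets a
     proportionally larger share of the cluster. Names/values illustrative. -->
<allocations>
  <pool name="highPriority">
    <schedulingMode>FAIR</schedulingMode>
    <weight>4</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="lowPriority">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```

Jobs are then assigned to a pool per thread with `sc.setLocalProperty("spark.scheduler.pool", "highPriority")`, which is what makes the pool definitions static: the pools and their weights must exist before submission.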
Re: Priority queue in spark
Yes, each generated job can have a different priority. It is like a recursive function: in each iteration the generated job will be submitted to the Spark cluster based on its priority. Jobs with lower priority, or below some threshold, will be discarded.

Thanks,
Abhi

On Mon, Mar 16, 2015 at 10:36 PM, twinkle sachdeva <twinkle.sachd...@gmail.com> wrote:
> Hi Abhi,
>
> You mean each task of a job can have different priority, or jobs generated
> via one job can have different priority?
>
> On Tue, Mar 17, 2015 at 11:04 AM, Mark Hamstra wrote:
>> http://apache-spark-developers-list.1001551.n3.nabble.com/Job-priority-td10076.html#a10079
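Since Spark itself has no per-job dynamic priority, one client-side workaround matching the scenario described here is to buffer generated jobs in a priority queue on the driver side and submit the highest-priority ones first, dropping those below a threshold. This is only a sketch; the `Job` class and `PRIORITY_THRESHOLD` are hypothetical, and "submit" would hand the job to Spark in real use:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch: order dynamically generated jobs by priority before
// submission; discard anything below a threshold, as described in the thread.
public class PriorityJobSubmitter {
    static final int PRIORITY_THRESHOLD = 3; // illustrative cut-off

    static class Job {
        final String name;
        final int priority; // higher value = more urgent
        Job(String name, int priority) { this.name = name; this.priority = priority; }
    }

    // Max-heap on priority: poll() returns the most urgent job first.
    private final PriorityQueue<Job> queue =
        new PriorityQueue<>(Comparator.comparingInt((Job j) -> j.priority).reversed());

    void enqueue(Job job) {
        if (job.priority >= PRIORITY_THRESHOLD) { // drop low-priority jobs
            queue.add(job);
        }
    }

    Job nextToSubmit() {
        return queue.poll(); // null when nothing is left to submit
    }

    public static void main(String[] args) {
        PriorityJobSubmitter s = new PriorityJobSubmitter();
        s.enqueue(new Job("low", 1));    // discarded: below threshold
        s.enqueue(new Job("medium", 5));
        s.enqueue(new Job("high", 9));
        System.out.println(s.nextToSubmit().name); // high
        System.out.println(s.nextToSubmit().name); // medium
        System.out.println(s.nextToSubmit());      // null
    }
}
```

Each submitted job could then enqueue the jobs it generates, giving the recursive behavior described, with ordering decided at runtime rather than by static pools.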
RDD storage in spark steaming
Hi,
I have a simple question about RDD creation. When an RDD is created in Spark Streaming for a particular time window, where does the RDD get stored?
1. Is it stored on the driver machine, or is it distributed across the machines in the cluster?
2. Is the data stored in memory by default? Can it be stored in both memory and on disk, and how is that configured?

Thanks,
Abhi
Re: sbt assembly with hive
I am getting the same message when trying to get a HiveContext in CDH 5.1 after enabling Spark. I think Spark should ship with Hive enabled by default, since the Hive metastore is a common way to share data, given the popularity of Hive and of other SQL-over-Hadoop technologies like Impala.

Thanks,
Abhi

On Fri, Dec 12, 2014 at 6:40 PM, Stephen Boesch wrote:
> What is the proper way to build with hive from sbt? The SPARK_HIVE is
> deprecated. However after running the following:
>
>   sbt -Pyarn -Phadoop-2.3 -Phive assembly/assembly
>
> And then
>   bin/pyspark
>   hivectx = HiveContext(sc)
>   hivectx.hiveql("select * from my_table")
>
> Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and
> run sbt/sbt assembly", Py4JError(u'Trying to call a package.',))

-- Abhi Basu
Re: Building Desktop application for ALS-MlLib/ Training ALS
In case you must write C# code, you can call Python code from C# or use IronPython. :)

On Mon, Dec 15, 2014 at 12:04 PM, Xiangrui Meng wrote:
> On Sun, Dec 14, 2014 at 3:06 AM, Saurabh Agrawal wrote:
>> Hi,
>>
>> I am a newbie in the spark and scala world.
>>
>> I have been trying to implement Collaborative filtering using MLlib,
>> supplied out of the box with Spark and Scala.
>>
>> I have 2 problems:
>>
>> 1. The best model was trained with rank = 20 and lambda = 5.0, and
>> numIter = 10, and its RMSE on the test set is 25.718710831912485. The best
>> model improves the baseline by 18.29%. Is there a scientific way in which
>> RMSE could be brought down? What is a decent acceptable value for RMSE?
>
> The grid search approach used in the AMPCamp tutorial is pretty
> standard. Whether an RMSE is good or not really depends on your
> dataset.
>
>> 2. I picked up the Collaborative filtering algorithm from
>> http://ampcamp.berkeley.edu/5/exercises/movie-recommendation-with-mllib.html
>> and executed the given code with my dataset. Now, I want to build a
>> desktop application around it.
>>
>> a. What is the best language to do this, Java/Scala? Any possibility
>> to do this using C#?
>
> We support Java/Scala/Python. Start with the one you are most
> familiar with. C# is not supported.
>
>> b. Can somebody please share any relevant documents/source or any
>> helper links to help me get started on this?
>
> For ALS, you can check the API documentation.
>
>> Your help is greatly appreciated.
>>
>> Thanks!!
>>
>> Regards,
>> Saurabh Agrawal

-- Abhi Basu
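The grid search Xiangrui calls standard can be sketched as a pair of nested loops over hyperparameter candidates, keeping the combination with the lowest RMSE. This is only an illustration: `evaluateRmse` below is a toy stand-in assumption for actually training `ALS.train(ratings, rank, numIter, lambda)` and scoring it on a held-out test set, which requires a running Spark cluster:

```java
// Minimal sketch of hyperparameter grid search for ALS.
// evaluateRmse is a hypothetical stand-in; in real use it would train an
// ALS model and compute RMSE on held-out ratings.
public class AlsGridSearch {
    // Toy surface with its minimum at rank=20, lambda=5.0 (matching the
    // best model reported in the thread); purely illustrative.
    static double evaluateRmse(int rank, double lambda) {
        return Math.abs(rank - 20) * 0.5 + Math.abs(lambda - 5.0) + 25.0;
    }

    public static void main(String[] args) {
        int[] ranks = {10, 20, 50};
        double[] lambdas = {0.01, 1.0, 5.0};
        double bestRmse = Double.MAX_VALUE;
        int bestRank = -1;
        double bestLambda = Double.NaN;
        for (int rank : ranks) {
            for (double lambda : lambdas) {
                double rmse = evaluateRmse(rank, lambda);
                if (rmse < bestRmse) { // keep the lowest-RMSE combination
                    bestRmse = rmse;
                    bestRank = rank;
                    bestLambda = lambda;
                }
            }
        }
        System.out.println("best rank=" + bestRank
            + " lambda=" + bestLambda + " rmse=" + bestRmse);
    }
}
```

As noted in the reply, what counts as a "good" RMSE is dataset-dependent; grid search only finds the best combination among the candidates you try.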
SparkSQL
I am working with CDH 5.2 (Spark 1.0.0) and am wondering which version of Spark comes with SparkSQL by default. Also, will SparkSQL come enabled to access the Hive metastore? Is there an easier way to enable Hive support without having to build the code with various switches?

Thanks,
Abhi

-- Abhi Basu
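For what it's worth, once a Hive-enabled Spark build is in place, pointing it at an existing metastore is usually just a matter of putting a hive-site.xml on Spark's classpath (e.g. in its conf directory). A minimal illustrative fragment, with a hypothetical host name:

```xml
<!-- hive-site.xml: point Spark's HiveContext at an existing metastore.
     The host below is a placeholder; 9083 is the usual metastore port. -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host:9083</value>
</property>
```

Without this file, HiveContext typically falls back to a local embedded metastore rather than the shared one.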