Issue with yarn cluster - hangs in accepted state.

2015-03-03 Thread abhi
I am trying to run the Java class below on a YARN cluster, but it hangs in the
ACCEPTED state. I don't see any errors. The class and the command are below.
Any help is appreciated.


Thanks,

Abhi





bin/spark-submit --class com.mycompany.app.SimpleApp --master yarn-cluster /home/hduser/my-app-1.0.jar


{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {

  public static void main(String[] args) {

    // Should be some file readable from the cluster (with yarn-cluster the
    // driver runs on a worker node, so an HDFS path is usually safer than a
    // local one).
    String logFile = "/home/hduser/testspark.txt";

    SparkConf conf = new SparkConf().setAppName("Simple Application");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> logData = sc.textFile(logFile).cache();

    // Count lines containing the letter "a".
    long numAs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("a"); }
    }).count();

    // Count lines containing the letter "b".
    long numBs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("b"); }
    }).count();

    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);

    sc.stop();
  }
}
{code}


{code}
15/03/03 11:47:40 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:41 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:42 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:43 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:44 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:45 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:46 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:47 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:48 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:49 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:50 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:51 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:52 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:53 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:54 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:55 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:56 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:57 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:58 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:47:59 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:48:00 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:48:01 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:48:02 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:48:03 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
15/03/03 11:48:04 INFO yarn.Client: Application report for application_1425398386987_0002 (state: ACCEPTED)
{code}


Re: Issue with yarn cluster - hangs in accepted state.

2015-03-15 Thread abhi
Thanks,
It worked.

-Abhi

On Tue, Mar 3, 2015 at 5:15 PM, Tobias Pfeiffer  wrote:

> Hi,
>
> On Wed, Mar 4, 2015 at 6:20 AM, Zhan Zhang  wrote:
>
>>  Do you have enough resource in your cluster? You can check your resource
>> manager to see the usage.
>>
>
> Yep, I can confirm that this is a very annoying issue. If there is not
> enough memory or VCPUs available, your app will just stay in ACCEPTED state
> until resources are available.
>
> You can have a look at
>
> https://github.com/jubatus/jubaql-docker/blob/master/hadoop/yarn-site.xml#L35
> to see some settings that might help.
>
> Tobias
>
>
>
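
One common way past the ACCEPTED hang is to explicitly request modest resources
so the application fits inside what YARN can actually grant (the complementary
fix is raising the NodeManager limits in yarn-site.xml, as Tobias suggests).
Below is a hedged sketch of such a submit command; the values are placeholders
and should be sized against what the ResourceManager UI reports as available.

{code}
# Illustrative values only -- size them against the memory/vcores your
# ResourceManager UI reports as available for the queue.
bin/spark-submit --class com.mycompany.app.SimpleApp \
  --master yarn-cluster \
  --driver-memory 512m \
  --executor-memory 512m \
  --num-executors 2 \
  --executor-cores 1 \
  /home/hduser/my-app-1.0.jar
{code}

If the application still sits in ACCEPTED, the ResourceManager UI usually shows
whether the queue has any free memory or vcores left to grant.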


Priority queue in spark

2015-03-16 Thread abhi
Hi
Currently all the jobs in Spark get submitted using a queue. I have a
requirement where a submitted job will generate another set of jobs, each with
some priority, which should again be submitted to the Spark cluster based on
that priority, meaning jobs with higher priority should be executed first. Is
this feasible?

Any help is appreciated.

Thanks,
Abhi


Re: Priority queue in spark

2015-03-16 Thread abhi
If I understand correctly, the above document creates pools for priority that
are static in nature and have to be defined before submitting the job. In my
scenario each generated task can have a different priority.

Thanks,
Abhi


On Mon, Mar 16, 2015 at 9:48 PM, twinkle sachdeva <
twinkle.sachd...@gmail.com> wrote:

> Hi,
>
> Maybe this is what you are looking for :
> http://spark.apache.org/docs/1.2.0/job-scheduling.html#fair-scheduler-pools
>
> Thanks,
>
> On Mon, Mar 16, 2015 at 8:15 PM, abhi  wrote:
>
>> Hi
>> Current all the jobs in spark gets submitted using queue . i have a
>> requirement where submitted job will generate another set of jobs with some
>> priority , which should again be submitted to spark cluster based on
>> priority ? Means job with higher priority should be executed first,Is
>> it feasible  ?
>>
>> Any help is appreciated ?
>>
>> Thanks,
>> Abhi
>>
>>
>
>
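
To make the pool-based suggestion above concrete, here is a minimal Java
sketch, assuming spark.scheduler.mode is set to FAIR and a fairscheduler.xml
defines pools named "high" and "low" as in the linked documentation; the pool
names and input paths are placeholders.

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class PoolExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("Fair scheduler pools");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Jobs triggered from this thread are scheduled in the "high" pool.
    sc.setLocalProperty("spark.scheduler.pool", "high");
    long important = sc.textFile("/data/important.txt").count();

    // Switch the calling thread to the "low" pool for less urgent work.
    sc.setLocalProperty("spark.scheduler.pool", "low");
    long background = sc.textFile("/data/background.txt").count();

    System.out.println("important=" + important + ", background=" + background);
    sc.stop();
  }
}
{code}

The pools themselves are still declared up front, which is the limitation
pointed out above.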


Re: Priority queue in spark

2015-03-16 Thread abhi
Yes.
Each generated job can have a different priority. It is like a recursive
function, where in each iteration the generated job will be submitted to the
Spark cluster based on its priority. Jobs with lower priority, or with a
priority below some threshold, will be discarded.

Thanks,
Abhi


On Mon, Mar 16, 2015 at 10:36 PM, twinkle sachdeva <
twinkle.sachd...@gmail.com> wrote:

> Hi Abhi,
>
> You mean each task of a job can have different priority or job generated
> via one job can have different priority?
>
>
>
> On Tue, Mar 17, 2015 at 11:04 AM, Mark Hamstra 
> wrote:
>
>>
>> http://apache-spark-developers-list.1001551.n3.nabble.com/Job-priority-td10076.html#a10079
>>
>> On Mon, Mar 16, 2015 at 10:26 PM, abhi  wrote:
>>
>>> If i understand correctly , the above document creates pool for priority
>>> which is static in nature and has to be defined before submitting the job .
>>> .in my scenario each generated task can have different priority.
>>>
>>> Thanks,
>>> Abhi
>>>
>>>
>>> On Mon, Mar 16, 2015 at 9:48 PM, twinkle sachdeva <
>>> twinkle.sachd...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Maybe this is what you are looking for :
>>>> http://spark.apache.org/docs/1.2.0/job-scheduling.html#fair-scheduler-pools
>>>>
>>>> Thanks,
>>>>
>>>> On Mon, Mar 16, 2015 at 8:15 PM, abhi  wrote:
>>>>
>>>>> Hi
>>>>> Current all the jobs in spark gets submitted using queue . i have a
>>>>> requirement where submitted job will generate another set of jobs with 
>>>>> some
>>>>> priority , which should again be submitted to spark cluster based on
>>>>> priority ? Means job with higher priority should be executed first,Is
>>>>> it feasible  ?
>>>>>
>>>>> Any help is appreciated ?
>>>>>
>>>>> Thanks,
>>>>> Abhi
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
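
The fair-scheduler pools are fixed up front, so one way to sketch the dynamic
behaviour described in this thread, purely as an illustration and not an
existing Spark facility, is to order the generated work in the driver with an
ordinary priority queue and only submit jobs whose priority clears the
threshold; the PrioritizedJob wrapper and the threshold value are hypothetical.

{code}
import java.util.Comparator;
import java.util.PriorityQueue;

public class DriverSidePriorityQueue {

  // Hypothetical wrapper pairing a unit of generated work with its priority.
  static class PrioritizedJob {
    final int priority;
    final Runnable sparkAction; // e.g. a closure that triggers an RDD action
    PrioritizedJob(int priority, Runnable sparkAction) {
      this.priority = priority;
      this.sparkAction = sparkAction;
    }
  }

  public static void main(String[] args) {
    int threshold = 5; // illustrative: anything below this is discarded

    // Highest priority first.
    PriorityQueue<PrioritizedJob> queue = new PriorityQueue<>(
        Comparator.comparingInt((PrioritizedJob j) -> j.priority).reversed());

    // The recursive generation step would add newly created jobs here.
    queue.add(new PrioritizedJob(9, () -> System.out.println("run high-priority Spark action")));
    queue.add(new PrioritizedJob(2, () -> System.out.println("never runs")));

    PrioritizedJob job;
    while ((job = queue.poll()) != null) {
      if (job.priority < threshold) {
        continue; // below the threshold: discard, as described in the thread
      }
      job.sparkAction.run(); // "submit" by running the Spark action in the driver
    }
  }
}
{code}

Everything here happens in the driver; Spark itself still just sees an ordinary
sequence of jobs arriving in priority order.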


RDD storage in spark steaming

2015-03-23 Thread abhi
Hi,
I have a simple question about creating RDDs. Whenever an RDD is created in
Spark Streaming for a particular time window, where does the RDD get stored?

1. Does it get stored on the driver machine, or on all the machines in the
cluster?
2. Does the data get stored in memory by default? Can it be stored in both
memory and disk? How can this be configured?


Thanks,
Abhi
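
On the second question, here is a minimal Java sketch, assuming the Spark
Streaming Java API, of how the RDDs produced for each batch can be told to use
both memory and disk; the socket source, port, and batch interval are
placeholders.

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class PersistExample {
  public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf().setAppName("Persist example");
    // 10-second batch interval (placeholder).
    JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(10000));

    // Each batch produces an RDD whose partitions live on the executors
    // across the cluster, not a single copy on the driver.
    JavaDStream<String> lines = jssc.socketTextStream("localhost", 9999);

    // Keep the batch RDDs in memory and spill to disk when they don't fit.
    lines.persist(StorageLevel.MEMORY_AND_DISK());

    lines.print();
    jssc.start();
    jssc.awaitTermination();
  }
}
{code}

The default persistence level of DStreams keeps data serialized in memory, so
an explicit persist like the one above is how memory-plus-disk storage is
configured.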



Re: sbt assembly with hive

2014-12-12 Thread Abhi Basu
I am getting the same message when trying to get a HiveContext in CDH 5.1
after enabling Spark. I think Spark should come with Hive enabled by default,
as the Hive metastore is a common way to share data, given the popularity of
Hive and other SQL-over-Hadoop technologies like Impala.

Thanks,

Abhi

On Fri, Dec 12, 2014 at 6:40 PM, Stephen Boesch  wrote:
>
>
> What is the proper way to build with hive from sbt?  The SPARK_HIVE is
> deprecated. However after running the following:
>
>sbt -Pyarn -Phadoop-2.3 -Phive  assembly/assembly
>
> And then
>   bin/pyspark
>
>hivectx = HiveContext(sc)
>
>hivectx.hiveql("select * from my_table")
>
> Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and
> run sbt/sbt assembly", Py4JError(u'Trying to call a package.',))
>


-- 
Abhi Basu


Re: Building Desktop application for ALS-MlLib/ Training ALS

2014-12-15 Thread Abhi Basu
In case you must write C# code, you can call Python code from C# or use
IronPython. :)

On Mon, Dec 15, 2014 at 12:04 PM, Xiangrui Meng  wrote:
>
> On Sun, Dec 14, 2014 at 3:06 AM, Saurabh Agrawal
>  wrote:
> >
> >
> > Hi,
> >
> >
> >
> > I am a newbie in the Spark and Scala world
> >
> >
> >
> > I have been trying to implement Collaborative filtering using MlLib
> supplied
> > out of the box with Spark and Scala
> >
> >
> >
> > I have 2 problems
> >
> >
> >
> > 1.   The best model was trained with rank = 20 and lambda = 5.0, and
> > numIter = 10, and its RMSE on the test set is 25.718710831912485. The best
> > model improves the baseline by 18.29%. Is there a scientific way in which
> > RMSE could be brought down? What is a decent acceptable value for RMSE?
> >
>
> The grid search approach used in the AMPCamp tutorial is pretty
> standard. Whether an RMSE is good or not really depends on your
> dataset.
>
> > 2.   I picked up the Collaborative filtering algorithm from
> >
> http://ampcamp.berkeley.edu/5/exercises/movie-recommendation-with-mllib.html
> > and executed the given code with my dataset. Now, I want to build a
> desktop
> > application around it.
> >
> > a.   What is the best language to do this Java/ Scala? Any
> possibility
> > to do this using C#?
> >
>
> We support Java/Scala/Python. Start with the one your are most
> familiar with. C# is not supported.
>
> > b.  Can somebody please share any relevant documents/ source or any
> > helper links to help me get started on this?
> >
>
> For ALS, you can check the API documentation.
>
> >
> >
> > Your help is greatly appreciated
> >
> >
> >
> > Thanks!!
> >
> >
> >
> > Regards,
> >
> > Saurabh Agrawal
> >
> >
>
>
>

-- 
Abhi Basu
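
For the first question in this thread, here is a minimal Java sketch of the
grid-search approach Xiangrui mentions: train MLlib's ALS over a few
rank/lambda combinations and keep the one with the lowest RMSE on a held-out
split. The ratings path, delimiter, and parameter grid are placeholders.

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.recommendation.ALS;
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
import org.apache.spark.mllib.recommendation.Rating;
import scala.Tuple2;

public class AlsGridSearch {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("ALS grid search"));

    // Placeholder: parse "user::product::rating" lines into Rating objects.
    JavaRDD<Rating> ratings = sc.textFile("/path/to/ratings.dat").map(line -> {
      String[] f = line.split("::");
      return new Rating(Integer.parseInt(f[0]), Integer.parseInt(f[1]), Double.parseDouble(f[2]));
    });
    JavaRDD<Rating>[] splits = ratings.randomSplit(new double[]{0.8, 0.2}, 42L);
    JavaRDD<Rating> training = splits[0].cache();
    JavaRDD<Rating> test = splits[1].cache();

    // Small illustrative grid over rank and lambda, 10 iterations each.
    double bestRmse = Double.MAX_VALUE;
    for (int rank : new int[]{8, 12, 20}) {
      for (double lambda : new double[]{0.01, 0.1, 1.0, 5.0}) {
        MatrixFactorizationModel model = ALS.train(training.rdd(), rank, 10, lambda);
        double rmse = computeRmse(model, test);
        System.out.println("rank=" + rank + ", lambda=" + lambda + ", RMSE=" + rmse);
        bestRmse = Math.min(bestRmse, rmse);
      }
    }
    System.out.println("Best RMSE on the held-out set: " + bestRmse);
    sc.stop();
  }

  // RMSE of the model's predictions against held-out ratings.
  static double computeRmse(MatrixFactorizationModel model, JavaRDD<Rating> data) {
    JavaRDD<Tuple2<Object, Object>> userProducts =
        data.map(r -> new Tuple2<>(r.user(), r.product()));
    JavaPairRDD<Tuple2<Integer, Integer>, Double> predictions = JavaPairRDD.fromJavaRDD(
        model.predict(JavaRDD.toRDD(userProducts)).toJavaRDD()
            .map(r -> new Tuple2<>(new Tuple2<>(r.user(), r.product()), r.rating())));
    JavaRDD<Tuple2<Double, Double>> ratesAndPreds = JavaPairRDD.fromJavaRDD(
        data.map(r -> new Tuple2<>(new Tuple2<>(r.user(), r.product()), r.rating())))
        .join(predictions).values();
    double mse = ratesAndPreds.mapToDouble(pair -> {
      double err = pair._1() - pair._2();
      return err * err;
    }).mean();
    return Math.sqrt(mse);
  }
}
{code}

Whether a particular RMSE is "good" still depends on the rating scale and the
dataset, as Xiangrui notes; the grid only finds the best of the candidates
tried.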


SparkSQL

2015-01-08 Thread Abhi Basu
I am working with CDH 5.2 (Spark 1.0.0) and wondering which version of Spark
comes with SparkSQL by default. Also, does SparkSQL come enabled to access the
Hive metastore? Is there an easier way to enable Hive support without having
to build the code with various switches?

Thanks,

Abhi

-- 
Abhi Basu