Setup Spark jobserver for Spark SQL

2015-04-02 Thread Harika
Hi,

I am trying to use Spark Jobserver
(https://github.com/spark-jobserver/spark-jobserver) for running Spark SQL
jobs.

I was able to start the server, but when I run my application (my Scala
class which extends SparkSqlJob), I get the following response:

{
  "status": "ERROR",
  "result": "Invalid job type for this context"
}

Can anyone suggest what is going wrong, or provide a detailed procedure for
setting up the jobserver for Spark SQL?
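
For reference, my job class looks roughly like the following (a minimal
sketch, assuming the jobserver 0.5.x extras API; the query and table name
are placeholders):

import com.typesafe.config.Config
import org.apache.spark.sql.SQLContext
import spark.jobserver.{SparkSqlJob, SparkJobValid, SparkJobValidation}

object MySqlJob extends SparkSqlJob {
  def validate(sql: SQLContext, config: Config): SparkJobValidation =
    SparkJobValid

  def runJob(sql: SQLContext, config: Config): Any = {
    // "people" is a hypothetical table registered elsewhere
    sql.sql("SELECT COUNT(*) FROM people").collect()
  }
}

From what I read in the jobserver docs, a SparkSqlJob can only run in a
context created from the SQL context factory, e.g.

curl -d "" 'localhost:8090/contexts/sql-context?context-factory=spark.jobserver.context.SQLContextFactory'

so "Invalid job type for this context" may mean the job was posted to a
plain SparkContext. Is that the right setup?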






Re: Spark SQL: Day of month from Timestamp

2015-03-24 Thread Harika Matha
You can use the functions Arush mentioned if you use HiveContext instead of
SQLContext.
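
For example, a minimal sketch (assuming a Spark 1.3-era HiveContext and a
registered table "events" with a timestamp column "ts" -- both are
placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("DayOfMonth"))
val hc = new HiveContext(sc)
// day(), month() and year() are Hive built-in UDFs, so they resolve under
// HiveContext but not under the plain SQLContext
val result = hc.sql("SELECT day(ts), month(ts), year(ts) FROM events")
result.collect().foreach(println)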

On Tue, Mar 24, 2015 at 12:59 PM, Arush Kharbanda <
ar...@sigmoidanalytics.com> wrote:

> Hi
>
> You can use functions like year(date), month(date)
>
> Thanks
> Arush
>
> On Tue, Mar 24, 2015 at 12:46 PM, Harut Martirosyan <
> harut.martiros...@gmail.com> wrote:
>
>> Hi guys.
>>
>> Basically, we had to define a UDF that does that, is there a built in
>> function that we can use for it?
>>
>> --
>> RGRDZ Harut
>>
>
>
>
> --
>
>
> *Arush Kharbanda* || Technical Teamlead
>
> ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
>


Re: Spark-on-YARN architecture

2015-03-10 Thread Harika Matha
Thanks for the quick reply.

I am running the application in YARN client mode, and I want to run the AM
on the same node as the RM in order to use the node which would otherwise
run the AM.

How can I get the AM to run on the same node as the RM?


On Tue, Mar 10, 2015 at 3:49 PM, Sean Owen  wrote:

> In YARN cluster mode, there is no Spark master, since YARN is your
> resource manager. Yes you could force your AM somehow to run on the
> same node as the RM, but why -- what do you think is faster about that?
>
> On Tue, Mar 10, 2015 at 10:06 AM, Harika  wrote:
> > Hi all,
> >
> > I have a Spark cluster set up on YARN with 4 nodes (1 master and 3
> > slaves). When I run an application, YARN chooses, at random, one
> > Application Master from among the slaves. This means that my final
> > computation is being carried out on only two slaves. This decreases the
> > performance of the cluster.
> >
> > 1. Is this the correct configuration? What is the architecture of Spark
> > on YARN?
> > 2. Is there a way in which I can run the Spark master, YARN application
> > master and resource manager on a single node? (so that I can use the
> > three other nodes for the computation)
> >
> > Thanks
> > Harika
> >
> >
> >
> >
> >
>
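
(To make the two modes Sean describes concrete, a sketch of both
submissions, assuming a Spark 1.2-era installation; the class and jar names
are placeholders:)

# client mode: the driver runs in the local JVM and YARN only hosts a
# lightweight AM that requests executors
./bin/spark-submit --master yarn-client --class com.example.MyApp \
  --num-executors 3 my-app.jar

# cluster mode: the driver runs inside the AM container on one of the
# NodeManagers
./bin/spark-submit --master yarn-cluster --class com.example.MyApp \
  --num-executors 3 my-app.jar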


Spark-on-YARN architecture

2015-03-10 Thread Harika
Hi all,

I have a Spark cluster set up on YARN with 4 nodes (1 master and 3 slaves).
When I run an application, YARN chooses, at random, one Application Master
from among the slaves. This means that my final computation is being
carried out on only two slaves. This decreases the performance of the
cluster.

1. Is this the correct configuration? What is the architecture of Spark on
YARN?
2. Is there a way in which I can run the Spark master, YARN application
master and resource manager on a single node? (so that I can use the three
other nodes for the computation)

Thanks
Harika








Setting up Spark with YARN on EC2 cluster

2015-02-25 Thread Harika
Hi,

I want to set up a Spark cluster with YARN on Amazon EC2. I was reading
this document (https://spark.apache.org/docs/1.2.0/running-on-yarn.html)
and I understand that Hadoop has to be set up for running Spark with YARN.
My questions:

1. Do we have to set up a Hadoop cluster on EC2 and then build Spark on it?
2. Is there a way to modify the existing Spark cluster to work with YARN?

Thanks in advance.

Harika






Re: Running multiple threads with same Spark Context

2015-02-25 Thread Harika Matha
Hi Yana,

I tried running the program after setting the property
"spark.scheduler.mode" to FAIR, but the result is the same as before. Are
there any other properties that have to be set?
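
For reference, this is roughly how I am setting it (a minimal sketch; the
allocation file path and pool name are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("ConcurrentSQL")
  .set("spark.scheduler.mode", "FAIR")
  // optional pool definitions; the file path is a placeholder
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
val sc = new SparkContext(conf)
// each worker thread can assign its jobs to a named pool
sc.setLocalProperty("spark.scheduler.pool", "pool1")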


On Tue, Feb 24, 2015 at 10:26 PM, Yana Kadiyska wrote:

> It's hard to tell. I have not run this on EC2 but this worked for me:
>
> The only thing that I can think of is that the scheduling mode is set to
> *Scheduling Mode:* FAIR
>
>
> import java.util.concurrent.{Executors, ExecutorService}
> import scala.compat.Platform
> import org.apache.spark.Logging
>
> val pool: ExecutorService = Executors.newFixedThreadPool(poolSize)
> // submit one ReportJob per query; `queries` and `poolSize` come from the
> // surrounding driver code
> for ((curr_job, i) <- queries.zipWithIndex)
>   pool.execute(new ReportJob(sqlContext, curr_job, i))
>
> class ReportJob(sqlContext: org.apache.spark.sql.hive.HiveContext,
>                 query: String, id: Int) extends Runnable with Logging {
>   def threadId = Thread.currentThread.getName() + "\t"
>
>   def run() {
>     logInfo(s"* Running ${threadId} ${id}")
>     val startTime = Platform.currentTime
>     // repartition returns a new result set, so chain it before saving
>     val result_set = sqlContext.sql(query).repartition(1)
>     result_set.saveAsParquetFile(s"hdfs:///tmp/${id}")
>     logInfo(s"* DONE ${threadId} ${id} time: " + (Platform.currentTime - startTime))
>   }
> }
>
>
> On Tue, Feb 24, 2015 at 4:04 AM, Harika  wrote:
>
>> Hi all,
>>
>> I have been running a simple SQL program on Spark. To test the
>> concurrency,
>> I have created 10 threads inside the program, all threads using the same
>> SQLContext object. When I ran the program on my EC2 cluster using
>> spark-submit, only 3 threads were running in parallel. I have repeated
>> the test on different EC2 clusters (containing different numbers of
>> cores) and found that only 3 threads run in parallel on every cluster.
>>
>> Why is this behaviour seen? What does this number 3 specify?
>> Is there any configuration parameter that I have to set if I want to run
>> more threads concurrently?
>>
>> Thanks
>> Harika
>>
>>
>>
>>
>


Running multiple threads with same Spark Context

2015-02-24 Thread Harika
Hi all,

I have been running a simple SQL program on Spark. To test the concurrency,
I have created 10 threads inside the program, all threads using the same
SQLContext object. When I ran the program on my EC2 cluster using
spark-submit, only 3 threads were running in parallel. I have repeated the
test on different EC2 clusters (containing different numbers of cores) and
found that only 3 threads run in parallel on every cluster.

Why is this behaviour seen? What does this number 3 specify?
Is there any configuration parameter that I have to set if I want to run
more threads concurrently?

Thanks
Harika






Re: HiveContext in SparkSQL - concurrency issues

2015-02-24 Thread Harika
Hi Sreeharsha,

My data is in HDFS. I am trying to use Spark's HiveContext (instead of
SQLContext) to run queries on my data because HiveContext supports more
operations.


Sreeharsha wrote
> Change Derby to MySQL and check; I too faced the same issue

I am pretty new to Spark and Hive, and I do not know how to change from
Derby to MySQL. The log I posted is from when I simply changed from
SQLContext to HiveContext. Do I have to change any property in order to
point HiveContext to use MySQL instead of Derby?
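
(From what I have read so far, the metastore connection is configured in a
hive-site.xml placed in Spark's conf/ directory; a minimal sketch, assuming
a MySQL database named "metastore" -- the host, user and password are
placeholders, and the MySQL JDBC driver jar has to be on the classpath:)

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysql-host:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepass</value>
  </property>
</configuration>

Is this the right direction?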







Re: HiveContext in SparkSQL - concurrency issues

2015-02-12 Thread Harika
Hi,

I've been reading about Spark SQL, and people suggest that using
HiveContext is better. Can anyone please suggest a solution to the above
problem? It is stopping me from moving forward with HiveContext.

Thanks
Harika






Re: How to efficiently utilize all cores?

2015-02-11 Thread Harika
Hi Aplysia,

Thanks for the reply.

Could you be more specific about which part of the document to look at? I
have already read it and tried a few of the relevant settings, to no
avail.






