Re: Guide Step by step Spark streaming

2016-09-15 Thread rahulkumar-aws
Your project is really nice, and you can simulate various domain use cases
with it. As this is your college project I can't help you with the coding, but
I can share a presentation of mine (https://goo.gl/XUJd3b) that gives a
graphical diagram of a real-life system built on Apache Spark Streaming.




-
Software Developer
Sigmoid (SigmoidAnalytics), India

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Guide-Step-by-step-Stark-streaming-tp27731p27732.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org



Re: Spark job within Web application

2016-09-15 Thread rahulkumar-aws
Hi, from your code it looks like you are calling Spark code inside your
servlet. My suggestion is to make the Spark side a separate system and have
the two communicate using Thrift or Protobuf; a minimal sketch follows below.
I have built various distributed web apps with Play Framework, Akka, Spray,
and Jersey, and this design works very well.
"This answer will not solve your problem, but it will help you pick a good
design for a web application with Apache Spark."
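For instance, a minimal sketch of that separation (the jar path, main class,
and master URL are hypothetical): the web tier hands the job to the cluster
through Spark's SparkLauncher API instead of creating a SparkContext inside a
servlet thread.

import org.apache.spark.launcher.SparkLauncher

object SparkJobGateway {
  // Called from the web tier (Play/Akka/Jersey); the job runs as a separate
  // process, so the web app never hosts a SparkContext of its own.
  def runJob(): Int = {
    val process = new SparkLauncher()
      .setAppResource("/opt/jobs/analytics-job.jar") // hypothetical jar path
      .setMainClass("com.example.AnalyticsJob")      // hypothetical main class
      .setMaster("spark://spark-master:7077")        // hypothetical master URL
      .launch()
    process.waitFor()                                // exit code of the job
  }
}

Results can then flow back to the web app over Thrift or Protobuf as suggested
above.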



-
Software Developer
Sigmoid (SigmoidAnalytics), India

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-within-Web-application-tp27726p27727.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org



Re: How to give a name to Spark jobs shown in the Spark UI

2016-07-26 Thread rahulkumar-aws
You can set the name in SparkConf, or if you are using spark-submit, pass the
--name flag:

import org.apache.spark.{SparkConf, SparkContext}

val sparkconf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("saveFileJob")
val sc = new SparkContext(sparkconf)


or with spark-submit:

./bin/spark-submit --name "FileSaveJob" --master local[4] fileSaver.jar




On Mon, Jul 25, 2016 at 9:46 PM, neil90 [via Apache Spark User List] <
ml-node+s1001560n27406...@n3.nabble.com> wrote:

> As far as I know you can give a name to the SparkContext. I recommend
> using a cluster monitoring tool like Ganglia to determine where it's slow in
> your spark jobs.




-
Software Developer
Sigmoid (SigmoidAnalytics), India

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-give-name-to-Spark-jobs-shown-in-Spark-UI-tp27400p27414.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: How to run multiple Spark jobs as a workflow that takes input from a Streaming job in Oozie

2015-12-20 Thread rahulkumar-aws
Use the Spark Job Server: https://github.com/spark-jobserver/spark-jobserver
(a sketch of a job-server job follows this list).

Alternatively:

1. You can write your own job server with Spray (a Scala REST framework).
2. Create a Thrift server and pass the state of each job (as a Thrift object)
between your different jobs.
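A minimal sketch of a job-server job, modelled on the project's word-count
example (the input.string config key and the word-count logic are
illustrative, not the poster's code); the server owns the SparkContext and
passes it to runJob:

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

object WordCountJob extends SparkJob {
  // Fail fast if the expected input is missing from the job config.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    if (config.hasPath("input.string")) SparkJobValid
    else SparkJobInvalid("missing input.string")

  // The job server manages the SparkContext lifecycle across jobs.
  override def runJob(sc: SparkContext, config: Config): Any =
    sc.parallelize(config.getString("input.string").split(" ").toSeq)
      .countByValue()
}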




-
Software Developer
Sigmoid (SigmoidAnalytics), India

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-run-multiple-Spark-jobs-as-a-workflow-that-takes-input-from-a-Streaming-job-in-Oozie-tp25739p25745.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Header in each output file

2015-06-19 Thread rahulkumar-aws
Check this Stack Overflow link; it may help:
http://stackoverflow.com/questions/26157456/add-a-header-before-text-file-on-save-in-spark
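The gist of that answer, as a self-contained sketch (the header columns and
output path are hypothetical): prepend the header inside mapPartitions, so
every part file starts with it.

import org.apache.spark.{SparkConf, SparkContext}

object HeaderPerFile {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("headerPerFile"))
    val rows = sc.parallelize(Seq("1,alice", "2,bob", "3,carol"), 2)
    // Each partition becomes one part-NNNNN file, so add the header per
    // partition rather than once per RDD.
    val withHeader = rows.mapPartitions(iter => Iterator("id,name") ++ iter)
    withHeader.saveAsTextFile("/tmp/with-header")
    sc.stop()
  }
}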



-
Software Developer
Sigmoid (SigmoidAnalytics), India

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Header-in-each-output-files-tp23379p23405.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Error when connecting to Spark SQL via Hive JDBC driver

2015-06-19 Thread rahulkumar-aws
It looks like your Spark Hive jars are not compatible with your Spark build.
Compile the Spark source with the Hive profiles (which, for this Spark
version, build against Hive 0.13):

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver
-DskipTests clean package

That should solve your problem.



-
Software Developer
Sigmoid (SigmoidAnalytics), India

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Error-when-connecting-to-Spark-SQL-via-Hive-JDBC-driver-tp23397p23404.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark application in production without HDFS

2015-06-15 Thread rahulkumar-aws
Hi, if your data is not that huge you can use either Cloudera's or HDP's free
stack. Cloudera Express is 100% open source and free.



-
Software Developer
SigmoidAnalytics, Bangalore

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-application-in-production-without-HDFS-tp23260p23322.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Limit Spark Shuffle Disk Usage

2015-06-15 Thread rahulkumar-aws
Check this link:
https://forums.databricks.com/questions/277/how-do-i-avoid-the-no-space-left-on-device-error.html

Hope this will solve your problem.
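The short version of that answer, as a sketch (the mount points are
hypothetical): give Spark scratch space on volumes that actually have room.

import org.apache.spark.{SparkConf, SparkContext}

// spark.local.dir is where shuffle files and spills are written; a
// comma-separated list spreads the I/O across disks. Note that cluster
// managers such as YARN override this with their own local-dir settings.
val conf = new SparkConf()
  .setAppName("shuffleHeavyJob")
  .set("spark.local.dir", "/mnt/bigdisk1/spark,/mnt/bigdisk2/spark")
val sc = new SparkContext(conf)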



-
Software Developer
Sigmoid (SigmoidAnalytics), India

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Limit-Spark-Shuffle-Disk-Usage-tp23279p23323.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: EC2 Having script run at startup

2015-03-25 Thread rahulkumar-aws
You can use the AWS user-data feature; try it and see if it helps:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html



-
Software Developer
SigmoidAnalytics, Bangalore

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/EC2-Having-script-run-at-startup-tp22197p4.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Not able to run spark job from code on EC2 with spark 1.2.0

2015-01-17 Thread rahulkumar-aws
Hi, I am trying to run a simple count on an S3 bucket, but with Spark 1.2.0 on
EC2 it fails to run. I started my cluster using the ec2 script that ships with
Spark 1.2.0.

Part of the code was attached here but was not preserved in the archive.

It works with Spark 1.1.1, but not with 1.2.0.
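For reference, a sketch of the kind of count described (the bucket and path
are hypothetical stand-ins for the missing snippet):

import org.apache.spark.{SparkConf, SparkContext}

object S3Count {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("s3Count"))
    // Assumes AWS credentials are configured (see the "Access to s3 from
    // spark" reply further down in this digest).
    val count = sc.textFile("s3n://my-bucket/logs/*.log").count()
    println(s"line count: $count")
    sc.stop()
  }
}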




-
Software Developer
SigmoidAnalytics, Bangalore

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Not-able-to-run-spark-job-from-code-on-EC2-with-spark-1-2-0-tp21217.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark Streaming in Production

2014-12-12 Thread rahulkumar-aws
Run a Spark cluster managed by Apache Mesos. Mesos can run in high-availability
mode, in which multiple Mesos masters run simultaneously.
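Concretely (a sketch; the ZooKeeper hosts are hypothetical), in
high-availability mode the driver finds the leading master through ZooKeeper
via a mesos://zk:// URL:

import org.apache.spark.{SparkConf, SparkContext}

// ZooKeeper tracks the current leader among the Mesos masters, so the
// driver survives a master failover.
val conf = new SparkConf()
  .setAppName("productionStreamingApp")
  .setMaster("mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos")
val sc = new SparkContext(conf)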



-
Software Developer
SigmoidAnalytics, Bangalore

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-in-Production-tp20644p20651.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Access to s3 from spark

2014-12-12 Thread rahulkumar-aws
Try any one of the following:

1. Set the access key and secret key in the SparkConf (the spark.hadoop.
prefix passes them through to the Hadoop configuration):

val sparkConf = new SparkConf()
  .set("spark.hadoop.fs.s3n.awsAccessKeyId", yourAccessKey)
  .set("spark.hadoop.fs.s3n.awsSecretAccessKey", yourSecretKey)

2. Set the access key and secret key in the environment before starting your
application:

export AWS_ACCESS_KEY_ID=yourAccessKey
export AWS_SECRET_ACCESS_KEY=yourSecretKey

3. Set the access key and secret key in the Hadoop configuration:

val hadoopConf = sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", yourAccessKey)
hadoopConf.set("fs.s3.awsSecretAccessKey", yourSecretKey)
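With the keys in place, reads work as usual; for example (hypothetical
bucket), using the URL scheme that matches the keys you configured:

val count = sparkContext.textFile("s3n://my-bucket/data/*.csv").count()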



-
Software Developer
SigmoidAnalytics, Bangalore

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Access-to-s3-from-spark-tp20631p20654.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Remote jar file

2014-12-11 Thread rahulkumar-aws
Put the jar file in HDFS. The URL must be globally visible inside your
cluster: for instance, an hdfs:// path, or a file:// path that is present on
all nodes.
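For example (hypothetical NameNode host and path), extra dependency jars
staged in HDFS can also be added from the driver:

// The hdfs:// URL is visible to every executor, so no per-node copies are
// needed.
sc.addJar("hdfs://namenode:8020/jars/extra-deps.jar")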



-
Software Developer
SigmoidAnalytics, Bangalore

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Remote-jar-file-tp20649p20650.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org