Livy vs Spark Job Server

2016-09-14 Thread SamyaMaiti
Hi Team, I am evaluating different ways to submit & monitor Spark jobs using REST interfaces. When should one use Livy vs Spark Job Server? Regards, Sam

Re: spark.driver.extraJavaOptions

2016-07-21 Thread SamyaMaiti
Thanks for the reply RK. Using the first option, my application doesn't recognize spark.driver.extraJavaOptions. With the second option, the issue remains the same: 2016-07-21 12:59:41 ERROR SparkContext:95 - Error initializing SparkContext. org.apache.spark.SparkException: Found both …

spark.driver.extraJavaOptions

2016-07-21 Thread SamyaMaiti
Hi Team, I am using *CDH 5.7.1* with Spark *1.6.0*. I have a Spark Streaming application that reads from Kafka & does some processing. The issue: while starting the application in CLUSTER mode, I want to pass a custom log4j.properties file to both the driver & the executors. *I have the below command :-* …
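
For reference, the commonly documented shape of such a spark-submit invocation on YARN; the class name, jar, and paths below are placeholders rather than details from the thread:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --files /local/path/log4j.properties \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --class com.example.StreamingApp \
      app.jar

--files ships the properties file into each container's working directory, which is why the -Dlog4j.configuration value can be a bare file name.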

Spark logging

2016-07-10 Thread SamyaMaiti
Hi Team, I have a Spark application up & running on a 10-node standalone cluster. When I launch the application in cluster mode I am able to create a separate log file for the driver & one for the executors (common to all executors). But my requirement is to create a separate log file for each executor. Is it …
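
One approach, sketched under the assumption of a standalone cluster: each standalone executor runs in its own work directory, so a relative file path in a custom log4j.properties yields one log file per executor. All names below are illustrative:

    # log4j 1.x properties; the relative path resolves inside each
    # executor's own work directory, giving one file per executor
    log4j.rootLogger=INFO, file
    log4j.appender.file=org.apache.log4j.RollingFileAppender
    log4j.appender.file.File=executor.log
    log4j.appender.file.MaxFileSize=50MB
    log4j.appender.file.MaxBackupIndex=5
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n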

Spark streaming Kafka Direct API + Multiple consumers

2016-07-07 Thread SamyaMaiti
Hi Team, Is there a way we can consume from Kafka using the Spark Streaming direct API with multiple consumers (belonging to the same consumer group)? Regards, Sam
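
Worth noting: the direct API (Spark 1.6 with Kafka 0.8) does not use Kafka consumer groups at all; each Kafka partition maps 1:1 to an RDD partition, so read parallelism comes from the topic's partition count. A minimal sketch, with the broker, topic, and surrounding StreamingContext (ssc) assumed:

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Offsets are tracked by Spark itself, not by a consumer group;
    // group.id in kafkaParams has no effect on partition assignment.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("myTopic"))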

Re: Spark Streaming and JMS

2015-12-02 Thread SamyaMaiti
Hi All, Is there any pub-sub integration for JMS provided by Spark out of the box, like there is for Kafka? Thanks. Regards, Sam

Monitoring Spark Jobs

2015-06-07 Thread SamyaMaiti
Hi All, I have a Spark SQL application that fetches data from Hive; on top I have an Akka layer to run multiple queries in parallel. *Please suggest a mechanism to figure out the number of Spark jobs running in the cluster at a given instant.* I need to do the above as I see the …
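
Within a single application, one way is a SparkListener that counts job start/end events; for a cluster-wide view, the REST status API added in Spark 1.4 (/api/v1/applications/{appId}/jobs) can be polled per application. A minimal listener sketch, with sc assumed to be the application's SparkContext:

    import java.util.concurrent.atomic.AtomicInteger
    import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}

    // Tracks how many jobs this SparkContext is running right now
    val runningJobs = new AtomicInteger(0)
    sc.addSparkListener(new SparkListener {
      override def onJobStart(jobStart: SparkListenerJobStart): Unit = runningJobs.incrementAndGet()
      override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = runningJobs.decrementAndGet()
    })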

Re: Spark Job execution time

2015-05-15 Thread SamyaMaiti
It does depend on the network I/O within your cluster and on CPU usage. That said, the difference in run time should not be huge (assuming you are not running any other job in the cluster in parallel).

Hive partition table + read using hiveContext + spark 1.3.1

2015-05-14 Thread SamyaMaiti
Hi Team, I have a Hive partitioned table whose partition column contains spaces. When I try to run any query, say a simple Select * from table_name, it fails. *Please note the same was working in Spark 1.2.0; now I have upgraded to 1.3.1. Also there is no change in my application code base.* If I …

Spark Vs MR

2015-04-04 Thread SamyaMaiti
How is Spark faster than MR when the data is on disk in both cases?

Re: 4 seconds to count 13M lines. Does it make sense?

2015-04-04 Thread SamyaMaiti
Reduce *spark.sql.shuffle.partitions* from its default of 200 to the total number of cores.
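
A minimal way to apply that, with the value of 8 standing in for the actual core count:

    // Shrinks the number of post-shuffle partitions from the 200 default,
    // so small datasets are not split across hundreds of tiny tasks.
    sqlContext.setConf("spark.sql.shuffle.partitions", "8")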

persist(MEMORY_ONLY) takes lot of time

2015-04-01 Thread SamyaMaiti
Hi Experts, I have a Parquet dataset of 550 MB (9 blocks) in HDFS and I want to run SQL queries repetitively. Few questions: 1. When I do the below (persist to memory after reading from disk), it takes a lot of time to persist to memory; any suggestions on how to tune this? val inputP …
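
One common tuning step, a sketch under the assumption that memory pressure is the bottleneck (the path is illustrative): cache in serialized form, which usually shortens the initial materialization at some CPU cost:

    import org.apache.spark.storage.StorageLevel

    val inputP = sqlContext.parquetFile("/path/to/parquet")
    inputP.persist(StorageLevel.MEMORY_ONLY_SER) // serialized cache: less memory, less GC
    inputP.count()                               // forces the one-time load; later queries hit the cache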

Parquet file + increase read parallelism

2015-03-23 Thread SamyaMaiti
Hi All, Suppose I have a Parquet file of 100 MB in HDFS and my HDFS block size is 64 MB, so I have 2 blocks of data. When I do *sqlContext.parquetFile(path)* followed by an action, two tasks are started on two partitions. My intent is to read these 2 blocks into more partitions to fully utilize my cluster …
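
The simplest lever is an explicit repartition after the load (the 16 below is illustrative); lowering the input split size is another option, but the shuffle-based route is the easiest to reason about:

    // Spreads the two HDFS blocks across more tasks at the cost of one shuffle,
    // which usually pays off when the data is queried repeatedly.
    val data = sqlContext.parquetFile("/path/to/file.parquet").repartition(16)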

Writing to a single file from multiple executors

2015-03-11 Thread SamyaMaiti
Hi Experts, I have a scenario wherein I want to write to an Avro file from a streaming job that reads data from Kafka. But the issue is, as there are multiple executors and all try to write to a given file, I get a concurrency exception. One way to mitigate the issue is to repartition and have a …
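
The usual shape of that mitigation, sketched with text output standing in for the Avro writer and an illustrative output path: collapse each batch to one partition so only a single task writes, and give each batch its own directory to avoid cross-batch appends:

    stream.foreachRDD { (rdd, time) =>
      // One partition means one writer task, so no concurrent writes to a file
      rdd.repartition(1).saveAsTextFile(s"/output/batch-${time.milliseconds}")
    }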

save rdd to ORC file

2015-01-03 Thread SamyaMaiti
Hi Experts, Like saveAsParquetFile on SchemaRDD, is there an equivalent to store in an ORC file? I am using Spark 1.2.0. As per the link below, it looks like it's not part of 1.2.0, so any latest update would be great. https://issues.apache.org/jira/browse/SPARK-2883 Till the next release, is there a …
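
For what it's worth, SPARK-2883 landed in Spark 1.4, where a HiveContext-backed DataFrame can write ORC directly (df and the path below are illustrative):

    // Requires Spark 1.4+ and a HiveContext, since ORC support lives in the Hive module
    df.write.format("orc").save("/path/to/orc")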

Kafka + Spark streaming

2014-12-30 Thread SamyaMaiti
Hi Experts, Few general queries: 1. Can a single block/partition in an RDD have more than one Kafka message, or will there be only one Kafka message per block? More broadly, is the message count related to the block in any way, or is it just that any message received within a particular …
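
For the receiver-based Kafka stream, a block holds every message that arrives within one block interval, so a block normally contains many messages and the count depends on the arrival rate, not on any per-message rule. The interval is configurable (200 ms is the default; the value is in milliseconds in this Spark generation):

    import org.apache.spark.SparkConf

    // Smaller intervals mean more, smaller blocks (and hence more partitions per batch)
    val conf = new SparkConf().set("spark.streaming.blockInterval", "200")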

Re: ReliableDeliverySupervisor: Association with remote system

2014-12-29 Thread SamyaMaiti
Resolved. I changed to the Apache Hadoop 2.4.0 + Apache Spark 1.2.0 combination, and all works fine. It must be because the 1.2.0 build of Spark was compiled against Hadoop 2.4.0.

ReliableDeliverySupervisor: Association with remote system

2014-12-25 Thread SamyaMaiti
… by samyamaiti on 12/25/14. */
object Driver {
  def main(args: Array[String]) {
    // CheckPoint dir in HDFS (the path must be a quoted String literal)
    val checkpointDirectory = "hdfs://localhost:8020/user/samyamaiti/SparkCheckpoint1"

    // functionToCreateContext
    def functionToCreateContext(): StreamingContext = {
      // Setting conf …
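
The snippet cuts off mid-definition; for context, the standard checkpoint-recovery pattern it is building toward ends with StreamingContext.getOrCreate. A minimal sketch, with the app name and batch interval as illustrative assumptions:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    def functionToCreateContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("StreamingApp")
      val ssc = new StreamingContext(conf, Seconds(10))
      // ... define the DStream pipeline here ...
      ssc.checkpoint(checkpointDirectory)
      ssc
    }

    // Recovers from the checkpoint if one exists, otherwise builds a fresh context
    val ssc = StreamingContext.getOrCreate(checkpointDirectory, functionToCreateContext _)
    ssc.start()
    ssc.awaitTermination()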

Re: ReliableDeliverySupervisor: Association with remote system

2014-12-25 Thread SamyaMaiti
Sorry for the typo. The Apache Hadoop version is 2.6.0. Regards, Sam