Hi Team,
I am evaluating different ways to submit and monitor Spark jobs using REST
interfaces.
When should I use Livy versus Spark Job Server?
Regards,
Sam
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/LIVY-VS-Spark-Job-Server-tp27722.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Thanks for the reply RK.
Using the first option, my application doesn't recognize
spark.driver.extraJavaOptions.
With the second option, the issue remains the same:
2016-07-21 12:59:41 ERROR SparkContext:95 - Error initializing SparkContext.
org.apache.spark.SparkException: Found both
Hi Team,
I am using *CDH 5.7.1* with Spark *1.6.0*.
I have a Spark Streaming application that reads from Kafka and does some
processing.
The issue is that, while starting the application in CLUSTER mode, I want to pass
a custom log4j.properties file to both the driver and the executors.
*I have the below command :-*
Hi Team,
I have a Spark application up and running on a 10-node standalone cluster.
When I launch the application in cluster mode, I am able to create a separate
log file for the driver and for the executors (one common file for all executors).
But my requirement is to create a separate log file for each executor. Is it
Hi Team,
Is there a way we can consume from Kafka using the Spark Streaming direct API
with multiple consumers (belonging to the same consumer group)?
Regards,
Sam
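For what it's worth, the Spark 1.x direct API does not use Kafka consumer groups at all: Spark tracks offsets itself and reads partitions with the simple consumer, so parallelism comes from the topic's partition count (one RDD partition per Kafka partition). A minimal wiring sketch, assuming an existing StreamingContext `ssc`, a broker at `broker1:9092`, and a topic `my-topic` (all hypothetical names):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Direct stream (spark-streaming-kafka for Spark 1.x / Kafka 0.8):
// a group.id in kafkaParams is effectively ignored here -- offsets are
// managed by Spark, and each Kafka partition maps to one RDD partition.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("my-topic"))
```

So to scale consumption with the direct API, the usual route is to add partitions to the topic rather than adding consumers to a group.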
Hi All,
Does Spark provide any out-of-the-box pub-sub support for JMS, like it does for Kafka?
Thanks.
Regards,
Sam
Hi All,
I have a Spark SQL application that fetches data from Hive; on top I have an Akka
layer to run multiple queries in parallel.
*Please suggest a mechanism to figure out the number of Spark jobs
running in the cluster at a given instant of time.*
I need to do the above because I see the
It does depend on the network I/O within your cluster and on CPU usage. That said,
the difference in run time should not be huge (assuming you are not
running any other jobs in the cluster in parallel).
Hi Team,
I have a Hive partitioned table whose partition column values contain spaces.
When I try to run any query, say a simple SELECT * FROM table_name, it
fails.
*Please note the same was working in Spark 1.2.0; now I have upgraded to
1.3.1. Also, there is no change in my application code base.*
If I
How is Spark faster than MapReduce when the data is on disk in both cases?
Reduce *spark.sql.shuffle.partitions* from its default of 200 to the total number of
cores in the cluster.
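As a sketch of the tip above, assuming an existing Spark 1.x `SQLContext` named `sqlContext` and a cluster with, say, 40 total cores (both figures are hypothetical):

```scala
// Match shuffle parallelism to the cluster's core count instead of the
// default of 200, so small jobs don't pay for 200 tiny shuffle tasks.
sqlContext.setConf("spark.sql.shuffle.partitions", "40")

// The same setting can alternatively be passed at submit time:
//   --conf spark.sql.shuffle.partitions=40
```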
Hi Experts,
I have a Parquet dataset of 550 MB (9 blocks) in HDFS. I want to run SQL
queries repeatedly.
A few questions:
1. When I do the below (persist to memory after reading from disk), it takes a
lot of time to persist to memory; any suggestions on how to tune this?
val inputP
Hi All,
Suppose I have a Parquet file of 100 MB in HDFS and my HDFS block size is 64 MB, so
I have 2 blocks of data.
When I do *sqlContext.parquetFile(path)* followed by an action, two
tasks are started on two partitions.
My intent is to read these 2 blocks into more partitions to fully utilize my
cluster
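One low-effort way to get beyond the block-derived two partitions is to repartition after the read; a sketch assuming Spark 1.3+, where `parquetFile` returns a DataFrame, with the target of 16 partitions being an arbitrary placeholder (roughly 2-3x the total cores is a common starting point):

```scala
// The read yields roughly one partition per HDFS block (two here);
// repartition() shuffles the rows into more, smaller partitions so
// subsequent stages can use every core.
val df = sqlContext.parquetFile(path)  // as in the question above
val spread = df.repartition(16)        // 16 is a placeholder target
```

Note that `repartition` incurs a full shuffle; lowering the input split size via the Hadoop configuration may avoid that, at the cost of fiddlier setup.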
Hi Experts,
I have a scenario wherein I want to write to an Avro file from a streaming
job that reads data from Kafka.
But the issue is that, as there are multiple executors, when all of them try to write
to a given file I get a concurrency exception.
One way to mitigate the issue is to repartition to have a
Hi Experts,
Like saveAsParquetFile on SchemaRDD, is there an equivalent to store data in an ORC
file?
I am using Spark 1.2.0.
As per the link below, it looks like it's not part of 1.2.0, so any latest
update would be great.
https://issues.apache.org/jira/browse/SPARK-2883
Till the next release, is there a
Hi Experts,
A few general queries:
1. Can a single block/partition in an RDD have more than one Kafka message, or
will there be only one Kafka message per block? More broadly,
is the message count related to the block in any way, or is it just that any
message received within a particular
Resolved.
I changed to the Apache Hadoop 2.4.0 / Apache Spark 1.2.0 combination, and all works
fine.
It must be because the 1.2.0 version of Spark was compiled against Hadoop 2.4.0.
by samyamaiti on 12/25/14.
*/
object Driver {
  def main(args: Array[String]) {
    // Checkpoint dir in HDFS (the path must be a quoted string literal)
    val checkpointDirectory =
      "hdfs://localhost:8020/user/samyamaiti/SparkCheckpoint1"

    // functionToCreateContext
    def functionToCreateContext(): StreamingContext = {
      // Setting conf
Sorry for the typo.
Apache Hadoop version is 2.6.0
Regards,
Sam