Re: Running spark with javaagent configuration

2019-05-15 Thread Akshay Bhardwaj
Hi Anton,

Do you have the option of storing the JAR file on HDFS, which can be accessed via Spark in your cluster?

Akshay Bhardwaj
+91-97111-33849

On Thu, May 16, 2019 at 12:04 AM Oleg Mazurov wrote:
> You can see what Uber's jvm-profiler does at
> https://github.com/uber-common/jvm-profiler :
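A minimal sketch of that approach, assuming hypothetical HDFS and jar names: jars listed in spark.jars are localized to each executor's working directory, so the agent can then be referenced by its bare file name.

    spark-submit \
      --master yarn --deploy-mode cluster \
      --conf spark.jars=hdfs:///libs/newrelic-agent.jar \
      --conf "spark.executor.extraJavaOptions=-javaagent:newrelic-agent.jar" \
      --class com.example.MyApp my-app.jar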

Re: Spark job gets hung on cloudera cluster

2019-05-15 Thread Akshay Bhardwaj
Hi Rishi,

Are you running Spark on YARN or Spark's master-slave cluster?

Akshay Bhardwaj
+91-97111-33849

On Thu, May 16, 2019 at 7:15 AM Rishi Shah wrote:
> Anyone please?
>
> On Tue, May 14, 2019 at 11:51 PM Rishi Shah wrote:
>
>> Hi All,
>>
>> At times when there's a data node failure,

how to get spark-sql lineage

2019-05-15 Thread lk_spark
Hi all,

When I use Spark to run some SQL for ETL, how can I get lineage info? I found that CDH Spark has some configs for lineage:

spark.lineage.enabled=true
spark.lineage.log.dir=/var/log/spark2/lineage

Do they also work for Apache Spark?

2019-05-16
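The spark.lineage.* properties appear to be part of Cloudera's lineage integration (Cloudera Navigator), not Apache Spark itself. On plain Apache Spark, one hedged starting point is a QueryExecutionListener, a standard Spark API that exposes each query's analyzed plan, from which source and target tables can be read; the listener class and logging below are illustrative only.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.execution.QueryExecution
    import org.apache.spark.sql.util.QueryExecutionListener

    // Logs the analyzed logical plan of every successful query; the plan
    // names the input tables/paths, which is the raw material for lineage.
    class PlanLogger extends QueryExecutionListener {
      override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
        println(s"[$funcName] ${qe.analyzed}")
      override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
    }

    object LineageSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("lineage-sketch").getOrCreate()
        spark.listenerManager.register(new PlanLogger)
        spark.sql("SELECT 1").collect() // any ETL SQL; its plan is logged on success
      }
    }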

Re: Databricks - number of executors, shuffle.partitions etc

2019-05-15 Thread ayan guha
Well, it's a Databricks question, so it is better asked in their forum. You can set cluster-level params when you create a new cluster, or add them later: go to the cluster page, open one cluster, expand the additional config section, and add your param there as a key-value pair separated by a space.

On Thu, 16
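For illustration, entries in that config section follow Spark's properties format, one per line, key and value separated by a space (values here are made up):

    spark.sql.shuffle.partitions 64
    spark.executor.memory 8g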

Re: Databricks - number of executors, shuffle.partitions etc

2019-05-15 Thread Rishi Shah
Hi All,

Any idea?

Thanks,
-Rishi

On Tue, May 14, 2019 at 11:52 PM Rishi Shah wrote:
> Hi All,
>
> How can we set a spark conf parameter in a Databricks notebook? My cluster
> doesn't take into account any spark.conf.set properties... it creates 8
> worker nodes (that is, executors) but doesn't honor
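A short sketch of the likely distinction, assuming the cluster-sizing settings are the problem: spark.conf.set from a notebook only changes runtime-settable (mostly SQL) properties, while executor count and sizing are fixed by the cluster configuration at start-up.

    // Runtime-settable SQL conf: works from a running notebook.
    spark.conf.set("spark.sql.shuffle.partitions", "64")
    println(spark.conf.get("spark.sql.shuffle.partitions")) // 64

    // Cluster-sizing properties such as spark.executor.instances are read at
    // cluster start-up; setting them here is rejected or silently ignored, so
    // they belong in the cluster's Spark config (see the previous reply).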

Re: Spark job gets hung on cloudera cluster

2019-05-15 Thread Rishi Shah
Anyone please?

On Tue, May 14, 2019 at 11:51 PM Rishi Shah wrote:
> Hi All,
>
> At times when there's a data node failure, a running Spark job doesn't fail
> - it gets stuck and doesn't return. Can any setting help here? I would
> ideally like to get the job terminated or executors running on
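A hedged sketch of settings commonly tuned so a job fails fast instead of hanging on a bad node (values are illustrative, not recommendations):

    --conf spark.network.timeout=60s      # detect unreachable nodes sooner (default 120s)
    --conf spark.task.maxFailures=4       # fail the job after repeated task failures
    --conf spark.speculation=true         # re-launch suspiciously slow tasks elsewhere
    --conf spark.blacklist.enabled=true   # stop scheduling on misbehaving executors/nodes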

Re: Are Spark Dataframes mutable in Structured Streaming?

2019-05-15 Thread Russell Spitzer
Dataframes describe the calculation to be done, but the underlying implementation is an "incremental query". That is, the dataframe code is executed repeatedly, with Catalyst adjusting the final execution plan on each run. Some parts of the plan refer to static pieces of data, others refer to
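A minimal word-count sketch of that model (the socket source, host, and port are placeholders): the counts DataFrame below holds no data itself; each trigger re-runs the plan, incrementally, over newly arrived input.

    import org.apache.spark.sql.SparkSession

    object IncrementalQuerySketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("incremental-query").getOrCreate()
        import spark.implicits._

        // Unbounded source: new lines keep arriving.
        val lines = spark.readStream
          .format("socket")
          .option("host", "localhost")
          .option("port", 9999)
          .load()

        // Describes the computation only; no data is materialized here.
        val counts = lines.as[String].flatMap(_.split(" ")).groupBy("value").count()

        // Each micro-batch trigger executes the (re)planned query incrementally.
        counts.writeStream.outputMode("complete").format("console").start().awaitTermination()
      }
    }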

Re: Running spark with javaagent configuration

2019-05-15 Thread Oleg Mazurov
You can see what Uber's jvm-profiler does at https://github.com/uber-common/jvm-profiler :

--conf spark.jars=hdfs://hdfs_url/lib/jvm-profiler-1.0.0.jar
--conf spark.executor.extraJavaOptions=-javaagent:jvm-profiler-1.0.0.jar

-- Oleg

On Wed, May 15, 2019 at 6:28 AM Anton Puzanov wrote:
> Hi
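On YARN, jars listed in spark.jars are localized into the container's working directory, which is presumably why the bare file name in -javaagent resolves. If the driver also needs the agent in cluster mode, the same pattern should extend to it (an assumption, mirroring the executor line above):

    --conf spark.driver.extraJavaOptions=-javaagent:jvm-profiler-1.0.0.jar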

Are Spark Dataframes mutable in Structured Streaming?

2019-05-15 Thread Sheel Pancholi
Hi,

Structured Streaming treats a stream as an unbounded table in the form of a DataFrame. Continuously flowing data from the stream keeps getting added to this DataFrame (which is the unbounded table), which warrants a change to the DataFrame and seems to violate the very basic nature of a DataFrame

Re: Why do we need Java-Friendly APIs in Spark ?

2019-05-15 Thread Jason Nerothin
I did a quick Google search for "Java/Scala interoperability" and was surprised to find very few recent results on the topic. (Has the world given up?) It's easy to use Java library code from Scala, but the opposite is not true. I would think about the problem this way: do *YOU* need to provide
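To make the asymmetry concrete with a toy example: any Java library is usable from Scala as if it were Scala (below, plain java.time), whereas Scala features such as implicits, default arguments, and its function types have no direct Java spelling, which is why Java-friendly wrappers exist.

    // Java APIs are directly usable from Scala with no wrapping or conversion.
    import java.time.{Duration, Instant}

    val start   = Instant.now()                        // plain Java static method
    val elapsed = Duration.between(start, Instant.now())
    println(s"Elapsed: ${elapsed.toMillis} ms")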

Running spark with javaagent configuration

2019-05-15 Thread Anton Puzanov
Hi everyone,

I want to run my Spark application with a javaagent; specifically, I want to use New Relic with my application. When I run spark-submit I must pass --conf "spark.driver.extraJavaOptions=-javaagent=" My problem is that I can't specify the full path, as I run in cluster mode and I don't
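One commonly used workaround for not knowing the remote path (an assumption here, not something confirmed in this thread): ship the agent jar with --files, which in cluster mode places it in the driver's and executors' working directories, then reference it by bare file name.

    spark-submit --deploy-mode cluster \
      --files /local/path/newrelic.jar \
      --conf "spark.driver.extraJavaOptions=-javaagent:newrelic.jar" \
      ...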

Re: Why do we need Java-Friendly APIs in Spark ?

2019-05-15 Thread Jean-Georges Perrin
I see… Did you consider Structured Streaming? Otherwise, you could create a factory that builds your higher-level object: it would return an interface defining your API, but the implementation may vary based on the context. And English is not my native language either... Jean-Georges
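A minimal sketch of that factory idea (all names are illustrative): callers program against a small interface, and the factory picks the implementation for the current context.

    // The API the callers see.
    trait RecordSink {
      def write(record: String): Unit
    }

    // Implementations vary by context; callers never name them directly.
    private class ConsoleSink extends RecordSink {
      def write(record: String): Unit = println(record)
    }
    private class NoopSink extends RecordSink {
      def write(record: String): Unit = () // e.g., for tests
    }

    object RecordSinkFactory {
      def forContext(context: String): RecordSink = context match {
        case "test" => new NoopSink
        case _      => new ConsoleSink
      }
    }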

Re: Streaming job, catch exceptions

2019-05-15 Thread bsikander
Any help would be much appreciated. The error and question are quite generic; I believe most experienced users will be able to answer.

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: Spark sql insert hive table which method has the highest performance

2019-05-15 Thread Jelly Young
Hi,

The documentation of DataFrameWriter says that, unlike `saveAsTable`, `insertInto` ignores the column names and just uses position-based resolution. For example:

    scala> Seq((1, 2)).toDF("i", "j").write.mode("overwrite").saveAsTable("t1")
    scala> Seq((3, 4)).toDF("j", "i").write.insertInto("t1")
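Because of that position-based behavior, a common defensive step is to reorder the DataFrame's columns to the target table's order before calling insertInto (sketched on the same toy table):

    import org.apache.spark.sql.functions.col

    val df = Seq((3, 4)).toDF("j", "i")          // columns in the "wrong" order
    val targetCols = spark.table("t1").columns   // e.g. Array("i", "j")
    df.select(targetCols.map(col): _*).write.insertInto("t1")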

Spark sql insert hive table which method has the highest performance

2019-05-15 Thread 车
Hello guys,

I use Spark Streaming to receive data from Kafka and need to store the data into Hive. I found the following ways to insert data into Hive on the Internet:

1. Use a temp table:

TmpDF = spark.createDataFrame(RDD, schema)
TmpDF.createOrReplaceTempView('TmpData')
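For reference, a hedged Scala sketch of how that temp-table route usually continues, together with the direct write it is commonly compared against (table and variable names are illustrative; spark must be a Hive-enabled SparkSession):

    // Route 1: temp view + INSERT ... SELECT through the SQL engine.
    val tmpDF = spark.createDataFrame(rowRDD, schema)
    tmpDF.createOrReplaceTempView("TmpData")
    spark.sql("INSERT INTO TABLE target_hive_table SELECT * FROM TmpData")

    // Route 2: write into the existing Hive table directly (position-based).
    tmpDF.write.insertInto("target_hive_table")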