Hi Anton,
Do you have the option of storing the JAR file on HDFS, where it can be
accessed by Spark in your cluster?
Akshay Bhardwaj
+91-97111-33849
On Thu, May 16, 2019 at 12:04 AM Oleg Mazurov wrote:
> You can see what the Uber JVM profiler does at
> https://github.com/uber-common/jvm-profiler :
Hi Rishi,
Are you running Spark on YARN, or Spark's standalone (master-slave) cluster?
Akshay Bhardwaj
+91-97111-33849
On Thu, May 16, 2019 at 7:15 AM Rishi Shah wrote:
> Anyone, please?
>
> On Tue, May 14, 2019 at 11:51 PM Rishi Shah wrote:
>
>> Hi All,
>>
>> At times when there's a data node failure,
Hi all,
When I use Spark to run some SQL for ETL, how can I get lineage info? I
found that CDH Spark has some lineage-related configs:
spark.lineage.enabled=true
spark.lineage.log.dir=/var/log/spark2/lineage
Do these also work for Apache Spark?
2019-05-16
Well, it's a Databricks question, so it would be better asked in their forum.
You can set cluster-level params when you create a new cluster, or add them
later: go to the cluster page, open the cluster, expand the additional
configuration section, and add your params there as key-value pairs, each
separated by a space.
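For example, each line in that section is a key and a value separated by a
space (the parameters below are only illustrative):

spark.sql.shuffle.partitions 64
spark.serializer org.apache.spark.serializer.KryoSerializer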
On Thu, 16
Hi All,
Any idea?
Thanks,
-Rishi
On Tue, May 14, 2019 at 11:52 PM Rishi Shah wrote:
> Hi All,
>
> How can we set Spark conf parameters in a Databricks notebook? My cluster
> doesn't take into account any spark.conf.set properties... it creates 8
> worker nodes (i.e., executors) but doesn't honor
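For what it's worth, only runtime (mostly SQL) settings can be changed from a
notebook with spark.conf.set; cluster-sizing settings are fixed at cluster
launch. A minimal illustration in Scala (the value is arbitrary):

    // Takes effect for subsequent queries in this session:
    spark.conf.set("spark.sql.shuffle.partitions", "64")

    // Executor count/memory and other JVM-level settings are read once at
    // cluster start-up and are not honored when set this way.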
Anyone, please?
On Tue, May 14, 2019 at 11:51 PM Rishi Shah wrote:
> Hi All,
>
> At times when there's a data node failure, a running Spark job doesn't fail
> - it gets stuck and doesn't return. Is there any setting that can help here?
> I would ideally like to get the job terminated, or the executors running on
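For reference, settings commonly tuned for this kind of hang (these are
standard Spark confs; the values are illustrative, and whether they help
depends on where the job is stuck):

spark.network.timeout 60s (detect lost nodes sooner; the default is 120s)
spark.speculation true (re-launch suspiciously slow tasks on other executors)
spark.task.maxFailures 4 (task failures tolerated before the job is aborted)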
Dataframes describe the calculation to be done, but the underlying
implementation is an "incremental query". That is, the dataframe code is
executed repeatedly, with Catalyst adjusting the final execution plan on
each run. Some parts of the plan refer to static pieces of data, others
refer to
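For a flavor of what Catalyst produces, explain() on any dataframe prints the
plans it built (a static example; the incremental case adds
streaming-specific operators):

    val df = spark.range(10).selectExpr("id", "id * 2 AS doubled")
    df.explain(true) // parsed, analyzed, optimized, and physical plans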
You can see what the Uber JVM profiler does at
https://github.com/uber-common/jvm-profiler :
--conf spark.jars=hdfs://hdfs_url/lib/jvm-profiler-1.0.0.jar
--conf spark.executor.extraJavaOptions=-javaagent:jvm-profiler-1.0.0.jar
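Putting those together, a full invocation might look like this (abridged; the
application jar name is a placeholder — spark.jars downloads the profiler jar
to each executor's working directory, which is why the -javaagent path can be
relative):

    spark-submit \
      --conf spark.jars=hdfs://hdfs_url/lib/jvm-profiler-1.0.0.jar \
      --conf spark.executor.extraJavaOptions=-javaagent:jvm-profiler-1.0.0.jar \
      your-app.jar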
-- Oleg
On Wed, May 15, 2019 at 6:28 AM Anton Puzanov wrote:
> Hi
Hi
Structured Streaming treats a stream as an unbounded table in the form of a
DataFrame. Continuously flowing data from the stream keeps getting added to
this DataFrame (the unbounded table), which warrants a change to the
DataFrame, and that violates the very basic nature of a DataFrame
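A minimal sketch of that model in Scala, using the built-in rate test source
(names here are arbitrary):

    // The "unbounded table": rows keep arriving with columns (timestamp, value)
    val counts = spark.readStream.format("rate").load()
      .groupBy("value").count() // same dataframe API as in batch

    counts.writeStream
      .outputMode("complete") // re-emit the full aggregate as new rows arrive
      .format("console")
      .start()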
I did a quick Google search for "Java/Scala interoperability" and was
surprised to find very few recent results on the topic. (Has the world
given up?)
It's easy to use Java library code from Scala, but the opposite is not true.
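For instance, here is a Scala API that is painless to call from Scala but
awkward from Java (purely illustrative):

    // Scala-side API:
    object Totals {
      def sum(xs: List[Int]): Int = xs.sum // scala.collection.immutable.List
    }

    // A Java caller cannot pass a java.util.List here; it has to build a
    // Scala immutable List first (e.g. with the converters in
    // scala.jdk.javaapi), which is exactly the kind of friction meant above.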
I would think about the problem this way: Do *YOU* need to provide
Hi everyone,
I want to run my Spark application with a javaagent; specifically, I want to
use New Relic with my application.
When I run spark-submit I must pass --conf
"spark.driver.extraJavaOptions=-javaagent="
My problem is that I can't specify the full path, as I run in cluster mode
and I don't
I see… Did you consider Structured Streaming?
Otherwise, you could create a factory that builds your higher-level object
and returns an interface defining your API, while the implementation varies
based on the context.
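A rough sketch of that idea in Scala (all names are illustrative):

    // The API your callers depend on:
    trait Processor {
      def process(input: String): String
    }

    // Implementations vary by context:
    private class LocalProcessor extends Processor {
      def process(input: String): String = input.toUpperCase
    }

    // The factory picks an implementation based on the context:
    object ProcessorFactory {
      def create(context: String): Processor = context match {
        case "local" => new LocalProcessor
        case other => throw new IllegalArgumentException(s"unknown context: $other")
      }
    }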
And English is not my native language as well...
Jean-Georges
Any help would be much appreciated.
The error and the question are quite generic; I believe most experienced
users will be able to answer.
Hi,
The documentation of DataFrameWriter says that, unlike `saveAsTable`,
`insertInto` ignores the column names and just uses position-based
resolution. For example:

    scala> Seq((1, 2)).toDF("i", "j").write.mode("overwrite").saveAsTable("t1")
    scala> Seq((3, 4)).toDF("j", "i").write.insertInto("t1")

so the second write stores the row as i=3, j=4 (values matched by position),
not i=4, j=3.
Hello guys,
I use Spark Streaming to receive data from Kafka and need to store the data
into Hive. I have seen the following ways to insert data into Hive on the
Internet:
1. Use a temp table:
TmpDF = spark.createDataFrame(RDD, schema)
TmpDF.createOrReplaceTempView('TmpData')
# presumably followed by an INSERT ... SELECT (target table name is a guess):
spark.sql("INSERT INTO TABLE my_hive_table SELECT * FROM TmpData")