This behaviour is governed by the underlying RDBMS for bulk insert, where
it either commits or rolls back.
You can insert new rows into a staging table in Oracle (which is common in
ETL) and then insert/select into the Oracle table in a shell routine.
The other way is to use JDBC in Spark to read Orac
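A minimal Scala sketch of that staging-table route is below. The JDBC URL, credentials, and the STAGING_SOURCE and TARGET table names are illustrative placeholders, not from the thread; the thread suggests doing the insert/select in a shell routine, whereas here it is simply issued from the driver over JDBC for compactness:

import java.sql.DriverManager
import org.apache.spark.sql.SaveMode

val jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1"  // placeholder URL

// 1) Bulk-load the DataFrame into an Oracle staging table.
df.write
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "STAGING_SOURCE")
  .option("user", "scott").option("password", "tiger")
  .mode(SaveMode.Overwrite)
  .save()

// 2) Let Oracle resolve duplicates: copy only keys not already in the target.
val conn = DriverManager.getConnection(jdbcUrl, "scott", "tiger")
try {
  conn.createStatement().execute(
    """INSERT INTO TARGET (id, col1, col2)
      |SELECT s.id, s.col1, s.col2 FROM STAGING_SOURCE s
      |WHERE NOT EXISTS (SELECT 1 FROM TARGET t WHERE t.id = s.id)""".stripMargin)
} finally conn.close()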
This is not an issue with Spark, but with the underlying database. The primary key
constraint has a purpose, and ignoring it would defeat that purpose.
Then, to handle your use case, you would need to make several decisions, which
may imply you don't want to simply insert if not exists. Maybe you want to d
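One hedged way to implement "insert only if the key does not already exist" on the Spark side is an anti-join against the keys already in Oracle before appending. The table and column names below are assumptions, and note this is not atomic: rows inserted concurrently can still violate the constraint.

import org.apache.spark.sql.SaveMode

val jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1"  // placeholder URL

// Keys already present in the Oracle target table (hypothetical name TARGET).
val existingKeys = spark.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "TARGET")
  .option("user", "scott").option("password", "tiger")
  .load()
  .select("id")

// Drop rows whose id already exists, then append only the new ones.
df.join(existingKeys, Seq("id"), "left_anti")
  .write
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "TARGET")
  .option("user", "scott").option("password", "tiger")
  .mode(SaveMode.Append)
  .save()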
Any reason why Spark's SaveMode doesn't have a mode that ignores Primary
Key/Unique constraint violations?
Let's say I'm using Spark to migrate some data from Cassandra to Oracle. I
want the insert operation to be "ignore if the primary key already exists"
instead of failing the whole batch.
Thanks,
Richard
ok, thanks,
I have another way that currently works, but it is not efficient if I have to
extract a lot of fields, since it creates a UDF for each extraction:
df = df.withColumn("foo", getfoo.apply(col("jsonCol")))
.withColumn("bar", getbar.apply(col("jsonCol")));
On Fri, Jul 19, 2019 at 8:54 PM Mic
You can try to split the {"foo": "val1", "bar": "val2"} as below.
/*
This is an example of output!
(c1003d93-5157-4092-86cf-0607157291d8,{"rowkey":"c1003d93-5157-4092-86cf-0607157291d8","ticker":"TSCO",
"timeissued":"2019-07-01T09:10:55", "price":395.25})
{"rowkey":"c1003d93-5157-4092-86cf-060715
Dennis, do you know what's taking the additional time? Is it the Spark job,
or Oozie waiting for an allocation from YARN? Do you have a resource
contention issue in YARN?
On Fri, Jul 19, 2019 at 12:24 AM Bartek Dobija
wrote:
> Hi Dennis,
>
> Oozie jobs shouldn't take that long in a well configured cl
example of jsonCol (String):
{"foo": "val1", "bar": "val2"}
Thanks,
On Fri, Jul 19, 2019 at 3:57 PM Mich Talebzadeh
wrote:
> Sure.
>
> Do you have an example of a record from Cassandra read into df by any
> chance? Only columns that need to go into Oracle.
>
> df.select('col1, 'col2, 'jsonCol).
Sure.
Do you have an example of a record from Cassandra read into df by any
chance? Only columns that need to go into Oracle.
df.select('col1, 'col2, 'jsonCol).take(1).foreach(println)
HTH
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
Thanks for the reply,
my situation is a little different from your sample:
Following is the schema from source (df.printSchema();)
root
|-- id: string (nullable = true)
|-- col1: string (nullable = true)
|-- col2: string (nullable = true)
|-- jsonCol: string (nullable = true)
I want to extract mul
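(Not from the thread, but given that schema, one way to extract several fields from jsonCol in a single pass is Spark's from_json with an explicit schema; the field names foo/bar are assumed from the earlier example value:)

import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Schema of the JSON payload held in jsonCol (fields assumed from the example).
val jsonSchema = StructType(Seq(
  StructField("foo", StringType),
  StructField("bar", StringType)
))

val extracted = df
  .withColumn("parsed", from_json(col("jsonCol"), jsonSchema))
  .select(col("id"), col("col1"), col("col2"),
    col("parsed.foo").alias("foo"),
    col("parsed.bar").alias("bar"))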
Hi Richard,
You can use the following to read JSON data into a DF. The example reads
JSON from a Kafka topic:
val sc = spark.sparkContext
import spark.implicits._
// Use map to create the new RDD using the value portion of the pair.
val jsonRDD = pricesRDD.map
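(The excerpt above is cut off at .map; a hedged continuation, assuming pricesRDD is an RDD of (key, value) string pairs like the sample output shown earlier, could be:)

// Keep only the value (the JSON string) of each pair, then let Spark infer the columns.
val jsonRDD = pricesRDD.map(_._2)
val pricesDF = spark.read.json(jsonRDD.toDS())  // toDS() relies on the spark.implicits._ import above
pricesDF.printSchema()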
Let's say I use Spark to migrate some data from a Cassandra table to an Oracle
table.
Cassandra Table:
CREATE TABLE SOURCE(
id UUID PRIMARY KEY,
col1 text,
col2 text,
jsonCol text
);
example jsonCol value: {"foo": "val1", "bar": "val2"}
I am trying to extract fields from the json column while importing
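(For reference, a minimal Scala sketch of that migration frame, assuming the spark-cassandra-connector for the read and plain JDBC for the write; the keyspace, JDBC URL and credentials are placeholders:)

import org.apache.spark.sql.SaveMode

// Read the Cassandra SOURCE table (requires the spark-cassandra-connector on the classpath).
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_ks", "table" -> "source"))
  .load()

// Write to Oracle over JDBC; with SaveMode.Append a primary-key violation fails the batch,
// which is the behaviour discussed earlier in the thread.
df.write
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")
  .option("dbtable", "TARGET")
  .option("user", "scott").option("password", "tiger")
  .mode(SaveMode.Append)
  .save()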
Hi, all:
Caused by: org.apache.spark.api.python.PythonException: Traceback (most
recent call last):
File "/home/ubuntu/spark-2.4.3/python/lib/pyspark.zip/pyspark/worker.py",
line 364, in main
func, profiler, deserializer, serializer = read_command(pickleSer,
infile)
File "/home/ubuntu/spa
Hi Dennis,
Oozie jobs shouldn't take that long in a well-configured cluster. Oozie
allocates its own resources in YARN, which may require fine tuning. Check
whether YARN gives resources to the Oozie job immediately, which may be one of
the reasons, and change job priorities in the YARN scheduling configurat
Dear experts,
I am using Spark for processing data from HDFS (Hadoop). These Spark
applications are data pipelines, data wrangling and machine learning
applications. Thus Spark submits its jobs using YARN.
This also works well. For scheduling I am now trying to use Apache Oozie, but I
am facin