Re: Spark SaveMode

2019-07-19 Thread Mich Talebzadeh
This behaviour is governed by the underlying RDBMS for bulk insert, where it either commits or rolls back. You can insert the new rows into a staging table in Oracle (which is common in ETL) and then insert/select into the Oracle table in a shell routine. The other way is to use JDBC in Spark to read Oracle…
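
A minimal sketch of that staging-table route, done entirely from Spark rather than a shell routine (connection details, STAGING and TARGET are illustrative, not from the thread):

import java.sql.DriverManager
import org.apache.spark.sql.{DataFrame, SaveMode}

val url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"
val props = new java.util.Properties()
props.setProperty("user", "scott")
props.setProperty("password", "tiger")

def loadViaStaging(df: DataFrame): Unit = {
  // 1. Bulk-insert into a staging table that has no PK, so nothing can violate it.
  df.write.mode(SaveMode.Append).jdbc(url, "STAGING", props)
  // 2. Insert/select only the new keys into the real table.
  val conn = DriverManager.getConnection(url, props)
  try {
    conn.createStatement().execute(
      """INSERT INTO TARGET (ID, COL1, COL2)
        |SELECT S.ID, S.COL1, S.COL2 FROM STAGING S
        |WHERE NOT EXISTS (SELECT 1 FROM TARGET T WHERE T.ID = S.ID)""".stripMargin)
  } finally {
    conn.close()
  }
}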

Re: Spark SaveMode

2019-07-19 Thread Jörn Franke
This is not an issue with Spark but with the underlying database. The primary key constraint has a purpose, and ignoring it would defeat that purpose. To handle your use case you would need to make multiple decisions, which may imply you don't want to simply "insert if not exists". Maybe you want to d…

Spark SaveMode

2019-07-19 Thread Richard
Any reason why Spark's SaveMode doesn't have a mode that ignores primary key/unique constraint violations? Let's say I'm using Spark to migrate some data from Cassandra to Oracle; I want the insert operation to be "ignore if the primary key exists" instead of failing the whole batch. Thanks, Richard
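
For context, a minimal sketch of the write in question (URL, table name and credentials are illustrative). SaveMode only operates at table granularity (Append, Overwrite, ErrorIfExists, Ignore), so a single duplicate key aborts the whole batch:

import java.util.Properties
import org.apache.spark.sql.SaveMode

val props = new Properties()
props.setProperty("user", "scott")
props.setProperty("password", "tiger")

// `df` is the DataFrame read from Cassandra; one PK violation fails the batch.
df.write
  .mode(SaveMode.Append)
  .jdbc("jdbc:oracle:thin:@//dbhost:1521/ORCL", "TARGET", props)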

Re: Spark dataset to explode json string

2019-07-19 Thread Richard
OK, thanks. I have another way that currently works, but it is not efficient when I have to extract a lot of fields, since it creates a UDF for each extraction: df = df.withColumn("foo", getfoo.apply(col("jsonCol"))).withColumn("bar", getbar.apply(col("jsonCol")));
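
The per-field UDFs can be collapsed into a single from_json pass; a sketch in Scala, assuming the two-field example and the df schema posted earlier in this thread:

import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Schema for {"foo": "val1", "bar": "val2"}; add a field per key you need.
val jsonSchema = StructType(Seq(
  StructField("foo", StringType, nullable = true),
  StructField("bar", StringType, nullable = true)
))

// Parse the JSON string once, then promote struct fields to top-level columns.
val parsed = df
  .withColumn("j", from_json(col("jsonCol"), jsonSchema))
  .select(col("id"), col("col1"), col("col2"),
    col("j.foo").as("foo"), col("j.bar").as("bar"))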

Re: Spark dataset to explode json string

2019-07-19 Thread Mich Talebzadeh
You can try to split the {"foo": "val1", "bar": "val2"} string as below. This is an example of the output:

(c1003d93-5157-4092-86cf-0607157291d8,{"rowkey":"c1003d93-5157-4092-86cf-0607157291d8","ticker":"TSCO","timeissued":"2019-07-01T09:10:55","price":395.25})
{"rowkey":"c1003d93-5157-4092-86cf-060715…
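
If the goal is just to pull a few keys out of the JSON string without defining a full schema, get_json_object is another option (a sketch using the field names from the sample output above, not necessarily the exact method meant here):

import org.apache.spark.sql.functions.{col, get_json_object}

// Extract individual keys from the JSON string by JSONPath.
val extracted = df
  .withColumn("ticker", get_json_object(col("jsonCol"), "$.ticker"))
  .withColumn("price", get_json_object(col("jsonCol"), "$.price"))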

Re: Spark and Oozie

2019-07-19 Thread William Shen
Dennis, do you know what's taking the additional time? Is it the Spark job, or Oozie waiting for an allocation from YARN? Do you have a resource contention issue in YARN?

Re: Spark dataset to explode json string

2019-07-19 Thread Richard
Example of jsonCol (String): {"foo": "val1", "bar": "val2"} Thanks,

Re: Spark dataset to explode json string

2019-07-19 Thread Mich Talebzadeh
Sure. Do you have an example of a record from Cassandra read into the df, by any chance? Only the columns that need to go into Oracle: df.select('col1, 'col2, 'jsonCol).take(1).foreach(println) HTH

Re: Spark dataset to explode json string

2019-07-19 Thread Richard
Thanks for the reply; my situation is a little different from your sample. The following is the schema from the source (df.printSchema()):

root
 |-- id: string (nullable = true)
 |-- col1: string (nullable = true)
 |-- col2: string (nullable = true)
 |-- jsonCol: string (nullable = true)

I want to extract multiple fields from the jsonCol string…

Re: Spark dataset to explode json string

2019-07-19 Thread Mich Talebzadeh
Hi Richard, You can use the following to read JSON data into a DF. The example reads JSON from a Kafka topic:

val sc = spark.sparkContext
import spark.implicits._
// Use map to create the new RDD using the value portion of the pair.
val jsonRDD = pricesRDD.map…
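
A sketch of where that example is heading, since the message is cut off above (pricesRDD is assumed to hold (key, jsonString) pairs, as in the earlier sample output):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("json-to-df").getOrCreate()
import spark.implicits._

// Stand-in for the truncated pricesRDD: (key, json) pairs.
val pricesRDD = spark.sparkContext.parallelize(Seq(
  ("c1003d93", """{"rowkey":"c1003d93","ticker":"TSCO","price":395.25}""")))

// Keep only the JSON value, then let Spark infer the schema.
val df = spark.read.json(pricesRDD.map(_._2).toDS())
df.printSchema()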

Spark dataset to explode json string

2019-07-19 Thread Richard
Let's say I use Spark to migrate some data from a Cassandra table to an Oracle table. Cassandra table:

CREATE TABLE SOURCE (
  id UUID PRIMARY KEY,
  col1 text,
  col2 text,
  jsonCol text
);

Example jsonCol value: {"foo": "val1", "bar": "val2"} I am trying to extract fields from the JSON column while importing…
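
A minimal sketch of the migration path being described, assuming the spark-cassandra-connector and an Oracle JDBC driver are on the classpath (keyspace, connection details and target table are illustrative):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("cassandra-to-oracle").getOrCreate()

// Read the source table through the Cassandra connector's DataSource API.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "mykeyspace", "table" -> "source"))
  .load()

// Write to Oracle over JDBC (the jsonCol field extraction would go in between).
df.write
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
  .option("dbtable", "TARGET")
  .option("user", "scott")
  .option("password", "tiger")
  .mode("append")
  .save()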

Spark ImportError: No module named XXX

2019-07-19 Thread zenglong chen
Hi all,

Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/ubuntu/spark-2.4.3/python/lib/pyspark.zip/pyspark/worker.py", line 364, in main
    func, profiler, deserializer, serializer = read_command(pickleSer, infile)
  File "/home/ubuntu/spa…
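
This traceback usually means a module imported by the job is not available on the executors' Python path; one common fix (file names here are hypothetical) is to ship the module with the job:

spark-submit --py-files /home/ubuntu/libs/mymodule.py my_app.py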

Re: Spark and Oozie

2019-07-19 Thread Bartek Dobija
Hi Dennis, Oozie jobs shouldn't take that long in a well-configured cluster. Oozie allocates its own resources in YARN, which may require fine-tuning. Check whether YARN gives resources to the Oozie job immediately, which may be one of the reasons, and change job priorities in the YARN scheduling configuration…
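
As one concrete form of that scheduler tuning, a dedicated queue can be defined in capacity-scheduler.xml (queue name and percentages are illustrative, not from the thread):

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,oozie</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.oozie.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>70</value>
</property>

The Oozie launcher and the Spark action can then be pointed at that queue so they are not starved by other workloads.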

Spark and Oozie

2019-07-19 Thread Dennis Suhari
Dear experts, I am using Spark for processing data from HDFS (Hadoop). These Spark applications are data pipelines, data wrangling and machine learning applications, so Spark submits its jobs using YARN. This also works well. For scheduling I am now trying to use Apache Oozie, but I am facing…