how to get assertDataFrameEquals ignore nullable

2017-05-05 Thread A Shaikh
As part of TDD I am using com.holdenkarau.spark.testing.DatasetSuiteBase to assert that two DataFrames are equal using assertDataFrameEquals(dataframe1, dataframe2). Although the values are the same, the assertion fails because the nullable property does not match for some columns. Is there a way to make it ignore nullability?
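
A common workaround is to normalize both schemas before asserting. This is a sketch, not an API of spark-testing-base itself; the helper name setAllNullable is my own:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.types.{StructField, StructType}

    // Rebuild the DataFrame with every column marked nullable, so the
    // schema check inside assertDataFrameEquals no longer trips on
    // mismatched nullability.
    def setAllNullable(df: DataFrame): DataFrame = {
      val schema = StructType(df.schema.map {
        case StructField(name, dataType, _, metadata) =>
          StructField(name, dataType, nullable = true, metadata)
      })
      df.sqlContext.createDataFrame(df.rdd, schema)
    }

    // assertDataFrameEquals(setAllNullable(dataframe1), setAllNullable(dataframe2))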

DML in Spark ETL

2017-01-26 Thread A Shaikh
In the past we used an ETL tool whose tasks could update, insert, and delete rows in target database tables (Oracle/Netezza). Spark's Dataset (and RDD) has .save* methods, which can insert rows. How can records in a database table be deleted or updated from Spark?
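
Spark's own writers only insert (append/overwrite); updates and deletes have to go through a plain JDBC connection. A minimal sketch, assuming a key column named id; jdbcUrl, user, password, and target_table are placeholders:

    import java.sql.DriverManager
    import org.apache.spark.sql.Row

    // Delete matching rows in the target table, one batch per partition,
    // so a single connection is opened per executor partition.
    df.foreachPartition { rows: Iterator[Row] =>
      val conn = DriverManager.getConnection(jdbcUrl, user, password)
      val stmt = conn.prepareStatement("DELETE FROM target_table WHERE id = ?")
      rows.foreach { row =>
        stmt.setLong(1, row.getAs[Long]("id"))
        stmt.addBatch()
      }
      stmt.executeBatch()
      conn.close()
    }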

Re: Saving from Dataset to Bigquery Table

2017-01-20 Thread A Shaikh
…is only on PairRDD. > On 20 Jan 2017, at 11:54, A Shaikh wrote: > Has anyone experience saving a Dataset to a BigQuery table? > I am loading into BigQuery using the following example <https://cloud.google.com/hadoop/examples/bigquery-connector-spark-examp

Saving from Dataset to Bigquery Table

2017-01-20 Thread A Shaikh
Does anyone have experience saving a Dataset to a BigQuery table? I am loading into BigQuery successfully using the following example. This uses the RDD.saveAsNewAPIHadoopDataset method to save data. I am using a Dataset (or DataFrame) and l
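
As the reply above notes, saveAsNewAPIHadoopDataset lives on pair RDDs, so the Dataset has to be converted first. A hedged sketch along the lines of the linked connector example — the class and configuration names should be checked against your connector version, and the project/dataset/table IDs and schema JSON are placeholders:

    import com.google.cloud.hadoop.io.bigquery.{BigQueryConfiguration, BigQueryOutputFormat}
    import com.google.gson.{JsonObject, JsonParser}
    import org.apache.spark.rdd.RDD

    val conf = spark.sparkContext.hadoopConfiguration
    BigQueryConfiguration.configureBigQueryOutput(
      conf, "project-id", "dataset_id", "table_id",
      """[{"name":"name","type":"STRING"},{"name":"age","type":"INTEGER"}]""")
    conf.set("mapreduce.job.outputformat.class",
      classOf[BigQueryOutputFormat[_, _]].getName)

    // Turn each row into a (key, JsonObject) pair; the key is ignored
    // by the BigQuery output format.
    val pairs: RDD[(Any, JsonObject)] = df.toJSON.rdd.map { json =>
      (null, new JsonParser().parse(json).getAsJsonObject)
    }
    pairs.saveAsNewAPIHadoopDataset(conf)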

Re: TDD in Spark

2017-01-20 Thread A Shaikh
> On Sun, Jan 15, 2017 at 7:14 PM, A Shaikh wrote: > What's the most popular testing approach for a Spark app? I am looking for something along the lines of TDD.

TDD in Spark

2017-01-15 Thread A Shaikh
What's the most popular testing approach for a Spark app? I am looking for something along the lines of TDD.
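
For a concrete starting point, a minimal ScalaTest suite with Holden Karau's spark-testing-base (the library mentioned elsewhere in this archive) looks roughly like this; the class name and data are illustrative:

    import com.holdenkarau.spark.testing.DatasetSuiteBase
    import org.scalatest.FunSuite

    // TDD-style: build small input and expected frames in memory,
    // run the transformation under test, and assert equality.
    class WordCountSpec extends FunSuite with DatasetSuiteBase {
      test("identity transformation keeps the frame intact") {
        import spark.implicits._
        val input    = Seq(("a", 1), ("b", 2)).toDF("word", "count")
        val expected = Seq(("a", 1), ("b", 2)).toDF("word", "count")
        assertDataFrameEquals(expected, input)
      }
    }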

Re: Running a spark code using submit job in google cloud platform

2017-01-12 Thread A Shaikh
You may have tested this code against the Spark version on your local machine, which may differ from the version in Google Cloud. You need to select the appropriate Spark version when you submit your job. On 12 January 2017 at 15:51, Anahita Talebi wrote: > Dear all, > > I am trying to run a

Re: Handling null in dataset

2017-01-11 Thread A Shaikh
I tried the DataFrame option below; I'm not sure what it is for, but it doesn't seem to work. - nullValue: specifies a string that indicates a null value; nulls in the DataFrame will be written as this string. On 11 January 2017 at 17:11, A Shaikh wrote: > How does Spark handle nu
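
For what it's worth, nullValue is an option of the CSV source, not the Avro reader, which is why it has no effect above. A sketch of where it does apply; the \N marker is an arbitrary choice:

    // On write, SQL NULLs are rendered as the marker string; on read,
    // cells equal to the marker are parsed back to NULL.
    df.write.option("nullValue", "\\N").csv("out/")
    val back = spark.read.schema(df.schema).option("nullValue", "\\N").csv("out/")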

Handling null in dataset

2017-01-11 Thread A Shaikh
How does Spark handle null values? case class AvroSource(name: String, age: Integer, sal: Long, col_float: Float, col_double: Double, col_bytes: String, col_bool: Boolean) val userDS = spark.read.format("com.databricks.spark.avro").option("nullValue", "x").load("./users.avro") //.as[AvroSour
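
The usual way to make a typed Dataset tolerate nulls is to declare nullable columns as Option fields. A sketch mirroring the case class above:

    import spark.implicits._

    // SQL NULL maps to None instead of failing the encoder or
    // silently producing zero values for primitive fields.
    case class AvroSourceOpt(
      name: String,                 // String is already nullable on the JVM
      age: Option[Int],             // Option absorbs SQL NULLs
      sal: Option[Long],
      col_float: Option[Float],
      col_double: Option[Double],
      col_bytes: String,
      col_bool: Option[Boolean])

    val userDS = spark.read
      .format("com.databricks.spark.avro")
      .load("./users.avro")
      .as[AvroSourceOpt]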

Dataset Type safety

2017-01-10 Thread A Shaikh
I have a simple people.csv and the following SimpleApp. people.csv -- name,age abc,22 xyz,32 Working code: object SimpleApp {} case class Person(name: String, age: Long) def main(args: Array[String]): Unit = { val spark
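
A working version of that app, sketched under the assumption that people.csv keeps the header row shown above:

    import org.apache.spark.sql.SparkSession

    case class Person(name: String, age: Long)

    object SimpleApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SimpleApp")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // inferSchema parses age as a number; .as[Person] upcasts it
        // to Long and gives typed field access instead of untyped Rows.
        val people = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("people.csv")
          .as[Person]

        people.filter(_.age > 25).show()
        spark.stop()
      }
    }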

Re: Spark Read from Google store and save in AWS s3

2017-01-10 Thread A Shaikh
This should help: https://cloud.google.com/hadoop/examples/bigquery-connector-spark-example On 8 January 2017 at 03:49, neil90 wrote: > Here is how you would read from Google Cloud Storage (note you need to > create a service account key) -> > > os.environ['PYSPARK_SUBMIT_ARGS'] = """--jars > /h
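
Sketching the other half of the original question (writing to AWS S3): with the GCS connector and hadoop-aws on the classpath, both filesystems can be configured on the same context. The key-file path and credential variables below are placeholders:

    val hc = spark.sparkContext.hadoopConfiguration
    // GCS connector: authenticate with a service-account key file.
    hc.set("google.cloud.auth.service.account.json.keyfile", "/path/to/key.json")
    // s3a: credentials for the destination bucket.
    hc.set("fs.s3a.access.key", awsAccessKey)
    hc.set("fs.s3a.secret.key", awsSecretKey)

    val df = spark.read.parquet("gs://source-bucket/data/")
    df.write.parquet("s3a://dest-bucket/data/")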

Custom delimiter file load

2016-12-31 Thread A Shaikh
In PySpark 2, loading a file with any delimiter into a DataFrame is pretty straightforward: spark.read.csv(file, schema=, sep='|'). Is there something similar in Spark 2 in Scala? spark.read.csv(path, sep='|')?
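
Yes: in the Scala API the separator goes through .option rather than a named argument, e.g.:

    // Scala equivalent of the PySpark call; mySchema is a placeholder
    // StructType and can be dropped to infer the schema instead.
    val df = spark.read
      .schema(mySchema)
      .option("sep", "|")
      .csv(path)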

Re: Spark SQL Syntax

2016-12-19 Thread A Shaikh
…6 at 14:00, Ramesh Krishnan wrote: > What is the version of Spark you are using? If it is less than 2.0, consider using the Dataset API to get compile-time checks on syntax. > Thanks, > Ramesh > On Mon, Dec 19, 2016 at 6:36 PM, A Shaikh wrote: >> Hi,

Spark SQL Syntax

2016-12-19 Thread A Shaikh
Hi, I keep getting invalid Spark SQL syntax errors, especially for date/timestamp manipulation. What's the best way to test that SQL syntax against a Spark DataFrame is valid? Is there an online site to test or run demo SQL? Thanks, Afzal
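
One low-friction way to validate syntax is a local spark-shell session: register a one-row dummy view and let the parser/analyzer fail fast. A sketch with date functions:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // One row is enough for the parser and analyzer to run.
    Seq("2016-12-19").toDF("d").createOrReplaceTempView("t")

    // Throws an AnalysisException / ParseException on invalid syntax.
    spark.sql(
      "SELECT to_date(d), date_add(to_date(d), 7), year(to_date(d)) FROM t"
    ).show()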