Delta Logic in Spark

2018-11-17 Thread Mahender Sarangam
Hi, We have a daily data pull which brings in almost 50 GB of data from an upstream system. We are using Spark SQL to process the 50 GB and finally insert it into a Hive target table. We then copy the whole Hive target table to SQL (a SQL staging table) and implement merge from staging
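Rather than copying the whole Hive target table each day, one common pattern is to compute only the changed rows in Spark and push just those to the SQL staging table for the merge. A minimal sketch, assuming a `spark` session as in `spark-shell`, hypothetical table names (`daily_load`, `hive_target`, `dbo.StagingTable`), and a placeholder JDBC URL:

```scala
// Today's 50 GB pull and the current target table (names are assumptions).
val today    = spark.table("target_db.daily_load")
val existing = spark.table("target_db.hive_target")

// Rows that are new or changed since the last load; only these need
// to reach the SQL staging table that feeds the MERGE.
val delta = today.except(existing)

delta.write
  .format("jdbc")
  .option("url", "jdbc:sqlserver://<host>;databaseName=<db>") // placeholder
  .option("dbtable", "dbo.StagingTable")
  .mode("append")
  .save()
```

`except` is set-difference on whole rows, so it catches both inserts and updates; deletes would still need a separate anti-join in the other direction.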

Re: Unable to read multiple JSON.Gz File.

2018-10-18 Thread Mahender Sarangam
for a folder containing multiple gz files. From: Mahender Sarangam <mahender.bigd...@outlook.com> Sent: Monday, October 1, 2018 2:00 AM To: user@spark.apache.org Subject: Unable to read multiple JSON.Gz File. I’m trying to read multiple .json.g

Unable to read multiple JSON.Gz File.

2018-10-01 Thread Mahender Sarangam
I’m trying to read multiple .json.gz files from a Blob storage path using the below Scala code, but I’m unable to read the data from the files or print the schema. If the files are not compressed as .gz then we are able to read all the files into the DataFrame. I’ve even tried giving *.gz but
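For reference, Spark decompresses `.gz` files transparently based on the file extension, so a glob over the folder normally works with no extra configuration. A hedged sketch (the `wasbs://` path is a placeholder; `multiLine` is only needed if each file holds a single JSON document rather than one JSON object per line):

```scala
// Spark picks the gzip codec from the .gz extension automatically.
val df = spark.read
  .option("multiLine", "true") // only if each file is one JSON document
  .json("wasbs://container@account.blob.core.windows.net/path/*.json.gz")

df.printSchema()
```

If the uncompressed files read fine but the `.gz` ones come back empty, a frequent culprit is multi-line JSON: line-delimited reads of a single pretty-printed document yield corrupt-record rows.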

Internal table stored NULL as \N. How to remove it

2018-06-23 Thread Mahender Sarangam
Hi, We are storing our final transformed data in a Hive table in JSON format. While storing data into the table, all the null fields are converted into \N. While reading the table, we see \N instead of NULL. We tried setting ALTER TABLE sample SET SERDEPROPERTIES ('serialization.null.format' =
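A sketch of the two usual fixes, one at the SerDe level and one on read. The table name `sample` comes from the question; treating every column as a string that may contain the literal `\N` marker is an assumption:

```scala
import org.apache.spark.sql.functions.{col, when, lit}

// Option 1: tell the SerDe to serialize NULL as empty string instead of \N.
spark.sql("ALTER TABLE sample SET SERDEPROPERTIES ('serialization.null.format' = '')")

// Option 2: clean data that was already written with \N markers on read.
val t = spark.table("sample")
val cleaned = t.select(t.columns.map(c =>
  when(col(c) === "\\N", lit(null)).otherwise(col(c)).as(c)): _*)
```

Note the SerDe property only affects rows written after the change; existing files keep their `\N` markers, which is why the read-side cleanup is often needed as well.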

Building Data Warehouse Application in Spark

2018-04-04 Thread Mahender Sarangam
Hi, Does anyone have a good architecture document/design principles for building a warehouse application using Spark? Is it better to create a HiveContext and perform transformations with HQL, or to load files directly into a DataFrame and perform the data transformations there? We need to implement SCD
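Both styles run on the same engine and compile to the same Catalyst plans, so a common compromise is to load files into a DataFrame and then use SQL on a temp view where it reads better. A minimal sketch with placeholder paths and column names:

```scala
import org.apache.spark.sql.functions.col

// Load the landed files once (path and schema are assumptions).
val src = spark.read.parquet("/data/landing/customers")

// DataFrame API style.
val active = src.filter(col("is_active") === true)

// Equivalent HQL style on the same data.
src.createOrReplaceTempView("customers_stg")
val activeSql = spark.sql("SELECT * FROM customers_stg WHERE is_active = true")
```

Since the plans are identical, the choice is mostly about maintainability: SQL for set-based transforms the team already has in HQL, the DataFrame API where logic needs loops, functions, or unit tests.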

Dynamic Key JSON Parsing

2018-03-18 Thread Mahender Sarangam
Hi, I'm new to Spark and Scala and need help transforming nested JSON using Scala. Our upstream returns JSON like { "id": 100, "text": "Hello, world." Users : [ "User1": { "name": "Brett", "id": 200, "Type" : "Employee" "empid":"2" },
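One common answer for dynamic keys (here, the `"User1"`-style names under `Users`) is to parse that field as a `MapType` and explode it into rows, one per key. A hedged sketch that assumes `Users` arrives as a JSON string column and guesses at the truncated structure from the question:

```scala
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

val df = spark.read.json("/path/upstream.json") // placeholder input

// Guessed value schema for each dynamic user entry.
val userSchema = new StructType()
  .add("name", StringType).add("id", LongType)
  .add("Type", StringType).add("empid", StringType)

// Parse the dynamic-key object as a map, then explode: one row per key.
val users = df.select(
  col("id"),
  explode(from_json(col("Users"), MapType(StringType, userSchema)))
    .as(Seq("userKey", "user")))
```

Exploding a map column yields a key column and a value column, so the dynamic name (`User1`, `User2`, …) survives as ordinary row data instead of being baked into the schema.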

Need help

2017-10-10 Thread Mahender Sarangam
Hi, I'm new to Spark and big data. We are doing a PoC and building our warehouse application using Spark. Can anyone share guidance such as naming conventions for HDFS paths, table names, UDFs and DB names? Any sample architecture diagram would help. -Mahens

Support of Theta Join

2017-01-12 Thread Mahender Sarangam
Hi All, Is there any support for theta joins in Spark? We want to identify the country based on a range of IP addresses (we have the ranges in our DB).
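Spark SQL does accept non-equi (theta) join conditions; they fall back to a much more expensive plan than an equi-join, so broadcasting the small ranges table is the usual mitigation. A sketch with assumed table and column names (IPs pre-converted to numeric form):

```scala
import org.apache.spark.sql.functions.broadcast

val events   = spark.table("events")    // assumed: ip_num Long
val ipRanges = spark.table("ip_ranges") // assumed: start_ip, end_ip, country

// Theta join: match each event's IP into its [start_ip, end_ip] range.
// broadcast() keeps this a broadcast nested-loop join instead of a
// full cartesian product.
val withCountry = events.join(
  broadcast(ipRanges),
  events("ip_num").between(ipRanges("start_ip"), ipRanges("end_ip")))
```

This stays tractable as long as the ranges table is small enough to broadcast; for very large range tables, bucketing IPs by prefix first to recover an equi-join key is a common refinement.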

Any equivalent method for lateral view and explode

2016-11-22 Thread Mahender Sarangam
Hi, We are converting our Hive logic which uses the lateral view and explode functions. Is there any builtin function in Scala for performing lateral view explode? Below is our query in Hive. temparray is a temp table with c0 and c1 columns: SELECT id, CONCAT_WS(',', collect_list(LineID)) as
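The usual DataFrame equivalent of `LATERAL VIEW explode(...)` is the `explode()` function, and `concat_ws`/`collect_list` exist in the DataFrame API too. A sketch reusing the `temparray`, `c0`, `c1` names from the question, on the assumption that `c1` is an array column:

```scala
import org.apache.spark.sql.functions.{col, explode, concat_ws, collect_list}

// LATERAL VIEW explode(c1) -> explode() in a select: one row per element.
val exploded = spark.table("temparray")
  .select(col("c0").as("id"), explode(col("c1")).as("LineID"))

// CONCAT_WS(',', collect_list(LineID)) translates directly.
val result = exploded.groupBy("id")
  .agg(concat_ws(",", collect_list(col("LineID"))).as("LineIDs"))
```

Alternatively, the original HiveQL can usually be run unchanged through `spark.sql(...)`, since Spark SQL supports `LATERAL VIEW` syntax as well.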

Re: Any Dynamic Compilation of Scala Query

2016-11-08 Thread Mahender Sarangam
ng such problems as mentioned below. A sample example would help to understand the problem. Regards, Kiran From: Mahender Sarangam <mahender.bigd...@outlook.com> Date: Wednesday, October 26, 2016 at 2:05 PM To: user <user@spark.apache.org>

Any Dynamic Compilation of Scala Query

2016-10-26 Thread Mahender Sarangam
Hi, Is there any way to dynamically execute a string of Scala code against the Spark engine? We are dynamically creating a Scala file and would like to submit it to Spark, but currently Spark accepts only a JAR file as input for remote job submission. Is there any other way to
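Short of packaging a JAR, one option inside an already-running driver is the Scala reflection ToolBox, which compiles and evaluates a code string at runtime (it requires `scala-compiler` on the classpath). A minimal sketch, with the code string standing in for whatever is generated dynamically:

```scala
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox

// Build a toolbox bound to the current classloader, so evaluated code
// can see classes already on the driver's classpath.
val tb = currentMirror.mkToolBox()

val code = """ (1 to 3).map(_ * 2).sum """ // stands in for generated code
val result = tb.eval(tb.parse(code))       // Any = 12
```

For submitting code strings to a remote cluster rather than inside the driver, an interactive gateway such as Apache Livy (which accepts code snippets over REST) is the more common route.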

Re: HIVE Query 25x faster than SPARK Query

2016-06-15 Thread Mahender Sarangam
+1, We also see performance degradation when comparing Spark SQL with Hive. We have a table of 260 columns and have executed the query in both Hive and Spark. In Hive it takes 66 sec for 1 GB of data, whereas in Spark it takes 4 mins. On 6/9/2016 3:19 PM, Gavin Yue wrote: Could you print out the
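When Spark SQL is unexpectedly slower than Hive on the same data, the physical plan and the partition count are the usual first suspects. A small diagnostic sketch (the query is a placeholder for the slow one):

```scala
val df = spark.sql("SELECT * FROM wide_table") // placeholder for the slow query

df.explain(true)                  // inspect the plan for full scans and shuffles
println(df.rdd.getNumPartitions)  // too few partitions leaves executors idle
```

On a 260-column table, wide row deserialization and a handful of oversized partitions can dominate; comparing the explain output against the Hive plan usually narrows it down quickly.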

How to store results of a Scala query in text format or tab delimited

2016-06-09 Thread Mahender Sarangam
Hi, We are newbies learning Spark. We are running a Scala query against our Parquet table. Whenever we run a query in Jupyter, only part of the results is shown in the UI. So we are trying to store the results into a table, which is in Parquet format. By default, in Spark all the
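To land full query results as tab-delimited text instead of the default Parquet, the CSV writer with a tab separator is the usual route. A sketch with a placeholder query and output path:

```scala
val result = spark.sql("SELECT * FROM my_parquet_table") // placeholder query

result.write
  .option("sep", "\t")       // tab delimiter instead of comma
  .option("header", "true")  // include a header row
  .mode("overwrite")
  .csv("/output/results_tsv") // placeholder path; one part-file per partition
```

The output directory will contain one `part-*` file per partition; `result.coalesce(1).write...` produces a single file, at the cost of funnelling all data through one task.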