Hi,
We have a daily data pull that brings in almost 50 GB of data from an upstream
system. We use Spark SQL to process the 50 GB and insert it into a Hive target
table. We then copy the whole Hive target table into a SQL staging table and
implement a MERGE from staging, for a folder containing multiple gz files.
From: Mahender Sarangam <mahender.bigd...@outlook.com>
Sent: Monday, October 1, 2018 2:00 AM
To: user@spark.apache.org
Subject: Unable to read multiple JSON.Gz File.
I’m trying to read multiple .json.gz files from a Blob storage path using the
Scala code below, but I’m unable to read the data from the files or print the
schema. If the files are not compressed as .gz, we are able to read all of the
files into the DataFrame.
I’ve even tried giving *.gz but
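For what it’s worth, Spark decompresses .gz files transparently based on the
file extension, so a minimal read should look roughly like this (a sketch; the
SparkSession `spark` and the wasb path are assumptions, since the original
path and code aren’t shown):

```scala
// Sketch: reading line-delimited .json.gz files from Blob storage.
// The path below is illustrative; Spark handles the .gz decompression
// automatically based on the file extension.
val df = spark.read
  .json("wasb://container@account.blob.core.windows.net/input/*.json.gz")

df.printSchema()
df.show(5, truncate = false)
```

If each file holds a single multi-line JSON document rather than one object
per line, add `.option("multiLine", true)` before `.json(...)`; a .gz file is
a single split anyway, so multiLine costs nothing extra there.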
Hi,
We are storing our final transformed data in a Hive table in JSON format. While
writing data into the table, all null fields are converted into \N, so while
reading the table we see \N instead of NULL. We tried setting:
ALTER TABLE sample SET SERDEPROPERTIES ('serialization.null.format' =
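The ALTER TABLE above is truncated, so the exact value tried is unknown; a
hedged sketch of the usual shape of this fix (the 'NULL' marker here is
illustrative, not necessarily what was used):

```scala
// Sketch: tell the table's SerDe which string represents NULL.
// The 'NULL' value is illustrative; the original message is truncated,
// so the value actually tried is unknown.
spark.sql(
  "ALTER TABLE sample SET SERDEPROPERTIES ('serialization.null.format' = 'NULL')"
)

// Verify: read the table back and check how the marker now surfaces.
spark.table("sample").show(5)
```

Note that 'serialization.null.format' is honored by Hive's LazySimpleSerDe
(delimited text), so whether it takes effect depends on which SerDe actually
backs the JSON table.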
Hi,
Does anyone have a good architecture document or design principles for building
a warehouse application using Spark?
Is it better to create a HiveContext and perform transformations with HQL, or
to load files directly into a DataFrame and perform the data transformations
there?
We need to implement SCD
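For context, the two approaches being weighed look roughly like this (a
sketch; the session, table, and path names are all illustrative):

```scala
// Sketch contrasting the two styles on an illustrative orders dataset.
import spark.implicits._

// 1) HQL against a Hive-enabled session:
val viaSql = spark.sql("SELECT id, amount FROM staging.orders WHERE amount > 0")

// 2) Loading files directly and transforming with the DataFrame API:
val viaDf = spark.read.parquet("/data/raw/orders")
  .filter($"amount" > 0)
  .select($"id", $"amount")
```

Both routes compile to the same Catalyst plans, so performance is usually
equivalent; the choice is mostly about maintainability and how much existing
HQL you want to reuse.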
Hi,
I'm new to Spark and Scala and need help transforming nested JSON using Scala.
Our upstream system returns JSON like:
{
  "id": 100,
  "text": "Hello, world.",
  "Users": [ { "User1": {
      "name": "Brett",
      "id": 200,
      "Type": "Employee",
      "empid": "2"
  } },
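Assuming the intent is an array of user records (the sample above is truncated
and not quite valid JSON), the usual flattening pattern is `explode`; a sketch
with field names taken from the sample and an illustrative input path:

```scala
// Sketch: flatten a nested Users array into one row per user.
// Assumes line-delimited JSON where Users is an array of structs.
import org.apache.spark.sql.functions.explode
import spark.implicits._

val raw = spark.read.json("/data/input.json")
val flat = raw
  .select($"id", $"text", explode($"Users").as("user"))
  .select($"id", $"text", $"user.name", $"user.Type", $"user.empid")
flat.show()
```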
Hi,
I'm new to Spark and big data. We are doing a POC and building our warehouse
application using Spark. Can anyone share guidance with me on naming
conventions for HDFS paths, table names, UDFs, and database names? Any sample
architecture diagram would also help.
-Mahens
Hi All,
Is there any support for theta joins in Spark? We want to identify the country
based on IP address ranges (which we have in our DB).
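Spark SQL does accept arbitrary (theta) join conditions, not just equality, so
an IP-range lookup can be written as a non-equi join. A sketch with
illustrative DataFrame and column names, assuming IPs are already converted to
numbers:

```scala
// Sketch: theta join mapping a numeric IP onto the [startIpNum, endIpNum]
// range of a geo reference table. Names are illustrative.
val withCountry = events.join(
  geo,
  events("ipNum") >= geo("startIpNum") && events("ipNum") <= geo("endIpNum"),
  "left_outer"
).select(events("*"), geo("country"))
```

A non-equi join has no hash-join plan, so it typically executes as a broadcast
nested loop join; explicitly wrapping the small side with `broadcast(geo)`
keeps it tractable.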
Hi,
We are converting our Hive logic, which uses LATERAL VIEW and explode
functions. Is there any built-in function in Scala/Spark for performing
LATERAL VIEW explode?
Below is our query in Hive; temparray is a temp table with c0 and c1 columns:
SELECT id, CONCAT_WS(',', collect_list(LineID)) as
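Spark SQL accepts LATERAL VIEW explode directly inside `spark.sql(...)`, and
the DataFrame equivalent is the `explode` function. Since the Hive query above
is truncated, this is only a sketch of the pattern, assuming c1 is the array
column:

```scala
// Sketch: DataFrame equivalent of LATERAL VIEW explode over temparray,
// followed by the CONCAT_WS/collect_list re-aggregation the query hints at.
import org.apache.spark.sql.functions.{explode, concat_ws, collect_list}
import spark.implicits._

val exploded = spark.table("temparray")
  .select($"c0", explode($"c1").as("LineID"))

val agg = exploded
  .groupBy($"c0")
  .agg(concat_ws(",", collect_list($"LineID")).as("LineIDs"))
```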
ng such problems as
mentioned below.
A sample example would help to understand the problem.
Regards,
Kiran
From: Mahender Sarangam <mahender.bigd...@outlook.com>
Date: Wednesday, October 26, 2016 at 2:05 PM
To: user <user@spark.apache.org>
Hi,
Is there any way to dynamically execute a string of Scala code against the
Spark engine? We dynamically create a Scala file and would like to submit it
to Spark, but currently Spark accepts only a JAR file as input for remote job
submission. Is there any other way to
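One option outside Spark's job-submission API is compiling the string at
runtime with the Scala compiler's ToolBox; a sketch, which requires the
scala-compiler artifact on the driver classpath:

```scala
// Sketch: evaluate Scala source held in a string at runtime.
// Requires scala-compiler on the classpath.
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox

val toolbox = currentMirror.mkToolBox()
val source = "(1 to 5).map(_ * 2).sum"
val result = toolbox.eval(toolbox.parse(source))  // evaluates to 30
```

Alternatively, `spark-shell -i script.scala` runs a Scala script without
packaging a JAR, which may be enough if the code is generated on the
submitting machine.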
+1.
We also see performance degradation when comparing Spark SQL with Hive. We
have a table of 260 columns and have executed the same query in Hive and
Spark. In Hive it takes 66 sec for 1 GB of data, whereas in Spark it takes
4 minutes.
On 6/9/2016 3:19 PM, Gavin Yue wrote:
Could you print out the
Hi,
We are newbies learning Spark. We are running a Scala query against our
Parquet table. Whenever we fire the query in Jupyter, only part of the results
is shown in the UI. So we are trying to store the results into a table in
Parquet format. By default, in Spark all the
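On the display side, `show()` and most notebook frontends render only the
first rows by default, which matches the "only part of results" symptom.
Persisting the full result set might look like this (a sketch; table and path
names are illustrative):

```scala
// Sketch: persist full query results instead of relying on notebook display.
val results = spark.sql("SELECT * FROM mydb.source_parquet WHERE year = 2016")

// As a managed Parquet table:
results.write.mode("overwrite").format("parquet").saveAsTable("mydb.results_tbl")

// Or straight to a Parquet directory:
results.write.mode("overwrite").parquet("/data/output/results")
```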