I tried all the approaches.
1. Partitioned by year, month, day on the Hive table, with parquet format, when
the table is created in Impala.
2. The dataset from Hive is not partitioned; used insert overwrite
hivePartitionedTable partition(year,month,day) select * from
tempViewOfDataset. Also tried
Dataset.write.mo
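A minimal sketch of the second approach, assuming Spark 2.x with Hive support enabled; the table and view names are the placeholders from the mail above, and the toy columns are invented for illustration:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("dynamic-partition-insert")
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._

// Toy data; in the thread the rows come from the unpartitioned Hive dataset.
val ds = Seq(("a", 2017, 8, 20), ("b", 2017, 8, 21))
  .toDF("value", "year", "month", "day")

// Dynamic partitioning must be on so that partition(year,month,day) takes
// its values from the SELECT instead of from literals.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

ds.createOrReplaceTempView("tempViewOfDataset")
spark.sql(
  """INSERT OVERWRITE TABLE hivePartitionedTable
    |PARTITION (year, month, day)
    |SELECT * FROM tempViewOfDataset""".stripMargin)
```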
Just curious - is your dataset partitioned on your partition columns?
On Mon, 21 Aug 2017 at 3:54 am, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:
> We are in cloudera CDH5.10 and we are using spark 2 that comes with
> cloudera.
>
> Coming to second solution, creating a temporary view o
We are in cloudera CDH5.10 and we are using spark 2 that comes with
cloudera.
Coming to the second solution, creating a temporary view on the dataframe, but
it didn't improve my performance either.
I do remember performance was very fast when doing a whole table overwrite
without partitions, but the problem start
Ah, I see. Then I would also check directly in Hive whether you have issues
inserting data into the Hive table. Alternatively you can try to register the
df as a temp table and do an insert into the Hive table from the temp table
using Spark SQL ("insert into table hivetable select * from temptable")
You se
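The suggested workaround, as a sketch; this assumes `spark` is an existing Hive-enabled SparkSession and `df` the DataFrame being loaded, and `hivetable`/`temptable` are the illustrative names from the mail:

```scala
// Register the DataFrame under a temporary name visible to Spark SQL...
df.createOrReplaceTempView("temptable")

// ...then let Spark SQL perform the insert into the existing Hive table.
spark.sql("insert into table hivetable select * from temptable")
```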
Hi,
I have created a Hive table in Impala first with storage format as parquet.
With a dataframe from Spark I am trying to insert into the same table with
the syntax below.
The table is partitioned by year, month, day.
ds.write.mode(SaveMode.Overwrite).insertInto("db.parqut_table")
https://issues.apache.org/ji
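One thing worth noting about `insertInto` (a sketch; the data column names below are hypothetical): it matches columns by position, not by name, and for a partitioned Hive table the partition columns must be the last columns of the Dataset, in the same order as in the table DDL.

```scala
import org.apache.spark.sql.SaveMode

// insertInto is positional: put data columns first and the partition
// columns (year, month, day) last, matching the table definition.
val ordered = ds.select("some_col", "other_col", "year", "month", "day")

ordered.write
  .mode(SaveMode.Overwrite)
  .insertInto("db.parqut_table")
```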
Have you made sure that saveAsTable stores them as parquet?
> On 20. Aug 2017, at 18:07, KhajaAsmath Mohammed
> wrote:
>
> we are using parquet tables, is it causing any performance issue?
>
>> On Sun, Aug 20, 2017 at 9:09 AM, Jörn Franke wrote:
>> Improving the performance of Hive can be
we are using parquet tables, is it causing any performance issue?
On Sun, Aug 20, 2017 at 9:09 AM, Jörn Franke wrote:
> Improving the performance of Hive can be also done by switching to
> Tez+llap as an engine.
> Aside from this : you need to check what is the default format that it
> writes to
Improving the performance of Hive can also be done by switching to Tez+LLAP as
the engine.
Aside from this: you need to check what the default format is that it writes
to Hive. One cause of the slow storing into a hive table could be that it
writes by default to csv/gzip or csv/bzip2
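A sketch of pinning the format explicitly and checking what an existing table uses; this assumes `spark` and `df` already exist, and the table name is hypothetical. (In Spark 2.x, `saveAsTable` without an explicit format falls back to `spark.sql.sources.default`, which is parquet.)

```scala
import org.apache.spark.sql.SaveMode

// Force parquet instead of relying on the default source format.
df.write
  .format("parquet")
  .mode(SaveMode.Overwrite)
  .saveAsTable("db.example_table") // hypothetical table name

// Inspect the storage format (InputFormat/OutputFormat/SerDe) of a table:
spark.sql("DESCRIBE FORMATTED db.example_table").show(100, truncate = false)
```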
> On 20.
Yes, we tried Hive and want to migrate to Spark for better performance. I am
using parquet tables. Still no better performance while loading.
Sent from my iPhone
> On Aug 20, 2017, at 2:24 AM, Jörn Franke wrote:
>
> Have you tried directly in Hive how the performance is?
>
> In which Forma
Have you tried directly in Hive how the performance is?
In which format do you expect Hive to write? Have you made sure it is in this
format? It could be that you use an inefficient format (e.g. CSV + bzip2).
> On 20. Aug 2017, at 03:18, KhajaAsmath Mohammed
> wrote:
>
> Hi,
>
> I have writ