I tried all the approaches.
1. Partitioned by year, month, day on the Hive table, with parquet format, when
the table is created in Impala.
2. The dataset from Hive is not partitioned; used insert overwrite
hivePartitionedTable partition(year,month,day) select * from
tempViewOfDataset. Also tried
Dataset.write.mo
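A minimal sketch of the second approach, assuming Spark 2.x with Hive support enabled; the table and view names are the placeholders from the mail above, and the toy columns are invented for illustration:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("dynamic-partition-insert")
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._

// Toy data; in the thread the rows come from the unpartitioned Hive dataset.
val ds = Seq(("a", 2017, 8, 20), ("b", 2017, 8, 21))
  .toDF("value", "year", "month", "day")

// Dynamic partitioning must be on so that partition(year,month,day) takes
// its values from the SELECT instead of from literals.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

ds.createOrReplaceTempView("tempViewOfDataset")
spark.sql(
  """INSERT OVERWRITE TABLE hivePartitionedTable
    |PARTITION (year, month, day)
    |SELECT * FROM tempViewOfDataset""".stripMargin)
```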
Just curious - is your dataset partitioned on your partition columns?
On Mon, 21 Aug 2017 at 3:54 am, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:
> We are in cloudera CDH5.10 and we are using spark 2 that comes with
> cloudera.
>
> Coming to second solution, creating a temporary view o
We are in cloudera CDH5.10 and we are using spark 2 that comes with
cloudera.
Coming to the second solution, creating a temporary view on the dataframe, but
it didn't improve my performance either.
I do remember performance was very fast when doing a whole table overwrite
without partitions, but the problem start
Ah, I see. Then I would also check directly in Hive whether you have issues
inserting data into the Hive table. Alternatively you can try to register the
df as a temp table and do an insert into the Hive table from the temp table
using Spark SQL ("insert into table hivetable select * from temptable")
You se
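The suggested workaround, as a sketch; this assumes `spark` is an existing Hive-enabled SparkSession and `df` the DataFrame being loaded, and `hivetable`/`temptable` are the illustrative names from the mail:

```scala
// Register the DataFrame under a temporary name visible to Spark SQL...
df.createOrReplaceTempView("temptable")

// ...then let Spark SQL perform the insert into the existing Hive table.
spark.sql("insert into table hivetable select * from temptable")
```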
Hi,
I have created a Hive table in Impala first with storage format as parquet.
With a dataframe from Spark I am trying to insert into the same table with
the syntax below.
The table is partitioned by year, month, day.
ds.write.mode(SaveMode.Overwrite).insertInto("db.parqut_table")
https://issues.apache.org/ji
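One thing worth noting about `insertInto` (a sketch; the data column names below are hypothetical): it matches columns by position, not by name, and for a partitioned Hive table the partition columns must be the last columns of the Dataset, in the same order as in the table DDL.

```scala
import org.apache.spark.sql.SaveMode

// insertInto is positional: put data columns first and the partition
// columns (year, month, day) last, matching the table definition.
val ordered = ds.select("some_col", "other_col", "year", "month", "day")

ordered.write
  .mode(SaveMode.Overwrite)
  .insertInto("db.parqut_table")
```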
Have you made sure that saveAsTable stores them as parquet?
> On 20. Aug 2017, at 18:07, KhajaAsmath Mohammed
> wrote:
>
> we are using parquet tables, is it causing any performance issue?
>
>> On Sun, Aug 20, 2017 at 9:09 AM, Jörn Franke wrote:
>> Improving the performance of Hive can be
we are using parquet tables, is it causing any performance issue?
On Sun, Aug 20, 2017 at 9:09 AM, Jörn Franke wrote:
> Improving the performance of Hive can be also done by switching to
> Tez+llap as an engine.
> Aside from this : you need to check what is the default format that it
> writes to
Improving the performance of Hive can also be done by switching to Tez+LLAP as
the engine.
Aside from this: you need to check what the default format is that it writes
to Hive. One cause of the slow storing into a hive table could be that it
writes by default to csv/gzip or csv/bzip2
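A sketch of pinning the format explicitly and checking what an existing table uses; this assumes `spark` and `df` already exist, and the table name is hypothetical. (In Spark 2.x, `saveAsTable` without an explicit format falls back to `spark.sql.sources.default`, which is parquet.)

```scala
import org.apache.spark.sql.SaveMode

// Force parquet instead of relying on the default source format.
df.write
  .format("parquet")
  .mode(SaveMode.Overwrite)
  .saveAsTable("db.example_table") // hypothetical table name

// Inspect the storage format (InputFormat/OutputFormat/SerDe) of a table:
spark.sql("DESCRIBE FORMATTED db.example_table").show(100, truncate = false)
```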
> On 20.
Yes, we tried Hive and want to migrate to Spark for better performance. I am
using parquet tables. Still no better performance while loading.
Sent from my iPhone
> On Aug 20, 2017, at 2:24 AM, Jörn Franke wrote:
>
> Have you tried directly in Hive how the performance is?
>
> In which Forma
Have you tried directly in Hive how the performance is?
In which format do you expect Hive to write? Have you made sure it is in this
format? It could be that you use an inefficient format (e.g. CSV + bzip2).
> On 20. Aug 2017, at 03:18, KhajaAsmath Mohammed
> wrote:
>
> Hi,
>
> I have writ