What you can do is create a partitioned table in Hive with, for example, a date partition column, then use
val finalDf = dataFrame.repartition(dataFrame.col("date_column"))
and later run
insert overwrite table tablename partition(date_column) select * from tempview
That would work as expected.
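In PySpark, a minimal sketch of the same idea (hc, df, and the table/column names here are placeholders, not the original code):

    # Repartition by the Hive partition column so each partition value's rows
    # land in a single task, producing one file per partition value instead of
    # many small ones.
    final_df = df.repartition(df["date_column"])
    final_df.registerTempTable("tempview")

    # Dynamic partitioning must be enabled to use PARTITION(date_column)
    # without a literal value; the partition column has to be the last column
    # in the SELECT.
    hc.sql("set hive.exec.dynamic.partition=true")
    hc.sql("set hive.exec.dynamic.partition.mode=nonstrict")
    hc.sql("insert overwrite table mydb.mytable partition(date_column) "
           "select * from tempview")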
On 11-Aug-2017 11:03 PM, "KhajaAsmath Mohammed" wrote:
We had spark.sql.shuffle.partitions set to 4, but in HDFS it is ending up with 200 files, of which only 4 actually contain data; the rest are zero bytes.
My only requirement is for the Hive insert overwrite query from the Spark temporary table to run fast and to end up with fewer files instead of more.
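Worth noting: 200 is the default value of spark.sql.shuffle.partitions, so it looks like the setting never took effect; it has to be set before the query that triggers the shuffle runs. A rough sketch of two ways to end up with 4 non-empty files (hc and the query here are placeholders):

    # Set the shuffle parallelism before running the query, otherwise the
    # default of 200 is used and most of the output files end up empty.
    hc.setConf("spark.sql.shuffle.partitions", "4")
    result_df = hc.sql("select ... from tempRaw")    # placeholder query
    print(result_df.rdd.getNumPartitions())          # should print 4

    # Or collapse the partitions just before the insert; coalesce merges
    # existing partitions without a full shuffle.
    result_df.coalesce(4).registerTempTable("tempview")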
Please show the write() call, and the results in HDFS. What are all the
files you see?
On Fri, Aug 11, 2017 at 1:10 PM, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:
tempTable = union_df.registerTempTable("tempRaw")
create = hc.sql('CREATE TABLE IF NOT EXISTS blab.pyspark_dpprq (vin string,
utctime timestamp, description string, descriptionuom string, providerdesc
string, dt_map string, islocation string, latitude double, longitude
double, speed double, value
Hi Asmath,
Could you share the code you're running?
Daniel
On Fri, 11 Aug 2017, 17:53 KhajaAsmath Mohammed wrote:
Hi,

I am using Spark SQL to write data back to HDFS, and it is resulting in multiple output files.

I tried changing spark.sql.shuffle.partitions to 1, but it resulted in very slow performance. I also tried coalesce and repartition, with the same issue. Any suggestions?
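For reference, the variants tried look roughly like this (a sketch, not the actual job; df stands for the DataFrame being written):

    hc.setConf("spark.sql.shuffle.partitions", "1")  # one shuffle partition:
                                                     # all rows funnel through a
                                                     # single task, hence slow
    via_coalesce = df.coalesce(1)        # merges partitions without a full shuffle
    via_repartition = df.repartition(1)  # full shuffle into a single partition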
Thanks,
Asmath