Re: [Spark][issue]Writing Hive Partitioned table

2016-10-19 Thread ayan guha
Hi Group

Sorry to revive this thread.

I am using Spark 1.6.0 on CDH 5.7.

Any ideas?


Best
Ayan





Re: [Spark][issue]Writing Hive Partitioned table

2016-10-06 Thread Mich Talebzadeh
Hi Ayan,

This depends on the version of Spark you are using.

Have you tried updating the statistics in Hive?

ANALYZE TABLE ${DATABASE}.${TABLE} PARTITION (${PARTITION_NAME})
COMPUTE STATISTICS FOR COLUMNS;

and then run:

SHOW CREATE TABLE ${TABLE};
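
A minimal sketch of running those two statements from PySpark 1.6 through a
HiveContext, with hypothetical names (somedb.sometable and partition value
partition_date='2016-09-28') standing in for the placeholders. Whether
ANALYZE ... COMPUTE STATISTICS FOR COLUMNS is passed through to Hive can
depend on the Spark build, so if it fails here, run the same statements in
beeline or the Hive CLI instead:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive-stats-check")
sqlContext = HiveContext(sc)  # Hive-aware context; a plain SQLContext is not enough

# Update column statistics for one partition (all names hypothetical).
sqlContext.sql(
    "ANALYZE TABLE somedb.sometable "
    "PARTITION (partition_date='2016-09-28') "
    "COMPUTE STATISTICS FOR COLUMNS")

# Check how the metastore actually defines the table.
for row in sqlContext.sql("SHOW CREATE TABLE somedb.sometable").collect():
    print(row[0])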

HTH


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





[Spark][issue]Writing Hive Partitioned table

2016-10-06 Thread ayan guha
Posting again with the correct subject.

On Fri, Oct 7, 2016 at 12:37 PM, ayan guha  wrote:

> Hi
>
> I have hit an issue:
>
> - Writing a Hive partitioned table using:
>
> df.withColumn("partition_date", to_date(df["INTERVAL_DATE"])) \
>     .write.partitionBy("partition_date") \
>     .saveAsTable("sometable", mode="overwrite")
>
> - The data was written to HDFS fine; I can see folders with partition
> names such as
>
> /app/somedb/hive/somedb.db/sometable/partition_date=2016-09-28
> /app/somedb/hive/somedb.db/sometable/partition_date=2016-09-29
>
> and so on.
> - Also, the _common_metadata and _metadata files are written properly.
>
> - I can read the data from Spark fine using
> read.parquet("/app/somedb/hive/somedb.db/sometable"), and
> printSchema() shows all the columns.
>
> - However, I cannot read the table from Hive:
>
> Problem 1: Hive does not recognize the table as partitioned.
> Problem 2: Hive sees only one column, reported as col array<string>
> (from deserializer).
> Problem 3: MSCK REPAIR TABLE fails, saying the partitions are not in
> the metastore.
>
> Question: Is this a known issue when writing Hive partitioned tables
> from Spark? (A workaround sketch follows below this quote.)
>
>
> --
> Best Regards,
> Ayan Guha
>
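
As far as I can tell, this is a known limitation of Spark 1.x: saveAsTable
with the parquet data source records a Spark-internal table in the
metastore, which is why Hive shows the single col array<string> column and
no partitions. The commonly suggested workaround is to bypass saveAsTable:
write plain partitioned parquet, declare the table with explicit Hive DDL,
and then register the partitions. Below is a minimal, unverified sketch of
that pattern; every name in it (somedb.source_table, sometable,
INTERVAL_DATE, the HDFS path, the column list) is hypothetical:

from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.functions import to_date

sc = SparkContext(appName="partitioned-write-workaround")
sqlContext = HiveContext(sc)

df = sqlContext.table("somedb.source_table")  # hypothetical source
out = df.withColumn("partition_date", to_date(df["INTERVAL_DATE"]))

# 1. Write plain parquet files partitioned by directory, without going
#    through the metastore at all.
out.write.partitionBy("partition_date").mode("overwrite") \
    .parquet("/app/somedb/hive/somedb.db/sometable")

# 2. Declare a matching EXTERNAL table with explicit Hive DDL so the
#    metastore records real columns and a real partition key.
sqlContext.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS somedb.sometable (
        interval_date STRING
        -- ... remaining data columns, matching the parquet schema ...
    )
    PARTITIONED BY (partition_date DATE)
    STORED AS PARQUET
    LOCATION '/app/somedb/hive/somedb.db/sometable'
""")

# 3. Pick up the partition_date=... directories that Spark wrote.
sqlContext.sql("MSCK REPAIR TABLE somedb.sometable")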



-- 
Best Regards,
Ayan Guha