Compaction would have been triggered automatically, as the following
properties are already set in *hive-site.xml* and the *NO_AUTO_COMPACTION*
property has not been set on these tables.


    <property>
      <name>hive.compactor.initiator.on</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.compactor.worker.threads</name>
      <value>1</value>
    </property>
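
As a quick sanity check (a sketch; hivespark is the table name from the
thread below), the Hive CLI can list the compactions the Initiator has
scheduled, and auto compaction can be disabled per table through the
property mentioned above:

    -- list pending / running / completed compaction requests
    SHOW COMPACTIONS;

    -- auto compaction is skipped only when this table property is 'true'
    ALTER TABLE hivespark SET TBLPROPERTIES ('NO_AUTO_COMPACTION'='true');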


The documentation can be frustrating at times.




Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Mon, Feb 22, 2016 at 9:49 AM, Varadharajan Mukundan <srinath...@gmail.com
> wrote:

> Yes, I was burned by this issue a couple of weeks back. This also means
> that after every insert job, a compaction has to be run to access the new
> rows from Spark. Sad that this issue is not documented / mentioned anywhere.
>
> On Mon, Feb 22, 2016 at 9:27 AM, @Sanjiv Singh <sanjiv.is...@gmail.com>
> wrote:
>
>> Hi Varadharajan,
>>
>> Thanks for your response.
>>
>> Yes, it is a transactional table; see the *show create table* output below.
>>
>> The table has barely 3 records, and after triggering the compaction below
>> on it, it started showing results in Spark SQL.
>>
>>
>> > *ALTER TABLE hivespark COMPACT 'major';*
>>
>>
>> > *show create table hivespark;*
>>
>>   CREATE TABLE `hivespark`(
>>     `id` int,
>>     `name` string)
>>   CLUSTERED BY (
>>     id)
>>   INTO 32 BUCKETS
>>   ROW FORMAT SERDE
>>     'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
>>   STORED AS INPUTFORMAT
>>     'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
>>   OUTPUTFORMAT
>>     'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
>>   LOCATION
>>     'hdfs://myhost:8020/apps/hive/warehouse/mydb.db/hivespark'
>>   TBLPROPERTIES (
>>     'COLUMN_STATS_ACCURATE'='true',
>>     'last_modified_by'='root',
>>     'last_modified_time'='1455859079',
>>     'numFiles'='37',
>>     'numRows'='3',
>>     'rawDataSize'='0',
>>     'totalSize'='11383',
>>     'transactional'='true',
>>     'transient_lastDdlTime'='1455864121');
>>
>>
>> Regards
>> Sanjiv Singh
>> Mob :  +091 9990-447-339
>>
>> On Mon, Feb 22, 2016 at 9:01 AM, Varadharajan Mukundan <
>> srinath...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Is the transactional attribute set on your table? I have observed that
>>> Hive's transactional storage structure does not work with Spark yet. You
>>> can confirm this by looking at the transactional attribute in the output
>>> of "desc extended <tablename>" in the Hive console.
>>>
>>> If you need to access a transactional table, consider running a major
>>> compaction first and then try accessing the table.
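>>>
>>> For instance, something along these lines from the Hive console (using
>>> the same <tablename> placeholder):
>>>
>>>   -- check whether 'transactional' is true in the table properties
>>>   DESC EXTENDED <tablename>;
>>>
>>>   -- force a major compaction before querying from Spark
>>>   ALTER TABLE <tablename> COMPACT 'major';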
>>>
>>> On Mon, Feb 22, 2016 at 8:57 AM, @Sanjiv Singh <sanjiv.is...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> I have observed that Spark SQL does not return records for Hive
>>>> bucketed ORC tables on HDP.
>>>>
>>>> In Spark SQL I am able to list all tables, but queries on Hive
>>>> bucketed tables do not return any records.
>>>>
>>>> I have also tried the same with non-bucketed Hive tables; those work
>>>> fine.
>>>>
>>>> The same works on a plain Apache setup.
>>>>
>>>> Let me know if you need any other details.
>>>>
>>>> Regards
>>>> Sanjiv Singh
>>>> Mob :  +091 9990-447-339
>>>>
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> M. Varadharajan
>>>
>>> ------------------------------------------------
>>>
>>> "Experience is what you get when you didn't get what you wanted"
>>>                -By Prof. Randy Pausch in "The Last Lecture"
>>>
>>> My Journal :- http://varadharajan.in
>>>
>>
>>
>
>
> --
> Thanks,
> M. Varadharajan
>
> ------------------------------------------------
>
> "Experience is what you get when you didn't get what you wanted"
>                -By Prof. Randy Pausch in "The Last Lecture"
>
> My Journal :- http://varadharajan.in
>
