Hi Elliot,

Thanks for letting me know. HPL/SQL sounded particularly interesting, but
in the documentation I could not see any way to pass the output generated
by one Hive query to the next one. The tool looks good as a homogeneous
PL/SQL platform for multiple big-data systems (http://www.hplsql.org/about).

However, in order to break up a single complex Hive query, DDLs look to be
the only way in HPL/SQL too. Or is there an alternative that I might have
missed?
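
To make the DDL-based split concrete, here is a rough sketch of what I mean.
It is only an illustration in Python that generates the statements; it is not
taken from the HPL/SQL or Hive documentation. The function name
build_view_script and all table and view names are made up, and the trailing
DROP VIEW statements are my own assumption about how one might limit the
clutter left in the metastore.

```python
from typing import Dict, List


def build_view_script(modules: Dict[str, str], final_query: str) -> List[str]:
    """Build an ordered list of HiveQL statements for a view-based split.

    modules maps a (hypothetical) view name to the SELECT that defines it;
    final_query is the statement that joins those views.
    """
    statements: List[str] = []
    # One CREATE VIEW per module, in the order the modules were given.
    for name, select_sql in modules.items():
        body = select_sql.strip().rstrip(";")
        statements.append(f"CREATE VIEW IF NOT EXISTS {name} AS {body}")
    # The single DML that joins the views.
    statements.append(final_query.strip().rstrip(";"))
    # Assumption: drop the views afterwards, in reverse creation order.
    for name in reversed(list(modules)):
        statements.append(f"DROP VIEW IF EXISTS {name}")
    return statements


# Hypothetical table and view names, for illustration only.
modules = {
    "v_orders": "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id",
    "v_visits": "SELECT customer_id, COUNT(*) AS visits FROM page_views GROUP BY customer_id",
}
final_query = (
    "INSERT OVERWRITE TABLE customer_summary "
    "SELECT o.customer_id, o.total, v.visits "
    "FROM v_orders o JOIN v_visits v ON o.customer_id = v.customer_id"
)

for statement in build_view_script(modules, final_query):
    print(statement + ";")
```

Each sub-select lives in its own named module, but the script still issues
one DDL per module, which is the overhead I was hoping to avoid.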

-- Saumitra S. Shahapure

On Thu, Dec 15, 2016 at 6:21 PM, Elliot West <tea...@gmail.com> wrote:

> I notice that HPL/SQL is not mentioned on the page I referenced, however I
> expect that is another approach that you could use to modularise:
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=59690156
> http://www.hplsql.org/doc
>
> On 15 December 2016 at 17:17, Elliot West <tea...@gmail.com> wrote:
>
>> Some options are covered here, although there is no definitive guidance
>> as far as I know:
>>
>> https://cwiki.apache.org/confluence/display/Hive/Unit+Testing+Hive+SQL#UnitTestingHiveSQL-Modularisation
>>
>> On 15 December 2016 at 17:08, Saumitra Shahapure <
>> saumitra.offic...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> We are running and maintaining a quite big and complex Hive SELECT query
>>> right now. It's basically a single SELECT query which performs a JOIN of
>>> about ten other SELECT query outputs.
>>>
>>> The simplest way to refactor that we can think of is to break this query
>>> down into multiple views and then join the views. There is a similar
>>> possibility of creating intermediate tables.
>>>
>>> However, creating multiple DDLs in order to maintain a single DML is not
>>> very smooth. We would end up polluting the metadata database by creating
>>> views / intermediate tables which are used in just this ETL.
>>>
>>> What are the other efficient ways to maintain complex SQL queries
>>> written in Hive? Are there better ways to break a Hive query into
>>> multiple modules?
>>>
>>> -- Saumitra S. Shahapure
>>>
>>
>>
>
