Re: How to perform hot Kudu table swap after upgrade to Impala 2.12?

2018-07-30 Thread Zoltan Ivanfi
Hi,

I think swapping tables is indeed a common need and not only for Kudu
tables. For this reason this workaround was not particularly good in my
opinion as it was Kudu-specific. Since Impala tables may have different
names than their corresponding tables in Kudu, this could be used to
provide the additional layer of indirection needed for swapping Kudu
tables, but not for other kinds of tables (e.g. Parquet ones).

I think at this point the most widely applicable "layer of indirection" is
using VIEWs, which works for any kind of table. If that is not optimal for
this use case for whatever reason, maybe some more lightweight alternative
could be considered for table aliasing in the future.
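
As a minimal sketch of the view-based indirection (not tested against Impala; it uses Python's built-in sqlite3 only so that it runs anywhere), readers query a stable view name while the swap repoints the view at a freshly loaded table. The names dim, dim_v1, dim_v2 and swap_view are hypothetical. In Impala the repointing step would presumably be a single ALTER VIEW dim AS SELECT ... statement rather than the drop/create pair that SQLite needs.

```python
import sqlite3


def swap_view(conn, view, new_table):
    """Repoint `view` at `new_table` inside one explicit transaction.

    Identifiers are interpolated directly, so pass trusted names only.
    """
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP VIEW IF EXISTS {view}")
        conn.execute(f"CREATE VIEW {view} AS SELECT * FROM {new_table}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN/COMMIT issued in swap_view are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE dim_v1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE dim_v2 (id INTEGER, name TEXT)")
conn.execute("INSERT INTO dim_v1 VALUES (1, 'old')")
conn.execute("INSERT INTO dim_v2 VALUES (1, 'new')")

swap_view(conn, "dim", "dim_v1")   # production initially reads dim_v1
before = conn.execute("SELECT name FROM dim").fetchall()

swap_view(conn, "dim", "dim_v2")   # hot swap: same view name, new data
after = conn.execute("SELECT name FROM dim").fetchall()
```

Queries keep using the name dim throughout; only the view definition changes, which is why this works the same way for Kudu, Parquet or any other table type.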

Br,

Zoltan

On Mon, Jul 30, 2018 at 9:07 AM Gabor Kaszab 
wrote:

> Thanks Tim for answering this!
>
> One note for the very first mail in this thread:
> https://issues.apache.org/jira/browse/IMPALA-6375 won't fix this issue
> either. It will allow the user to make a managed Kudu table external and to
> modify the underlying kudu.table_name in one step. With the current
> implementation this has to be done in two steps. However, modifying
> kudu.table_name of a managed table still won't be feasible.
>
> Cheers,
> Gabor
>
>
> On Sat, Jul 28, 2018 at 2:56 AM, Boris Tyukin 
> wrote:
>
>> thanks so much, Tim. I do feel much better now that you've explained the
>> reasons behind.
>>
>> Using another client makes sense - will check that out. I did see a bunch
>> of methods in Kudu API but was hoping to use Impala all the way.
>>
>> It would be really cool if this Jira will get traction as this is a very
>> common technique with Hive and Impala on HDFS to swap tables and partitions
>> to ensure safe process of moving large amounts of data to production tables
>> https://jira.apache.org/jira/browse/KUDU-2327
>> Support atomic swap of tables or partitions
>>
>>
>>
>>
>> On Fri, Jul 27, 2018 at 7:24 PM Tim Armstrong 
>> wrote:
>>
>>> Hi,
>>>   Sorry you ran into this - we don't deliberately want to break
>>> workflows but it can be tricky if we accidentally expose implementation
>>> details. There was a previous CVE that resulted from creative use of this
>>> functionality
>>> https://lists.apache.org/thread.html/74a163df0cdefcd738c8d18821e69aa69eed2ba5384c0cc255d15c4b@%3Cannounce.apache.org%3E
>>> so part of the motivation was to simplify the table states and state
>>> transitions we need to deal with and make it easier to reason about and
>>> test thoroughly.
>>>
>>> We do have a "kudu" label in JIRA if you want to find Kudu-related
>>> Impala JIRAs. It would be unusual for one open-source project to have veto
>>> power over changes in another project or to create duplicate JIRAs in
>>> multiple apache projects for the same work. We do generally work closely
>>> with the Kudu project.
>>>
>>> I think the main workaround would be to use a different Kudu client to
>>> directly drop the tables. AFAIK the intent of external Kudu tables was
>>> generally that they would be dropped externally to Impala - I don't think
>>> we anticipated the "attach then drop" method for
>>>
>>> On Fri, Jul 27, 2018 at 8:27 AM, Boris Tyukin 
>>> wrote:
>>>
>>>> So the change was made by Impala developer but it is only relevant to
>>>> Kudu, taking away the only way to swap tables.
>>>>
>>>> I am curious if this change was agreed with Kudu devs. And if changes
>>>> like that should be tracked by both Kudu and impala JIRAs since Impala is
>>>> the only way right now to work with Kudu, besides APIs, that requires
>>>> coding.
>>>>
>>>> Is there someone who chairs this type of decisions that impact Impala /
>>>> Kudu users?
>>>>
>>>> This is important for me to understand before we invest into Kudu.
>>>>
>>>> On Fri, Jul 27, 2018 at 8:53 AM Boris Tyukin
>>>> wrote:
>>>>
>>>>> oh no... why?? just why?? we are about to upgrade to 2.12...
>>>>>
>>>>> Todd, can this "improvement" get rolled back? This is a breaking change
>>>>> and does not contribute to making anything better. And now the only good
>>>>> way to swap Kudu tables is gone.
>>>>>
>>>>> I am really frustrated. IMPALA-5654 should never have been
>>>>> approved without giving users a good alternative.
>>>>>
>>>>> Boris
>>>>>
>>>>> On Fri, Jul 27, 2018 at 7:10 AM Cliff Resnick
>>>>> wrote:
>>>>>
>>>>>> We sometimes need to replace dimension tables in Kudu in a live
>>>>>> database. The technique is described here:
>>>>>>
>>>>>> https://boristyukin.com/how-to-hot-swap-apache-kudu-tables-with-apache-impala/
>>>>>>
>>>>>> After 2.12 and IMPALA-5654 it seems there is no longer a way to
>>>>>> perform the final step, where the hot swap Kudu target table is
>>>>>> renamed back to the original. It looks like IMPALA-6375 is going to
>>>>>> address this, but in the meantime is there another workaround we
>>>>>> can

Re: How to perform hot Kudu table swap after upgrade to Impala 2.12?

2018-07-30 Thread Gabor Kaszab
Thanks Tim for answering this!

One note for the very first mail in this thread:
https://issues.apache.org/jira/browse/IMPALA-6375 won't fix this issue
either. It will allow the user to make a managed Kudu table external and to
modify the underlying kudu.table_name in one step. With the current
implementation this has to be done in two steps. However, modifying
kudu.table_name of a managed table still won't be feasible.
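
To make the two steps concrete, here is a small sketch that only builds the two Impala statements as strings; it does not connect to Impala. The helper name and the table names are hypothetical, and the TBLPROPERTIES syntax is written from memory of the Kudu/Impala integration docs, so please verify it against your version before relying on it.

```python
from typing import List


def detach_and_repoint(impala_table: str, kudu_table: str) -> List[str]:
    """Return the statements for the current two-step flow.

    Both names are interpolated as-is, so pass trusted identifiers only.
    """
    return [
        # Step 1: turn the managed table into an external one.
        f"ALTER TABLE {impala_table} SET TBLPROPERTIES('EXTERNAL' = 'TRUE')",
        # Step 2: point the now-external table at another Kudu table.
        f"ALTER TABLE {impala_table} "
        f"SET TBLPROPERTIES('kudu.table_name' = '{kudu_table}')",
    ]


# Hypothetical names, for illustration only.
statements = detach_and_repoint("dim_customer", "dim_customer_new")
for statement in statements:
    print(statement + ";")
```

The second statement is only accepted once the table is external; on a managed table it is rejected, which is the limitation described above.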

Cheers,
Gabor


On Sat, Jul 28, 2018 at 2:56 AM, Boris Tyukin  wrote:

> thanks so much, Tim. I do feel much better now that you've explained the
> reasons behind.
>
> Using another client makes sense - will check that out. I did see a bunch
> of methods in Kudu API but was hoping to use Impala all the way.
>
> It would be really cool if this Jira will get traction as this is a very
> common technique with Hive and Impala on HDFS to swap tables and partitions
> to ensure safe process of moving large amounts of data to production tables
> https://jira.apache.org/jira/browse/KUDU-2327
> Support atomic swap of tables or partitions
>
>
>
>
> On Fri, Jul 27, 2018 at 7:24 PM Tim Armstrong 
> wrote:
>
>> Hi,
>>   Sorry you ran into this - we don't deliberately want to break workflows
>> but it can be tricky if we accidentally expose implementation details.
>> There was a previous CVE that resulted from creative use of this
>> functionality
>> https://lists.apache.org/thread.html/74a163df0cdefcd738c8d18821e69aa69eed2ba5384c0cc255d15c4b@%3Cannounce.apache.org%3E
>> so part of the motivation was to simplify the table states and state
>> transitions we need to deal with and make it easier to reason about and
>> test thoroughly.
>>
>> We do have a "kudu" label in JIRA if you want to find Kudu-related Impala
>> JIRAs. It would be unusual for one open-source project to have veto power
>> over changes in another project or to create duplicate JIRAs in multiple
>> apache projects for the same work. We do generally work closely with the
>> Kudu project.
>>
>> I think the main workaround would be to use a different Kudu client to
>> directly drop the tables. AFAIK the intent of external Kudu tables was
>> generally that they would be dropped externally to Impala - I don't think
>> we anticipated the "attach then drop" method for
>>
>> On Fri, Jul 27, 2018 at 8:27 AM, Boris Tyukin 
>> wrote:
>>
>>> So the change was made by Impala developer but it is only relevant to
>>> Kudu, taking away the only way to swap tables.
>>>
>>> I am curious if this change was agreed with Kudu devs. And if changes
>>> like that should be tracked by both Kudu and impala JIRAs since Impala is
>>> the only way right now to work with Kudu, besides APIs, that requires
>>> coding.
>>>
>>> Is there someone who chairs this type of decisions that impact Impala /
>>> Kudu users?
>>>
>>> This is important for me to understand before we invest into Kudu.
>>>
>>>
>>>
>>>
>>> On Fri, Jul 27, 2018 at 8:53 AM Boris Tyukin 
>>> wrote:
>>>
>>>> oh no... why?? just why?? we are about to upgrade to 2.12...
>>>>
>>>> Todd, can this "improvement" get rolled back? This is a breaking change
>>>> and does not contribute to making anything better. And now the only good
>>>> way to swap Kudu tables is gone.
>>>>
>>>> I am really frustrated. IMPALA-5654 should never have been
>>>> approved without giving users a good alternative.
>>>>
>>>> Boris
>>>>
>>>> On Fri, Jul 27, 2018 at 7:10 AM Cliff Resnick wrote:
>>>>
>>>>> We sometimes need to replace dimension tables in Kudu in a live
>>>>> database. The technique is described here:
>>>>>
>>>>> https://boristyukin.com/how-to-hot-swap-apache-kudu-tables-with-apache-impala/
>>>>>
>>>>> After 2.12 and IMPALA-5654 it seems there is no longer a way to
>>>>> perform the final step, where the hot swap Kudu target table is
>>>>> renamed back to the original. It looks like IMPALA-6375 is going to
>>>>> address this, but in the meantime is there another workaround we can
>>>>> use?


Re: How to perform hot Kudu table swap after upgrade to Impala 2.12?

2018-07-27 Thread Boris Tyukin
Thanks so much, Tim. I do feel much better now that you've explained the
reasons behind it.

Using another client makes sense - will check that out. I did see a bunch
of methods in the Kudu API but was hoping to use Impala all the way.

It would be really cool if this Jira got traction, as swapping tables and
partitions is a very common technique with Hive and Impala on HDFS to
ensure a safe process for moving large amounts of data to production tables:
https://jira.apache.org/jira/browse/KUDU-2327
Support atomic swap of tables or partitions




On Fri, Jul 27, 2018 at 7:24 PM Tim Armstrong 
wrote:

> Hi,
>   Sorry you ran into this - we don't deliberately want to break workflows
> but it can be tricky if we accidentally expose implementation details.
> There was a previous CVE that resulted from creative use of this
> functionality
> https://lists.apache.org/thread.html/74a163df0cdefcd738c8d18821e69aa69eed2ba5384c0cc255d15c4b@%3Cannounce.apache.org%3E
> so part of the motivation was to simplify the table states and state
> transitions we need to deal with and make it easier to reason about and
> test thoroughly.
>
> We do have a "kudu" label in JIRA if you want to find Kudu-related Impala
> JIRAs. It would be unusual for one open-source project to have veto power
> over changes in another project or to create duplicate JIRAs in multiple
> apache projects for the same work. We do generally work closely with the
> Kudu project.
>
> I think the main workaround would be to use a different Kudu client to
> directly drop the tables. AFAIK the intent of external Kudu tables was
> generally that they would be dropped externally to Impala - I don't think
> we anticipated the "attach then drop" method for
>
> On Fri, Jul 27, 2018 at 8:27 AM, Boris Tyukin 
> wrote:
>
>> So the change was made by Impala developer but it is only relevant to
>> Kudu, taking away the only way to swap tables.
>>
>> I am curious if this change was agreed with Kudu devs. And if changes
>> like that should be tracked by both Kudu and impala JIRAs since Impala is
>> the only way right now to work with Kudu, besides APIs, that requires
>> coding.
>>
>> Is there someone who chairs this type of decisions that impact Impala /
>> Kudu users?
>>
>> This is important for me to understand before we invest into Kudu.
>>
>>
>>
>>
>> On Fri, Jul 27, 2018 at 8:53 AM Boris Tyukin 
>> wrote:
>>
>>> oh no... why?? just why?? we are about to upgrade to 2.12...
>>>
>>> Todd, can this "improvement" get rolled back? This is a breaking change and
>>> does not contribute to making anything better. And now the only good way to
>>> swap Kudu tables is gone.
>>>
>>> I am really frustrated. IMPALA-5654 should never have been
>>> approved without giving users a good alternative.
>>>
>>> Boris
>>>
>>> On Fri, Jul 27, 2018 at 7:10 AM Cliff Resnick  wrote:
>>>
>>>> We sometimes need to replace dimension tables in Kudu in a live
>>>> database. The technique is described here:
>>>>
>>>> https://boristyukin.com/how-to-hot-swap-apache-kudu-tables-with-apache-impala/
>>>>
>>>> After 2.12 and IMPALA-5654 it seems there is no longer a way to
>>>> perform the final step, where the hot swap Kudu target table is
>>>> renamed back to the original. It looks like IMPALA-6375 is going to
>>>> address this, but in the meantime is there another workaround we can
>>>> use?


Re: How to perform hot Kudu table swap after upgrade to Impala 2.12?

2018-07-27 Thread Tim Armstrong
Hi,
  Sorry you ran into this - we don't deliberately want to break workflows
but it can be tricky if we accidentally expose implementation details.
There was a previous CVE that resulted from creative use of this
functionality
https://lists.apache.org/thread.html/74a163df0cdefcd738c8d18821e69aa69eed2ba5384c0cc255d15c4b@%3Cannounce.apache.org%3E
so part of the motivation was to simplify the table states and state
transitions we need to deal with and make it easier to reason about and
test thoroughly.

We do have a "kudu" label in JIRA if you want to find Kudu-related Impala
JIRAs. It would be unusual for one open-source project to have veto power
over changes in another project or to create duplicate JIRAs in multiple
apache projects for the same work. We do generally work closely with the
Kudu project.

I think the main workaround would be to use a different Kudu client to
directly drop the tables. AFAIK the intent of external Kudu tables was
generally that they would be dropped externally to Impala - I don't think
we anticipated the "attach then drop" method for

On Fri, Jul 27, 2018 at 8:27 AM, Boris Tyukin  wrote:

> So the change was made by Impala developer but it is only relevant to
> Kudu, taking away the only way to swap tables.
>
> I am curious if this change was agreed with Kudu devs. And if changes like
> that should be tracked by both Kudu and impala JIRAs since Impala is the
> only way right now to work with Kudu, besides APIs, that requires coding.
>
> Is there someone who chairs this type of decisions that impact Impala /
> Kudu users?
>
> This is important for me to understand before we invest into Kudu.
>
>
>
>
> On Fri, Jul 27, 2018 at 8:53 AM Boris Tyukin 
> wrote:
>
>> oh no... why?? just why?? we are about to upgrade to 2.12...
>>
>> Todd, can this "improvement" get rolled back? This is a breaking change and
>> does not contribute to making anything better. And now the only good way to
>> swap Kudu tables is gone.
>>
>> I am really frustrated. IMPALA-5654 should never have been
>> approved without giving users a good alternative.
>>
>> Boris
>>
>> On Fri, Jul 27, 2018 at 7:10 AM Cliff Resnick  wrote:
>>
>>> We sometimes need to replace dimension tables in Kudu in a live
>>> database. The technique is described here:
>>>
>>> https://boristyukin.com/how-to-hot-swap-apache-kudu-tables-with-apache-impala/
>>>
>>> After 2.12 and IMPALA-5654 it seems there is no longer a way to
>>> perform the final step, where the hot swap Kudu target table is
>>> renamed back to the original. It looks like IMPALA-6375 is going to
>>> address this, but in the meantime is there another workaround we can use?
>>>
>>>
>>>
>>>
>>>


Re: How to perform hot Kudu table swap after upgrade to Impala 2.12?

2018-07-27 Thread Boris Tyukin
oh no... why?? just why?? we are about to upgrade to 2.12...

Todd, can this "improvement" get rolled back? This is a breaking change and
does not contribute to making anything better. And now the only good way to
swap Kudu tables is gone.

I am really frustrated. IMPALA-5654 should never have been
approved without giving users a good alternative.

Boris

On Fri, Jul 27, 2018 at 7:10 AM Cliff Resnick  wrote:

> We sometimes need to replace dimension tables in Kudu in a live database.
> The technique is described here:
>
> https://boristyukin.com/how-to-hot-swap-apache-kudu-tables-with-apache-impala/
>
> After 2.12 and IMPALA-5654 it seems there is no longer a way to perform
> the final step, where the hot swap Kudu target table is renamed back to
> the original. It looks like IMPALA-6375 is going to address this, but in
> the meantime is there another workaround we can use?
>
>
>
>
>