Re: UDA debugging, was Re: Broken/Flaky Tests

2018-06-07 Thread Csaba Ringhofer
Hi!

 I have left some comments in the code (lines starting with ///) and removed
the md5 implementation parts to keep the answer short.

 Note that I am not sure what you want to achieve with the UDA -
can you explain what countMD5 would be used for?




> void md5(const unsigned char message[], int len, unsigned char result[])
> {
>
...

> memcpy(result, r, sizeof(int) * 4);
>
...

> }
>
> void init_func(FunctionContext* context, StringVal* val) {
>   val->is_null = true;
> }
> void update_func(FunctionContext* context, const StringVal& str,
> StringVal* result) {
>   if (str.is_null) return;
>   if (result->is_null) {
>
>  unsigned char *outbuf=context->Allocate(17);
> outbuf[16]='\0';
> md5(str.ptr, str.len, outbuf);
>
> uint8_t* copy = context->Allocate(17);
>
> if (copy == NULL) return;
> memcpy(copy, outbuf, 16);
> context->Free(outbuf);
> *result = StringVal(copy, str.len);
>
/// str.len: as far as I understand, the hash is always 16 bytes, so the
length should be fixed at 16 (or 17 if \0-terminated)

>  return;
>   }
> unsigned char *outbuf1=context->Allocate(17);
>
/// Free is never called on outbuf1 - note that an array on the stack would
work just as well as a buffer

> outbuf1[16]='\0';
> md5(str.ptr, sizeof(str.ptr), outbuf1);
>
///   sizeof(str.ptr): this will always be 8 - shouldn't it be str.len?

> uint8_t* copy1 = context->Allocate(17);
>
>  for(int i=0;i<16;i++)
>  {
>  copy1[i]=outbuf1[i] & result->ptr[i];
>
/// using the & operator above means that result will contain fewer and fewer
1 bits, so it will converge to 0 - is this intentional?

>  }
>
> *result = StringVal(copy1, 17);
> return;
>
> }
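
On the sizeof(str.ptr) comment above, a minimal sketch of the difference.
FakeStringVal is a hypothetical stand-in that models only the ptr and len
fields used in the quoted code (the real StringVal comes from Impala's UDF
header), and both helper names are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the two StringVal fields used here; the real
// type comes from Impala's UDF header.
struct FakeStringVal {
  uint8_t* ptr;
  int len;
};

// What the original call passed as the length: the size of the pointer
// itself, a compile-time constant that does not depend on the data.
std::size_t LengthPassedByOriginalCode(const FakeStringVal& str) {
  return sizeof(str.ptr);
}

// What md5() needs: the number of bytes in the value.
int LengthThatShouldBePassed(const FakeStringVal& str) {
  assert(str.len >= 0);
  return str.len;
}
```

On a typical 64-bit build the first helper returns 8 for every input, so
md5() would hash only the first 8 bytes of each value, and would read past
the end of any value shorter than that.
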
> void merge_func(FunctionContext* context, const StringVal& src, StringVal*
> dst) {
> if (src.is_null) return;
>  for(int i=0;i<16;i++)
>  {
> dst->ptr[i]=src.ptr[i] & dst->ptr[i];
>
/// same as my last comment: this will converge to 0 if there are a lot of
distinct values

>  }
> }
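
To illustrate the convergence mentioned in the comments above, here is a
small sketch that folds 16-byte values together with & and counts the 1 bits
that remain. Digest, AndAll and CountOnes are names invented for this
example, and the inputs suggested below are arbitrary bytes, not real MD5
output:

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

using Digest = std::array<uint8_t, 16>;

// Folds all digests together with bitwise AND, as the update and merge
// loops in the quoted code do.
Digest AndAll(const std::vector<Digest>& digests) {
  Digest acc;
  acc.fill(0xFF);  // identity element for AND
  for (const Digest& d : digests) {
    for (std::size_t i = 0; i < acc.size(); ++i) acc[i] &= d[i];
  }
  return acc;
}

// Counts the 1 bits left in a digest.
int CountOnes(const Digest& d) {
  int ones = 0;
  for (uint8_t b : d) ones += static_cast<int>(std::bitset<8>(b).count());
  return ones;
}
```

Folding values whose bytes are all 0xF0, then all 0x3C, then all 0x0F leaves
64, 32 and finally 0 one-bits. With real hashes each additional distinct
value would be expected to clear about half of the bits that are still set.
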
>
> StringVal serialize_func(FunctionContext* context, const StringVal& val) {
>   if (val.is_null) return val;
>unsigned char *outbuf1=context->Allocate(17);
>outbuf1[16]='\0';
>
/// outbuf1 is not freed - it is actually not used at all

>   uint8_t* copy = context->Allocate(val.len);
>   memcpy(copy, val.ptr, 17);
>   return StringVal(copy,17);
> }
>
> StringVal finalize_func(FunctionContext* context, const StringVal& val) {
>   if (val.is_null) return val;
>   unsigned char *outbuf1=context->Allocate(17);
>   outbuf1[16]='\0';
>
/// outbuf1 is not freed - it is actually not used at all

>   uint8_t* copy = context->Allocate(val.len);
>   memcpy(copy, val.ptr, 17);
>   return StringVal(copy,17);
> }
>
> define function SQL in impala-shell:
> create aggregate function countMD5(string) returns string  location
> 'hdfs://nameservice1:8020//user/hive/udfjars/libmd5udaf.so'
> init_fn='init_func' update_fn='update_func' merge_fn='merge_func'
> serialize_fn='serialize_func'  finalize_fn='finalize_func';
>
>
> Maybe my C++ code has some problems, could you help me?
>
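
Putting the comments on update_func together, one possible shape for it is
sketched below. This is only a sketch under stated assumptions:
FunctionContext and StringVal here are simplified stand-ins that model just
the members used in the quoted code (the live_allocations counter exists only
in this sketch), placeholder_digest is a hypothetical substitute for the
elided md5() and is not a real hash, and the & combination is kept unchanged
because the intended result of countMD5 is not known:

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Stand-ins so the sketch compiles without Impala's headers; only the
// members used in the quoted code are modeled.
struct FunctionContext {
  int live_allocations = 0;  // bookkeeping for this sketch only
  uint8_t* Allocate(int n) {
    uint8_t* p = static_cast<uint8_t*>(std::malloc(n));
    if (p != nullptr) ++live_allocations;
    return p;
  }
  void Free(uint8_t* p) {
    if (p != nullptr) --live_allocations;
    std::free(p);
  }
};

struct StringVal {
  bool is_null = true;
  uint8_t* ptr = nullptr;
  int len = 0;
};

constexpr int kDigestLen = 16;

// Placeholder for the elided md5(): NOT a real hash, it only produces 16
// input-dependent bytes so the surrounding logic can be exercised.
void placeholder_digest(const uint8_t* message, int len,
                        uint8_t result[kDigestLen]) {
  std::memset(result, 0xA5, kDigestLen);
  for (int i = 0; i < len; ++i) result[i % kDigestLen] ^= message[i];
}

void update_func(FunctionContext* context, const StringVal& str,
                 StringVal* result) {
  if (str.is_null) return;
  uint8_t digest[kDigestLen];                    // stack buffer, nothing to free
  placeholder_digest(str.ptr, str.len, digest);  // str.len, not sizeof(str.ptr)
  if (result->is_null) {
    uint8_t* state = context->Allocate(kDigestLen);  // the only allocation
    if (state == nullptr) return;
    std::memcpy(state, digest, kDigestLen);
    result->is_null = false;
    result->ptr = state;
    result->len = kDigestLen;  // always 16, independent of str.len
    return;
  }
  // Combine in place; '&' is kept from the original only for comparison.
  for (int i = 0; i < kDigestLen; ++i) result->ptr[i] &= digest[i];
}
```

With this shape, repeated calls reuse the single 16-byte state instead of
allocating two new buffers per row, and the length stored in result no longer
depends on the length of the input string.
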


Re: UDA debugging, was Re: Broken/Flaky Tests

2018-06-06 Thread Jim Apple
You have provided the function prototype, but not its definition.

For cerr: http://impala.apache.org/docs/build/html/topics/impala_udf.html

"

To handle errors in UDFs, you call functions that are members of the initial
 FunctionContext* argument passed to your function.

A UDF can record one or more warnings, for conditions that indicate minor,
recoverable problems that do not cause the query to stop. The signature for
this function is:

bool AddWarning(const char* warning_msg);

"
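
As a hedged illustration of the excerpt above: the UDA code runs inside the
Impala daemon rather than in impala-shell, so output written to std::cerr
would not be expected to show up in the shell session, while a warning
recorded through the context should be surfaced with the query. In the sketch
below, FunctionContext is a stand-in that only records messages so the
example compiles without Impala's headers, and init_func_with_trace is a
hypothetical name:

```cpp
#include <string>
#include <vector>

// Stand-in so the sketch compiles on its own. In a real UDA, FunctionContext
// comes from Impala's UDF header and AddWarning is the member quoted above.
struct FunctionContext {
  std::vector<std::string> warnings;  // exists only in this stand-in
  bool AddWarning(const char* warning_msg) {
    warnings.emplace_back(warning_msg);
    return true;
  }
};

// Hypothetical init function: reports a trace message through the context
// instead of writing to std::cerr.
void init_func_with_trace(FunctionContext* context, bool* is_null) {
  *is_null = true;
  context->AddWarning("init_func called");
}
```

The same call can be placed in the update, merge and finalize functions to
see which of them run and with what state.
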

On Wed, Jun 6, 2018 at 1:58 AM, 周胜为 <865392...@qq.com> wrote:

> Hi,
> The SQL that defines the function:
> create aggregate function countMD5(string) returns string  location
> 'hdfs://nameservice1:8020//user/hive/udfjars/libmd5udaf.so'
> init_fn='init_func' update_fn='update_func' merge_fn='merge_func'
> serialize_fn='serialize_func'  finalize_fn='finalize_func';
> The package includes the md5_udaf.h and md5_udaf.cpp files;
> function md5 is defined as: void md5(const unsigned char message[], int len,
> unsigned char result[]);
>
> When I use the countMD5 function in impala-shell, the return value is null. I
> am confused. Perhaps my code has a problem, but I can't find it.
> Also, when I write "std::cerr<<"init";" in the init function
> (init_func), nothing is printed on the console. Why? Where is it printed?
>
> Please help me and point out my error; I am new to C++.
> Thank you very much!
>
>
>
>
>
> ------ Original Message ------
> *From:* "Tim Armstrong";
> *Sent:* Wednesday, June 6, 2018, 12:38 PM
> *To:* "dev";
> *Subject:* Re: UDA debugging, was Re: Broken/Flaky Tests
>
> We're happy to give you pointers. If you could share your uda code and
> "create function" that would help us help you
>
> On Tue., 5 Jun. 2018, 19:31 Jim Apple,  wrote:
>
> > Hi 周胜为,
> >
> > I notice you are replying to other threads about different subjects when
> > you ask your questions. I think you will be more likely to get help if you
> > start new threads with relevant subjects and if you are as specific as
> > possible with your questions.
> >
> > The Impala wiki has some advice for debugging:
> > https://cwiki.apache.org/confluence/display/IMPALA/Impala+Debugging+Tips
> >
> >
> > On Tue, Jun 5, 2018 at 6:21 PM 周胜为 <865392...@qq.com> wrote:
> >
> > > One: I want to know how to debug an Impala UDA function.
> > > Two: I would like to return a StringVal value from the finalize function,
> > > but I get a null value every time. Why is that?
> > >
> > >
> > >
> > >
> > > -- Original Message --
> > > From: "Tim Armstrong";
> > > Sent: Wednesday, June 6, 2018, 9:08 AM
> > > To: "dev@impala";
> > >
> > > Subject: Re: Broken/Flaky Tests
> > >
> > >
> > >
> > > Ok, so 2/3 of those fixes are merged and the other is being merged.
> > >
> > > We still have a long list of flaky issues but I went through and we've
> > > either mitigated them or we're blocked on being able to repro them.
> > >
> > > I'll see how things look tomorrow, but if you have some low-risk changes
> > > in mind, let me know and I can consider whether to merge them.
> > >
> > >
> > >
> > > On Tue, Jun 5, 2018 at 10:11 AM, Tim Armstrong <tarmstr...@cloudera.com>
> > > wrote:
> > >
> > > > Things are starting to look healthier now.
> > > >
> > > > I went through the broken-build JIRAs and downgraded some of the
> > > > infrequent infrastructure issues to critical so we have a clearer idea
> > > > of what's actually breaking the build now versus what's an occasional
> > > > infra issue:
> > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20IMPALA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20%3D%20broken-build%20ORDER%20BY%20priority%20DESC
> > > >
> > > > I'd like to see the fixes for these three issues go in:
> > > > https://issues.apache.org/jira/browse/IMPALA-7101
> > > > https://issues.apache.org/jira/browse/IMPALA-6956
> > > > https://issues.apache.org/jira/browse/IMPALA-7008
> > > >
> > > > We still need to fix any flaky infrastructure issues but that should be
> > > > able to proceed in parallel with other things.
> > > >
> > > >
> > > > On Fri, Jun 1, 2018 at 11:18 AM, Thomas Tauber-Marshall <
> >

Re: UDA debugging, was Re: Broken/Flaky Tests

2018-06-05 Thread Tim Armstrong
We're happy to give you pointers. If you could share your uda code and
"create function" that would help us help you

On Tue., 5 Jun. 2018, 19:31 Jim Apple,  wrote:

> Hi 周胜为,
>
> I notice you are replying to other threads about different subjects when
> you ask your questions. I think you will be more likely to get help if you
> start new threads with relevant subjects and if you are as specific as
> possible with your questions.
>
> The Impala wiki has some advice for debugging:
> https://cwiki.apache.org/confluence/display/IMPALA/Impala+Debugging+Tips
>
>
> On Tue, Jun 5, 2018 at 6:21 PM 周胜为 <865392...@qq.com> wrote:
>
> > One: I want to know how to debug an Impala UDA function.
> > Two: I would like to return a StringVal value from the finalize function,
> > but I get a null value every time. Why is that?
> >
> >
> >
> >
> > -- Original Message --
> > From: "Tim Armstrong";
> > Sent: Wednesday, June 6, 2018, 9:08 AM
> > To: "dev@impala";
> >
> > Subject: Re: Broken/Flaky Tests
> >
> >
> >
> > Ok, so 2/3 of those fixes are merged and the other is being merged.
> >
> > We still have a long list of flaky issues but I went through and we've
> > either mitigated them or we're blocked on being able to repro them.
> >
> > I'll see how things look tomorrow, but if you have some low-risk changes
> > in mind, let me know and I can consider whether to merge them.
> >
> >
> >
> > On Tue, Jun 5, 2018 at 10:11 AM, Tim Armstrong wrote:
> >
> > > Things are starting to look healthier now.
> > >
> > > I went through the broken-build JIRAs and downgraded some of the
> > > infrequent infrastructure issues to critical so we have a clearer idea
> > > of what's actually breaking the build now versus what's an occasional
> > > infra issue:
> > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20IMPALA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20%3D%20broken-build%20ORDER%20BY%20priority%20DESC
> > >
> > > I'd like to see the fixes for these three issues go in:
> > > https://issues.apache.org/jira/browse/IMPALA-7101
> > > https://issues.apache.org/jira/browse/IMPALA-6956
> > > https://issues.apache.org/jira/browse/IMPALA-7008
> > >
> > > We still need to fix any flaky infrastructure issues but that should be
> > > able to proceed in parallel with other things.
> > >
> > >
> > > On Fri, Jun 1, 2018 at 11:18 AM, Thomas Tauber-Marshall <
> > > tmarsh...@cloudera.com> wrote:
> > >
> > >> So while it's definitely better, there are still a large number of
> > >> failing builds. We've been hit by at least: IMPALA-6642, IMPALA-6956,
> > >> IMPALA-7101 and IMPALA-3040 all within the last day, along with some
> > >> mysterious crashes that I haven't filed anything for with Apache yet as
> > >> there's very little info about what's actually going on. There are still
> > >> multiple builds that haven't been green in over a month.
> > >>
> > >> Of course, if we hold commits for too long, there's a danger that when we
> > >> open things back up a bunch of changes will all land at the same time and
> > >> destabilize the builds again, putting us back in the same situation. So, I
> > >> would say at a minimum that any changes that are relatively minor and low
> > >> risk can go in now.
> > >>
> > >> My preference would be to hold off on major changes until we have more
> > >> stability.
> > >>
> > >> On Fri, Jun 1, 2018 at 10:30 AM Lars Volker  wrote:
> > >>
> > >> > Hi Thomas,
> > >> >
> > >> > Can you give an update on where we are with the builds?
> > >> >
> > >> > We currently have ~15 changes with a +2:
> > >> >
> > >> > https://gerrit.cloudera.org/#/q/status:open+project:Impala-ASF+branch:master+label:Code-Review%253D2
> > >> >
> > >> > Thanks, Lars
> > >> >
> > >> > On Fri, May 25, 2018 at 5:20 PM, Henry Robinson wrote:
> > >> >
> > >> > > +1 - thanks for worrying about build health.
> > >> > >
> > >> > > On 25 May 2018 at 17:18, Jim Apple  wrote:
> > >> > >
> > >> > > > Sounds good to me. Thanks for taking ownership!
> > >> > > >
> > >> > > > On Fri, May 25, 2018 at 5:10 PM Thomas Tauber-Marshall <
> > >> > > > tmarsh...@cloudera.com> wrote:
> > >> > > >
> > >> > > > > Hey Impala community,
> > >> > > > >
> > >> > > > > There seems to have been an unusually large number of flaky or
> > >> > > > > broken tests
> > >> > > > > <https://issues.apache.org/jira/browse/IMPALA-7073?jql=project%20%3D%20IMPALA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20in%20(flaky%2C%20broken-build)>
> > >> > > > > cropping up