[RESULT] [VOTE] 3.1.0 release candidate 1

2018-11-30 Thread Zoltan Borok-Nagy
https://lists.apache.org/thread.html/2b30333eb187e7453c152e2bac558713ad7993da11bf559697f590ee@%3Cdev.impala.apache.org%3E

The vote has passed with the following tally:

+1 (binding):
* Jim Apple
* Michael Ho
* Michael Brown
* Thomas Tauber-Marshall

0: None
-1: None

Thanks everyone for testing and participating in the vote.

I'd also like to ask a PMC member to help me with step #19 (publishing the
release artifacts) from
https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#

Thanks,
Zoltan


[VOTE] 3.1.0 release candidate 1

2018-11-27 Thread Zoltan Borok-Nagy
This is a vote to release Impala 3.1.0

The artifacts for testing can be downloaded from:
https://dist.apache.org/repos/dist/dev/impala/3.1.0/RC1/
Git tag: 3.1.0-rc1
Tree hash: 0fb7b90d5dad7aeedd48db28939b1999a7a3

Please vote +1 or -1. -1 votes should be accompanied by an explanation of
the reason. Only PMC members have binding votes, but other
community members are encouraged to cast non-binding votes. This vote will
pass if there are 3 binding +1 votes and more binding +1 votes than -1
votes.

This wiki page describes how to check the release before you vote:
https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#


Re: [DISCUSS] 3.1.0 release

2018-11-27 Thread Zoltan Borok-Nagy
Hi,

Michael:
Thanks, I missed that. I reverted IMPALA-7148 since it is targeted for
3.2.0 and doesn't have any effect without IMPALA-4063.

Alexandra:
Unfortunately I cannot see the image you attached.
Anyway, I also counted 10 new doc commits targeted for 3.1.0:
https://git-wip-us.apache.org/repos/asf?p=impala.git;a=shortlog;h=refs/heads/branch-3.1.0
7d3e9be [DOCS] Copy edits in impala_custom_timezones
9749096 [DOCS] A number of typos were fixed in impala_dedicated_coordinator
3d1afb4 IMPALA-7233: [DOCS] Support for IANA timezone database
df6e92f IMPALA-7815: [DOCS] Release notes for 3.1
3ec779a IMPALA-7861: [DOCS] TLS enabled by default regardless of URI scheme
0d5b5d4 [DOCS] Added a note in impala_scan_bytes_limit.xml
f5348d4 IMPALA-7836: [DOCS] Format changes in impala_topn_bytes_limit.xml
bd573d1 IMPALA-7634: [DOCS] Document the new SHUTDOWN statement
8872e8b IMPALA-7836: [DOCS] Document TOPN_BYTES_LIMIT query option
174ac2f IMPALA-7806: [DOCS] Updated Known Issues in 3.1

So I moved forward to do the RC1, I'll send a [VOTE] mail about it shortly.
*If anyone has objections about the release candidate*, e.g. you want to
add/remove a commit, you can vote with -1 on the RC1.

Thanks,
Zoltan


On Tue, Nov 27, 2018 at 2:23 AM Alexandra Rodoni 
wrote:

> I count 10 doc commits since your last cherry-pick:
>
> [image: image.png]
> Thanks!
>
> On Mon, Nov 19, 2018 at 8:28 AM Zoltan Borok-Nagy 
> wrote:
>
>> Hey Folks,
>>
>> Status of the 3.1.0 release:
>>
>> I chose IMPALA-5950 (e3a7027) as branching point, and cherry picked
>> commits
>> until IMPALA-5031 (067657a) (inclusive range) with the following
>> exceptions:
>>
>>- IMPALA-7213 <https://issues.apache.org/jira/browse/IMPALA-7213>:
>> Port
>>ReportExecStatus() RPCs to KRPC* (requested not to include, tagged as
>>3.2.0)*
>>- IMPALA-4063 <https://issues.apache.org/jira/browse/IMPALA-4063>:
>> Make
>>fragment instance reports per-query (or per-host) instead of
>> per-fragment
>>instance *(t**agged as 3.2.0, also not a clean cherry-pick)*
>>- IMPALA-7828 <https://issues.apache.org/jira/browse/IMPALA-7828>:
>> test_mem_leak()
>>is flaky *(t**agged as 3.2.0, also not a clean cherry-pick)*
>>- IMPALA-7477 <https://issues.apache.org/jira/browse/IMPALA-7477>:
>> Improve
>>QueryResultSet interface to allow appending a batch of rows at a time
>>*(t**agged as 3.2.0, **but it could be cleanly cherry-picked**)*
>>
>> Although "IMPALA-7148
>> <https://issues.apache.org/jira/browse/IMPALA-7148>:
>> test_profile_fragment_instances
>> is flaky" is tagged as 3.2.0, I decided to include it since it was a clean
>> cherry-pick and it only fixes a flaky test.
>>
>> *From now on please tag your Jiras' fix version as 3.2.0*, or send me an
>> email if you want to include your change in 3.1.0.
>> The release branch can be viewed here:
>>
>> https://git-wip-us.apache.org/repos/asf?p=impala.git;a=shortlog;h=refs/heads/branch-3.1.0
>>
>> I'll move on with the process when the docs are ready.
>>
>> BR,
>>  Zoltan
>>
>>
>>
>>
>>
>> On Mon, Nov 12, 2018 at 10:14 PM Alexandra Rodoni 
>> wrote:
>>
>> > I think about 7 working days will be enough to wrap up the doc work for
>> > 3.1:
>> > - TOPN query option
>> > - SHUTDOWN command
>> > - TIMEZONE changes
>> > - Minor release-related work
>> >
>> >
>> >
>> > On Mon, Nov 12, 2018 at 9:17 AM, Zoltan Borok-Nagy <
>> > borokna...@cloudera.com>
>> > wrote:
>> >
>> > > Hey Folks,
>> > >
>> > > It's been a week since we last talked about the 3.1.0 release.
>> > >
>> > > So I guess we'll only leave out the RPC-related commit from Michael.
>> > >
>> > > Is it OK if I start the branching and testing tomorrow?
>> > > Then, I'll cherry-pick the docs-related changes from Alex after they
>> land
>> > > in the repo.
>> > >
>> > > Or, do you prefer to wait with the branching until the docs-related
>> > changes
>> > > are made?
>> > > This way we'll have more changes in the release.
>> > >
>> > > BR,
>> > > Zoltan
>> > >
>> > >
>> > >
>> > > On Tue, Nov 6, 2018 at 4:23 PM Zoltan Borok-Nagy <
>> > borokna...@cloudera.com>
>> > > wrote:
>> > >
>> > > > Thanks for the suggestions.
>> > > > I wa

Re: [DISCUSS] 3.1.0 release

2018-11-19 Thread Zoltan Borok-Nagy
Hey Folks,

Status of the 3.1.0 release:

I chose IMPALA-5950 (e3a7027) as branching point, and cherry picked commits
until IMPALA-5031 (067657a) (inclusive range) with the following exceptions:

   - IMPALA-7213 <https://issues.apache.org/jira/browse/IMPALA-7213>: Port
   ReportExecStatus() RPCs to KRPC* (requested not to include, tagged as
   3.2.0)*
   - IMPALA-4063 <https://issues.apache.org/jira/browse/IMPALA-4063>: Make
   fragment instance reports per-query (or per-host) instead of per-fragment
   instance *(t**agged as 3.2.0, also not a clean cherry-pick)*
   - IMPALA-7828 <https://issues.apache.org/jira/browse/IMPALA-7828>:
test_mem_leak()
   is flaky *(t**agged as 3.2.0, also not a clean cherry-pick)*
   - IMPALA-7477 <https://issues.apache.org/jira/browse/IMPALA-7477>: Improve
   QueryResultSet interface to allow appending a batch of rows at a time
   *(t**agged as 3.2.0, **but it could be cleanly cherry-picked**)*

Although "IMPALA-7148
<https://issues.apache.org/jira/browse/IMPALA-7148>:
test_profile_fragment_instances
is flaky" is tagged as 3.2.0, I decided to include it since it was a clean
cherry-pick and it only fixes a flaky test.

*From now on please tag your Jiras' fix version as 3.2.0*, or send me an
email if you want to include your change in 3.1.0.
The release branch can be viewed here:
https://git-wip-us.apache.org/repos/asf?p=impala.git;a=shortlog;h=refs/heads/branch-3.1.0

I'll move on with the process when the docs are ready.

BR,
 Zoltan





On Mon, Nov 12, 2018 at 10:14 PM Alexandra Rodoni 
wrote:

> I think about 7 working days will be enough to wrap up the doc work for
> 3.1:
> - TOPN query option
> - SHUTDOWN command
> - TIMEZONE changes
> - Minor release-related work
>
>
>
> On Mon, Nov 12, 2018 at 9:17 AM, Zoltan Borok-Nagy <
> borokna...@cloudera.com>
> wrote:
>
> > Hey Folks,
> >
> > It's been a week since we last talked about the 3.1.0 release.
> >
> > So I guess we'll only leave out the RPC-related commit from Michael.
> >
> > Is it OK if I start the branching and testing tomorrow?
> > Then, I'll cherry-pick the docs-related changes from Alex after they land
> > in the repo.
> >
> > Or, do you prefer to wait with the branching until the docs-related
> changes
> > are made?
> > This way we'll have more changes in the release.
> >
> > BR,
> > Zoltan
> >
> >
> >
> > On Tue, Nov 6, 2018 at 4:23 PM Zoltan Borok-Nagy <
> borokna...@cloudera.com>
> > wrote:
> >
> > > Thanks for the suggestions.
> > > I wasn't aware of interactive git rebase. It might makes it simpler to
> > > carry out (a) if there are no conflicts.
> > >
> > > Zoltan
> > >
> > > On Tue, Nov 6, 2018 at 4:06 PM Jim Apple  wrote:
> > >
> > >> >
> > >> > I just have a technical question about it. Should we
> > >> > a) select an early branching point then do a lot of cherry picks for
> > the
> > >> > commits we want in and leave out the risky ones
> > >> > b) select a recent branching point then revert the risky commits on
> > the
> > >> > release branch
> > >> >
> > >>
> > >> I think (a) is easier for someone who is doing some git work on the
> > >> branch,
> > >> but our branches tend to be used once for releases and then rarely
> > touched
> > >> again, so it's not a disaster to do (b).
> > >>
> > >
> >
>


[DISCUSS] 3.1.0 release

2018-11-05 Thread Zoltan Borok-Nagy
Hi Folks,

It's been a while since we last released a minor version of Impala.
3.0.0 is out since May, and since then a couple of pretty cool features and
a good number of improvements are checked in.

I propose that we release 3.1.0 soon and I volunteer to be its release
manager. Please speak up and let the community know if anyone has any
objections to this.

Thanks,
Zoltan


Re: Re: New Impala committer - Quanlong Huang

2018-08-21 Thread Zoltan Borok-Nagy
Congrats Quanlong!

On Tue, Aug 21, 2018 at 9:34 AM Gabor Kaszab 
wrote:

> Congrats!
>
> On Sat, Aug 18, 2018 at 3:11 AM Quanlong Huang 
> wrote:
>
> > Thanks! Glad to work with you all!--Quanlong
> >
> > At 2018-08-18 03:09:38, "Yongjun Zhang"  wrote:
> > >Congratulations Quanlong!
> > >
> > >--Yngjun
> > >
> > >On Fri, Aug 17, 2018 at 12:07 PM, Jeszy  wrote:
> > >
> > >> Congrats Quanlong!
> > >>
> > >> On 17 August 2018 at 19:51, Csaba Ringhofer  >
> > >> wrote:
> > >> > Congrats!
> > >> >
> > >> > On Fri, Aug 17, 2018 at 6:32 PM, Philip Zeyliger <
> phi...@cloudera.com
> > >
> > >> > wrote:
> > >> >
> > >> >> Congrats!
> > >> >>
> > >> >> On Fri, Aug 17, 2018 at 9:29 AM Tim Armstrong <
> > tarmstr...@cloudera.com>
> > >> >> wrote:
> > >> >>
> > >> >> >  The Project Management Committee (PMC) for Apache Impala has
> > invited
> > >> >> > Quanlong Huang to become a committer and we are pleased to
> announce
> > >> that
> > >> >> > they have accepted. Congratulations and welcome, Quanlong Huang!
> > >> >> >
> > >> >>
> > >>
> >
>


Re: Enabling automatic code review precommit job

2018-08-16 Thread Zoltan Borok-Nagy
I see, thanks for the answer! And also thanks for updating the description
of the jenkins job.

I don't know if it should trigger on clean rebases, maybe it could just
write a short comment in gerrit, something like that: "Code review checks
didn't run because it is a clean rebase. If you do want to run these
checks, you can do it here: https://jenkins.impala.io/gerrit_manual_trigger/
"

Zoltan


On Thu, Aug 16, 2018 at 7:19 PM Tim Armstrong 
wrote:

> Thanks for letting me know and sorry for the confusion. It's fine to go
> ahead and do a gerrit-verify-dry-run. I think the missing info that I
> didn't communicate earlier is:
>
>- gerrit-verify-dry-run is a superset of gerrit-code-review-checks so
>it's fine to run gerrit-verify-dry-run without waiting for the previous
>job. The intent is just to prefetch some of the more common reasons for
>merge failures.
>- I've configured gerrit-code-review-checks to not trigger on clean
>rebases, which is why it didn't trigger on PS8.
>- You can manually trigger a patchset if you have the appropriate
>jenkins access here: https://jenkins.impala.io/gerrit_manual_trigger/
>
> It seems like I should probably document this better. As a stopgap I
> improved the description of the jenkins job.
>
> Should I revisit triggering on clean rebases? I was originally thinking
> that it would add noise, but I can see that it could cause confusion. And
> of course clean rebases can break compilation.
>
> - Tim
>
>
> On Thu, Aug 16, 2018 at 5:33 AM, Zoltan Borok-Nagy <
> borokna...@cloudera.com>
> wrote:
>
> > Hi,
> >
> > I might have ran into a bug of gerrit-code-review-checks with this
> change:
> > https://gerrit.cloudera.org/#/c/11160/
> >
> > The job failed for PS 7, because it didn't contain
> > bin/jenkins/build-only.sh.
> >
> > So I rebased and uploaded PS 8, but that didn't trigger
> > gerrit-code-review-checks. So I went to jenkins and hit "Retrigger" on
> the
> > failed job, but it executed the job against PS 7, not PS 8.
> >
> > And when I look at "Build with parameters", I can only see the
> > IMPALA_REPO_URL and IMPALA_LZO_BRANCH parameters, so I cannot really run
> > this job against the my newest patch set.
> >
> > Anyway the change is already +2ed and I could run gerrit-verify-dryrun
> > which almost certainly do these checks as well, but if it didn't I don't
> > want to commit something that might break gerrit-core-review-checks for
> > subsequent changes.
> >
> > Thanks,
> > Zoltan
> >
> >
> >
> > On Wed, Aug 1, 2018 at 10:56 PM Tim Armstrong
> >  wrote:
> >
> > > The flake8 comments should now line up correctly with your diff. I got
> > more
> > > feedback that some of the comments were a little strict (and it's
> unclear
> > > which ones will be enforced) so I posted a diff to disable some of the
> > > checks:
> > >
> > > https://gerrit.cloudera.org/#/c/11102/
> > >
> > > On Tue, Jul 31, 2018 at 10:00 AM, Tim Armstrong <
> tarmstr...@cloudera.com
> > >
> > > wrote:
> > >
> > > > Todd pointed out a bug where it was posting flake8 comments that
> didn't
> > > > align with the diff. I figured out the issue but will keep the job
> in a
> > > > silent mode for a bit while I monitor it.
> > > >
> > > > On Tue, Jul 31, 2018 at 9:16 AM, Jim Apple
> >  > > >
> > > > wrote:
> > > >
> > > >> This is from my .emacs. The way it works is that you select a region
> > and
> > > >> then M-x pep8-region.
> > > >>
> > > >> (defun pep8-region ( b e)
> > > >>   (interactive "r")
> > > >>   (call-process-region b e
> > > >>"/home/jbapple/.local/bin/autopep8" t t nil
> > > >>"--indent-size=2" "--max-line-length=90" "-a"
> > > "-a"
> > > >>"-a" "-a" "-a" "-a" "-a" "-a"
> "--experimental"
> > > >> "-"))
> > > >>
> > > >> It's . . . not great. For instance, it doesn't understand that if
> > you're
> > > >> already 4 spaces in at the start of the region, you don't want to
> > revert
> > > >> back to 0 or 2 spaces in. That said, it can still be helpful. I
> don't
> > > >> think
> > 

Re: Enabling automatic code review precommit job

2018-08-16 Thread Zoltan Borok-Nagy
Hi,

I might have ran into a bug of gerrit-code-review-checks with this change:
https://gerrit.cloudera.org/#/c/11160/

The job failed for PS 7, because it didn't contain
bin/jenkins/build-only.sh.

So I rebased and uploaded PS 8, but that didn't trigger
gerrit-code-review-checks. So I went to jenkins and hit "Retrigger" on the
failed job, but it executed the job against PS 7, not PS 8.

And when I look at "Build with parameters", I can only see the
IMPALA_REPO_URL and IMPALA_LZO_BRANCH parameters, so I cannot really run
this job against the my newest patch set.

Anyway the change is already +2ed and I could run gerrit-verify-dryrun
which almost certainly do these checks as well, but if it didn't I don't
want to commit something that might break gerrit-core-review-checks for
subsequent changes.

Thanks,
Zoltan



On Wed, Aug 1, 2018 at 10:56 PM Tim Armstrong
 wrote:

> The flake8 comments should now line up correctly with your diff. I got more
> feedback that some of the comments were a little strict (and it's unclear
> which ones will be enforced) so I posted a diff to disable some of the
> checks:
>
> https://gerrit.cloudera.org/#/c/11102/
>
> On Tue, Jul 31, 2018 at 10:00 AM, Tim Armstrong 
> wrote:
>
> > Todd pointed out a bug where it was posting flake8 comments that didn't
> > align with the diff. I figured out the issue but will keep the job in a
> > silent mode for a bit while I monitor it.
> >
> > On Tue, Jul 31, 2018 at 9:16 AM, Jim Apple  >
> > wrote:
> >
> >> This is from my .emacs. The way it works is that you select a region and
> >> then M-x pep8-region.
> >>
> >> (defun pep8-region ( b e)
> >>   (interactive "r")
> >>   (call-process-region b e
> >>"/home/jbapple/.local/bin/autopep8" t t nil
> >>"--indent-size=2" "--max-line-length=90" "-a"
> "-a"
> >>"-a" "-a" "-a" "-a" "-a" "-a" "--experimental"
> >> "-"))
> >>
> >> It's . . . not great. For instance, it doesn't understand that if you're
> >> already 4 spaces in at the start of the region, you don't want to revert
> >> back to 0 or 2 spaces in. That said, it can still be helpful. I don't
> >> think
> >> this is as sophisticated as clang-format.el.
> >>
> >> On Tue, Jul 31, 2018 at 9:09 AM, Todd Lipcon  >
> >> wrote:
> >>
> >> > On Tue, Jul 31, 2018 at 9:02 AM, Tim Armstrong <
> >> > tarmstr...@cloudera.com.invalid> wrote:
> >> >
> >> > > I agree that it might be a little strict at the moment and disallow
> >> some
> >> > > reasonable formatting. The tool is controlled by the setup.cfg file
> in
> >> > the
> >> > > repo so it's easy enough to change the behaviour.
> >> > >
> >> > > I think we have been a little sloppy with Python style in general
> so I
> >> > > think some of these would be good to change over time. I think the
> >> main
> >> > > thing I'd like is to align the tool's behaviour with code reviews -
> if
> >> > > we're going to be strict about PEP 8 compliance in code reviews we
> >> should
> >> > > keep the tool strict.
> >> > >
> >> > > >109 : E125 continuation line with same indent as next logical
> >> line
> >> > > I think this formatting is hard to read and confusing, I'm in favour
> >> of
> >> > > leaving this enabled.
> >> > >
> >> > > >110 : E701 multiple statements on one line (colon)
> >> > > This if because of the one-line if statements we have in the code. I
> >> > don't
> >> > > feel strongly either way as long as we're consistent.
> >> > >
> >> > > >129 : E231 missing whitespace after ','
> >> > > >185 : E251 unexpected spaces around keyword / parameter equals
> >> > > >288 : E502 the backslash is redundant between brackets
> >> > > These seems like sloppy/inconsistent formatting to me, I'm in favour
> >> of
> >> > > keeping these enabled and fixing existing code as we go.
> >> > >
> >> > > > 368 : E302 expected 2 blank lines, found 1
> >> > > Our Python code is very inconsistent here, would be nice to make it
> >> more
> >> > > consistent. I'm in favour of keeping it enabled and fixing as we go.
> >> > >
> >> > > >187 : E121 continuation line under-indented for hanging indent
> >> > > >   1295 : E128 continuation line under-indented for visual indent
> >> > > On one hand it would be nice to follow the PEP 8 style here (
> >> > > https://www.python.org/dev/peps/pep-0008/#indentation) but the
> other
> >> > > idioms
> >> > > seem fine to me. I've been asked on Python code reviews to switch to
> >> the
> >> > > PEP 8 indentation style before so I think we should align the tools
> >> > > behaviour with what we're going to ask for code reviews.
> >> > >
> >> >
> >> >
> >> > Alright, I agree with all of the above. One suggestion, though: is it
> >> > possible to get something like 'autopep8' to run in a 'patch
> formatting'
> >> > mode where it only reformats changed lines? Then we could more easily
> >> just
> >> > run a single command to ensure that our patches are properly formatted
> >> > before submitting to 

Re: #pragma once?

2018-08-02 Thread Zoltan Borok-Nagy
+1 for #pragma once since it's cleaner and less error-prone.

Zoltan


On Wed, Aug 1, 2018 at 11:03 PM Todd Lipcon 
wrote:

> Yea, when we looked into it I recall that it worked fine on all compilers
> from the last 10 years or something (in fact I remember using #pragma once
> on Metrowerks Codewarrior more than fifteen years ago). Given we require
> C++14 I don't think #pragma once is going to be the limiting factor in
> compiler version portability.
>
> -Todd
>
> On Wed, Aug 1, 2018 at 12:01 PM, Sailesh Mukil
>  > wrote:
>
> > An advantage of using #pragma once is potential improved compilation
> > speeds. However, a con is that it's non-standard and therefore, its
> > behavior can change at any point and can also vary across compilers,
> > potentially making the code even less portable.
> >
> > That being said, since Kudu has been using it for a while and has had no
> > issues, we can do the same since the potential benefits outweigh the
> cons.
> >
> > On Wed, Aug 1, 2018 at 11:48 AM, Tim Armstrong <
> > tarmstr...@cloudera.com.invalid> wrote:
> >
> > > Todd brought up our include guards on a code review, asking why we
> don't
> > > use #pragma once instead: https://gerrit.cloudera.org/#/c/10988/5 . It
> > > sounds like Kudu has switched to it
> > >
> > > #pragma once does seem cleaner and our GCC and Clang versions are
> modern
> > > enough to support it.
> > >
> > > What do people think about switching to that as the preferred way of
> > > including headers only once?
> > >
> > > - Tim
> > >
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


page index

2018-07-10 Thread Zoltan Borok-Nagy
Hi folks,

I started to write a design doc about page skipping based on the Parquet
page index
.

Impala already writes the page index since versions 2.13 and 3.1 (
IMPALA-5842 ).

Reading the page index and implementing the filtering is much more tricky
than writing the index, so we decided to start with a high-level doc rather
than the code itself.
Special thanks to Tim, Lars, and Csaba for their insights!

You can find the doc here, all comments are welcome:
https://docs.google.com/document/d/1D-el8njq_I-JKd3NDcW1mRXID_n0dBDKIkjWxwULVus/edit?usp=sharing

Thanks,
Zoltan


Re: boost::scoped_ptr vs std::unique_ptr

2018-07-06 Thread Zoltan Borok-Nagy
Yeah I see that, no worries.
I only wanted to draw some attention to it, so we might not forget it in
the future.

- Zoltan

On Fri, Jul 6, 2018 at 5:57 PM Tim Armstrong
 wrote:

> That's a fair observation Zoltan, we should avoid doing that. I would have
> definitely waited longer if it was a more consequential or irreversible
> decision, but I think even for more minor things there's no reason not to
> wait longer.
>
> On Fri, Jul 6, 2018 at 4:03 AM, Zoltan Borok-Nagy <
> borokna...@cloudera.com.invalid> wrote:
>
> > Hi,
> >
> > I'm also in favor of unique_ptr and maybe const unique_ptr.
> >
> > But I want to point out that the duration of the whole discussion was 1.5
> > hours.
> > Maybe we could keep such threads open at least 24 hours to let anyone in
> > the globe weigh in.
> >
> > BR,
> > Zoltan
> >
> >
> > On Fri, Jul 6, 2018 at 3:31 AM Jim Apple 
> > wrote:
> >
> > > SGTM
> > >
> > > On Thu, Jul 5, 2018 at 6:13 PM, Tim Armstrong <
> > > tarmstr...@cloudera.com.invalid> wrote:
> > >
> > > > Sounds like unique_ptr is preferred then going forward. I updated the
> > > wiki
> > > > page.
> > > >
> > > > >  Fwiw, I was under the impression from talking with people in the
> > past
> > > > that
> > > > > we were already trying to make this move, and the
> > > > > PartitionedAggregationNode refactor that just went in made the
> switch
> > > to
> > > > > unique_ptr, though no one commented on it in the review.
> > > > Yeah it sounds like there are two camps - people wanting to use
> > > unique_ptr
> > > > and people who don't mind scoped_ptr but are apathetic about it.
> > > >
> > > > > I suspect we could also make own own immobile_ptr with minimal
> > effort,
> > > > > thereby both making the difference more visible and reducing boost
> > > > > dependence.
> > > > I'd thought about this too and I'm not sure that it's worth doing
> > > something
> > > > non-standard that new contributors will have to learn.
> > > >
> > > >
> > > > On Thu, Jul 5, 2018 at 5:27 PM, Todd Lipcon
>  > >
> > > > wrote:
> > > >
> > > > > Definitely in favor.
> > > > >
> > > > > Personally I never found the "this pointer isn't movable" to be a
> > > > > worthwhile distinction. With unique_ptr you need to pretty
> explicitly
> > > > move
> > > > > it using std::move, so you don't get "accidental" moves like you
> used
> > > to
> > > > > with std::auto_ptr.
> > > > >
> > > > > Looking briefly at Kudu we have 129 unique_ptr members and only 7
> of
> > > them
> > > > > are marked const, so at least we haven't found that a particularly
> > > useful
> > > > > pattern.
> > > > >
> > > > > -Todd
> > > > >
> > > > > On Thu, Jul 5, 2018 at 5:23 PM, Jim Apple
> >  > > >
> > > > > wrote:
> > > > >
> > > > > > I suspect we could also make own own immobile_ptr with minimal
> > > effort,
> > > > > > thereby both making the difference more visible and reducing
> boost
> > > > > > dependence.
> > > > > >
> > > > > > On Thu, Jul 5, 2018 at 5:17 PM, Sailesh Mukil
> > > > >  > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > I'm in favor.
> > > > > > >
> > > > > > > Since the main distinction is that a unique_ptr is moveable,
> > > whereas
> > > > a
> > > > > > > scoped_ptr is not, we should just make sure that we do our due
> > > > > diligence
> > > > > > > during code reviews so that we catch those cases.
> > > > > > >
> > > > > > > Also, making a unique_ptr const disallows moving it, since the
> > move
> > > > > > > constructor takes a non-const unique_ptr container. It probably
> > > won't
> > > > > > work
> > > > > > > in all places, but we could enforce that in certain parts of
> the
> > > > code.
> > > > > > >
> > > > 

Re: boost::scoped_ptr vs std::unique_ptr

2018-07-06 Thread Zoltan Borok-Nagy
Hi,

I'm also in favor of unique_ptr and maybe const unique_ptr.

But I want to point out that the duration of the whole discussion was 1.5
hours.
Maybe we could keep such threads open at least 24 hours to let anyone in
the globe weigh in.

BR,
Zoltan


On Fri, Jul 6, 2018 at 3:31 AM Jim Apple 
wrote:

> SGTM
>
> On Thu, Jul 5, 2018 at 6:13 PM, Tim Armstrong <
> tarmstr...@cloudera.com.invalid> wrote:
>
> > Sounds like unique_ptr is preferred then going forward. I updated the
> wiki
> > page.
> >
> > >  Fwiw, I was under the impression from talking with people in the past
> > that
> > > we were already trying to make this move, and the
> > > PartitionedAggregationNode refactor that just went in made the switch
> to
> > > unique_ptr, though no one commented on it in the review.
> > Yeah it sounds like there are two camps - people wanting to use
> unique_ptr
> > and people who don't mind scoped_ptr but are apathetic about it.
> >
> > > I suspect we could also make own own immobile_ptr with minimal effort,
> > > thereby both making the difference more visible and reducing boost
> > > dependence.
> > I'd thought about this too and I'm not sure that it's worth doing
> something
> > non-standard that new contributors will have to learn.
> >
> >
> > On Thu, Jul 5, 2018 at 5:27 PM, Todd Lipcon 
> > wrote:
> >
> > > Definitely in favor.
> > >
> > > Personally I never found the "this pointer isn't movable" to be a
> > > worthwhile distinction. With unique_ptr you need to pretty explicitly
> > move
> > > it using std::move, so you don't get "accidental" moves like you used
> to
> > > with std::auto_ptr.
> > >
> > > Looking briefly at Kudu we have 129 unique_ptr members and only 7 of
> them
> > > are marked const, so at least we haven't found that a particularly
> useful
> > > pattern.
> > >
> > > -Todd
> > >
> > > On Thu, Jul 5, 2018 at 5:23 PM, Jim Apple  >
> > > wrote:
> > >
> > > > I suspect we could also make own own immobile_ptr with minimal
> effort,
> > > > thereby both making the difference more visible and reducing boost
> > > > dependence.
> > > >
> > > > On Thu, Jul 5, 2018 at 5:17 PM, Sailesh Mukil
> > >  > > > >
> > > > wrote:
> > > >
> > > > > I'm in favor.
> > > > >
> > > > > Since the main distinction is that a unique_ptr is moveable,
> whereas
> > a
> > > > > scoped_ptr is not, we should just make sure that we do our due
> > > diligence
> > > > > during code reviews so that we catch those cases.
> > > > >
> > > > > Also, making a unique_ptr const disallows moving it, since the move
> > > > > constructor takes a non-const unique_ptr container. It probably
> won't
> > > > work
> > > > > in all places, but we could enforce that in certain parts of the
> > code.
> > > > >
> > > > > On Thu, Jul 5, 2018 at 4:49 PM, Thomas Tauber-Marshall <
> > > > > tmarsh...@cloudera.com.invalid> wrote:
> > > > >
> > > > > > I'm definitely in favor of using more standard c++ to reduce both
> > > > > confusion
> > > > > > and our reliance on boost, especially as I suspect a lot of
> people
> > > (eg.
> > > > > me)
> > > > > > don't know the subtle difference between scoped_ptr and
> unique_ptr
> > > off
> > > > > the
> > > > > > top of their head anyways.
> > > > > >
> > > > > > Fwiw, I was under the impression from talking with people in the
> > past
> > > > > that
> > > > > > we were already trying to make this move, and the
> > > > > > PartitionedAggregationNode refactor that just went in made the
> > switch
> > > > to
> > > > > > unique_ptr, though no one commented on it in the review.
> > > > > >
> > > > > > On Thu, Jul 5, 2018 at 4:39 PM Tim Armstrong
> > > > > >  wrote:
> > > > > >
> > > > > > > I was just talking with Michael Ho on a review about this
> > > > > > > https://gerrit.cloudera.org/#/c/10810/7/be/src/exec/scan-
> > > node.h@271
> > > > > > >
> > > > > > > For a while we've continued using scoped_ptr in some places
> > because
> > > > it
> > > > > > > supports a smaller set of operators and implies that the
> pointer
> > > > isn't
> > > > > > > movable. See
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/IMPALA/
> > > > > > Resource+Management+Best+Practices+in+Impala
> > > > > > > .
> > > > > > >
> > > > > > > I don't think we're consistently following this pattern and it
> > > seems
> > > > to
> > > > > > > cause some confusion about what the best practice is,
> > particularly
> > > > for
> > > > > > > people coming from other code bases. I personally like the
> > > > distinction,
> > > > > > but
> > > > > > > I don't feel that strongly about it.
> > > > > > >
> > > > > > > What do people think? Should we continue using scoped_ptr or
> move
> > > > away
> > > > > > from
> > > > > > > it. There is already a JIRA to make the change but we haven't
> > done
> > > it
> > > > > > > because of the above reasons:
> > > > > > > https://issues.apache.org/jira/browse/IMPALA-3444
> > > > > > >
> > > > > > > - Tim
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > 

Re: IMPALA-6981 (Set pthread names)

2018-06-25 Thread Zoltan Borok-Nagy
Hi Kushagra,

Thanks for showing interest in the project!

As a first step I recommend you to read the following wiki page:
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala

I also recommend you to read Jim Apple's letter for new contributors:
http://mail-archives.apache.org/mod_mbox/impala-dev/201711.mbox/%3CCAC-pSX3dKtWogFCNqqKD=hk1gmaksneharxvve4asbzxr31...@mail.gmail.com%3E

If you have any further questions do not hesitate to contact us.

BR,
Zoltan




On Sat, Jun 23, 2018 at 7:28 PM Kushagra Madan 
wrote:

> Hi,
>
> I am a newbie to Open Source and like your project and find it aligning
> with my skills.
>
> I would like to contribute to the above mentioned issue. Any suggestions
> would be appreciated.
>
> Thanks,
>
> --
> Kushagra Madan
>


Re: New committers - Gabor Kaszab and Attila Jeges

2018-05-22 Thread Zoltan Borok-Nagy
Congrats, Gabor and Attila!

BR,
Zoltan


On Fri, May 18, 2018 at 10:36 PM, Tim Armstrong 
wrote:

> The Project Management Committee (PMC) for Apache Impala has invited Attila
> and Gabor to become committers and we are pleased to announce that they
> have accepted.
>
> Congratulations and welcome!
>


Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-16 Thread Zoltan Borok-Nagy
I would just like to mention that the fmax() / fmin() functions in C/C++
Math library follow the aforementioned IEEE 754-2008 min and max
specification:
http://en.cppreference.com/w/c/numeric/math/fmax

I think this behavior is also the most intuitive and useful regarding to
statistics. If we want to select the max value, I think it's reasonable to
ignore nulls and not-numbers.
I think it serves well the common case as well. E.g. there are a lot of
numbers, and some NaNs, I don't know if we want to ruin our upper bound by
choosing NaN as max.

And if there's still a NaN by an old writer, we can treat is as Inf.

Just my two cents.


On Fri, Feb 16, 2018 at 6:38 PM, Tim Armstrong 
wrote:

> The reader still can't correctly interpret those stats without knowing
> about the behaviour of that specific writer though, because it can't assume
> the absence of NaNs unless it knows that they are reading a file written by
> a writer that drops stats when it sees NaNs.
>
> It *could* fix the behaviour of some naive readers that don't correctly
> handle the current ambiguity in the specification, but I think those need
> to be fixed anyway because they will return wrong results for existing
> files.
>
> In the process of fixing the readers, you could then modify the readers so
> that they are aware of this special writer that drops stats with NaNs and
> knows that it is safe to use them, but I think those kind of shared
> reader-writer assumptions are essentially like having an unofficial
> extension of the Parquet spec.
>
> On Fri, Feb 16, 2018 at 9:20 AM, Lars Volker  wrote:
>
> > Yeah, I missed that. We set it per column, so all other types could keep
> > TypeDefinedOrder and floats could have something like
> NanAwareDoubleOrder.
> >
> > On Fri, Feb 16, 2018 at 9:18 AM, Tim Armstrong 
> > wrote:
> >
> > > We wouldn't need to rev the whole TypeDefinedOrder thing right?
> Couldn't
> > we
> > > just define a special order for floats? Essentially it would be a tag
> for
> > > writers to say "hey I know about this total order thing".
> > >
> > > On Fri, Feb 16, 2018 at 9:14 AM, Lars Volker  wrote:
> > >
> > > > I think one idea behind the column order fields was that if a reader
> > does
> > > > not recognize a value there, it needs to ignore the stats. If I
> > remember
> > > > correctly, that was intended to allow us to add new orderings for
> > > > collations, but it also seems useful to address gaps in the spec or
> > known
> > > > broken readers. In this case we would need to deprecate the default
> > > > "TypeDefinedOrder" and replace it with something like
> > > > "TypeDefinedOrderWithCorrectOrderingForDoubles". We could also count
> > up,
> > > > like TypeDefinedOrderV2 and so on.
> > > >
> > > > An alternative would be to list all writers that are known to have
> > > written
> > > > incorrect stats. However that will not prevent old implementations to
> > > > misinterpret correct stats - which I think was the main reason why we
> > > added
> > > > new stats fields.
> > > >
> > > >
> > > >
> > > > On Fri, Feb 16, 2018 at 9:03 AM, Alexander Behm <
> > alex.b...@cloudera.com>
> > > > wrote:
> > > >
> > > > > I hope the common cases is that data files do not contain these
> > special
> > > > > float values. As the simplest solution, how about writers refrain
> > from
> > > > > populating the stats if a special value is encountered?
> > > > >
> > > > > That fix does not preclude a more thorough solution in the future,
> > but
> > > it
> > > > > addresses the common case quickly.
> > > > >
> > > > > For existing data files we could check the writer version ignore
> > > filters
> > > > on
> > > > > float/double. I don't know whether min/max filtering is common on
> > > > > float/double, but I suspect it's not.
> > > > >
> > > > > On Fri, Feb 16, 2018 at 8:38 AM, Tim Armstrong <
> > > tarmstr...@cloudera.com>
> > > > > wrote:
> > > > >
> > > > > > There is an extensibility mechanism with the ColumnOrder union -
> I
> > > > think
> > > > > > that was meant to avoid the need to add new stat fields?
> > > > > >
> > > > > > Given that the bug was in the Parquet spec, we'll need to make a
> > spec
> > > > > > change anyway, so we could add a new ColumnOrder -
> > > > > FloatingPointTotalOrder?
> > > > > > at the same time as fixing the gap in the spec.
> > > > > >
> > > > > > It could make sense to declare that the default ordering for
> > > > > floats/doubles
> > > > > > is not NaN-aware (i.e. the reader should assume that NaN was
> > > > arbitrarily
> > > > > > ordered) and readers should either implement the required logic
> to
> > > > handle
> > > > > > that correctly (I had some ideas here:
> > > > > > https://issues.apache.org/jira/browse/IMPALA-6527?
> > > > > > focusedCommentId=16366106=com.atlassian.jira.
> > > > > > plugin.system.issuetabpanels%3Acomment-tabpanel#comment-
> 16366106)
> > > > > > or ignore the stats.
> > > > > 

Re: Undefined Symbol: getJNIEnv

2018-01-05 Thread Zoltan Borok-Nagy
Hi,

Did you source impala-config.sh on the other host?

As Tim said in an other thread, getJNIEnv is defined in libhdfs.so.

Search for the binary of catalogd by issuing:
find -name catalogd -executable -not -type d

Check for the libhdfs.so dynamic library on each hosts:
ldd catalogd | grep libhdfs

So it can find the so on the host that compiled impala, but cannot find the
so on the other host, right?
If you can find the so in the filesystem, try to set LD_LIBRARY_PATH
manually.

BR,
Zoltan





On Fri, Jan 5, 2018 at 3:08 AM, sky  wrote:

> I compiled the Impala with the command( ./buildall.sh -notests -so
> -release). And then move the directory to the same directory to start on
> another host.But get the errors:
> $ cat logs/cluster/catalogd.ERROR
> Log file created at: **
> Running on machine: ***
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> E0103 **  8216 logging.cc:126] stderr will be logged to this file.
> /impala/be/build/latest/catalog/catalogd: symbol lookup error:
> /impala/be/build/release/util/libUtil.so: undefined symbol: getJNIEnv
>
>
> I used Centos 6.5.  In addition to recompiling without -so , are there any
> other ways ?
>
>
>
>