Re: New Apache Impala committer - Shant Hovsepian

2020-10-14 Thread Jeszy
Congrats!

On Wed, Oct 14, 2020 at 11:09 AM Norbert Luksa
 wrote:
>
> Congratulations, Shant!
>
> On Wed, Oct 14, 2020 at 10:55 AM Laszlo Gaal 
> wrote:
>
> > Congratulations, Shant!
> >
> > On Wed, Oct 14, 2020 at 9:39 AM Tamas Mate  wrote:
> >
> > > Congrats, Shant!
> > >
> > > On Wed, Oct 14, 2020 at 9:35 AM Zoltán Borók-Nagy  > >
> > > wrote:
> > >
> > > > Congratulations, Shant!
> > > >
> > > >
> > > > On Wed, Oct 14, 2020 at 4:03 AM Shant Hovsepian <
> > > sh...@superdupershant.com
> > > > >
> > > > wrote:
> > > >
> > > > > Humbled and honored, thanks all!
> > > > >
> > > > > On Tue, Oct 13, 2020 at 7:59 PM Quanlong Huang <
> > > huangquanl...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Congratulations Shant!
> > > > > >
> > > > > > On Wed, Oct 14, 2020 at 7:11 AM Sahil Takiar <
> > takiar.sa...@gmail.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Congrats Shant!
> > > > > > >
> > > > > > > On Tue, Oct 13, 2020 at 3:35 PM Fang-Yu Rao <
> > > fangyu@cloudera.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Congratulations Shant!
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Oct 13, 2020 at 2:33 PM Vihang Karajgaonkar <
> > > > > > vih...@cloudera.com
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Congratulations Shant!
> > > > > > > > >
> > > > > > > > > On Tue, Oct 13, 2020 at 2:29 PM David Rorke <
> > > dro...@cloudera.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Congrats Shant!
> > > > > > > > > >
> > > > > > > > > > On Tue, Oct 13, 2020 at 2:23 PM Andrew Sherman <
> > > > > > > asher...@cloudera.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Congratulations Shant!!
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Oct 13, 2020 at 2:16 PM Joe McDonnell <
> > > > > > > > > joemcdonn...@cloudera.com
> > > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Congratulations, Shant!
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Oct 13, 2020 at 2:14 PM Kurt Deschler <
> > > > > > > > kdesc...@cloudera.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Congrats, Shant!
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Oct 13, 2020 at 5:10 PM Wenzhe Zhou <
> > > > > > > wz...@cloudera.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Congratulations, Shant!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Wenzhe Zhou
> > > > > > > > > > > > > > wz...@cloudera.com
> > > > > > > > > > > > > > 408-568-0101
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Oct 13, 2020 at 2:04 PM Riza Suminto <
> > > > > > > > > > > > riza.sumi...@cloudera.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Congratulations, Shant!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Oct 13, 2020 at 2:03 PM Zoram Thanga <
> > > > > > > > > zo...@cloudera.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Congratulations, Shant!
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Oct 13, 2020 at 1:51 PM Tim Armstrong <
> > > > > > > > > > > > > tarmstr...@cloudera.com
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >  The Project Management Committee (PMC) for
> > > > Apache
> > > > > > > Impala
> > > > > > > > > has
> > > > > > > > > > > > > invited
> > > > > > > > > > > > > > > > Shant
> > > > > > > > > > > > > > > > > Hovsepian to become a committer and we are
> > > > pleased
> > > > > to
> > > > > > > > > > announce
> > > > > > > > > > > > that
> > > > > > > > > > > > > > > they
> > > > > > > > > > > > > > > > > have accepted.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Congratulations and welcome, Shant!
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > Zoram Thanga
> > > > > > > > > > > > > > > > Cloudera Inc.
> > > > > > > > > > > > > > > > 395 Page Mill Road
> > > > > > > > > > > > > > > > Palo Alto, CA 94306
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Sahil Takiar
> > > > > > > Software Engineer
> > > > > > > takiar.sa...@gmail.com | (510) 673-0309
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Tamas Mate
> > > Software Engineer
> > > Cloudera
> > >
> >


Re: New Committer: Norbert Luksa

2020-04-07 Thread Jeszy
Congrats, Norbert!

On Tue, Apr 7, 2020 at 8:35 PM Zoltán Borók-Nagy  wrote:
>
> The Project Management Committee (PMC) for Apache Impala has
> invited Norbert Luksa to become a committer and we are pleased to announce
> that they have accepted.
> Congratulations and welcome, Norbert!


Re: [DISCUSS] 3.2.0 release

2019-03-18 Thread Jeszy
Hey,

I agree with Quanlong. It's increasingly difficult for me to keep up
with what's new; having a short version of the release notes with a
summary for each item (covering the intent and impact, where not trivial
from the jira summary) would be useful. For example, KRPC was turned
on by default in 2.12, having a major impact on the scalability and
performance for most large-scale users, but it shows up as a bug called
'KRPC milestone 1'. :)
It'd be great to be able to read through a page at most and have a
good idea of what actually happened in a release, from a functionality
and user point of view.

Jeszy

On Mon, Mar 18, 2019 at 1:50 PM Quanlong Huang  wrote:
>
> Hi Gabor,
>
> IMO, the change log is too detailed to be readable for users. There are no
> highlights, so it's hard for users to find significant changes. A
> summarized doc about new features and significant bug fixes (not fixes
> for tests) would be more useful for them. Something like these:
>  - https://kudu.apache.org/docs/release_notes.html
>  -
> https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_new_in_cdh_516.html
>
> However, the "Kudu 1.9.0 Release Notes" we pasted above looks like a draft.
> I think if we do so, we still need a document writer to polish and
> reorganize the doc.
>
> Thanks,
> Quanlong
>
> On Mon, Mar 18, 2019 at 8:23 PM Gabor Kaszab 
> wrote:
>
> > Hey,
> >
> > I recently had some time to give a second thought on the release notes doc
> > idea proposed above. The link Quanlong attached brings me to this doc that
> > contains a reduced list of features/bug fixes/improvements and another
> > reduced list with short descriptions of these:
> >
> > https://docs.google.com/document/d/1eeL4sfXvRxHvA7PcUw3SH2vA_5grb9-kRtL06Kzm5Cw/edit
> >
> > We already have official release notes that contain a list of all the
> > changes in the release where you are one click away from reading the
> > details of a particular change.
> > https://impala.apache.org/docs/changelog-3.1.html
> >
> > All in all, currently I don't see the added value of creating another
> > release notes document. Any comments on this?
> >
> > Cheers,
> > Gabor
> >
> >
> >
> > On Mon, Mar 18, 2019 at 10:32 AM Gabor Kaszab 
> > wrote:
> >
> > > It wasn't late, no problem. I added it to the release alongside the
> > > doc changes from Alex.
> > > I now consider the release content final and will move on with the next
> > steps
> > > of the release creation.
> > >
> > > Cheers,
> > > Gabor
> > >
> > >
> > > On Fri, Mar 15, 2019 at 8:18 PM Thomas Tauber-Marshall <
> > > tmarsh...@cloudera.com> wrote:
> > >
> > >> If it's not too late, it would be great to include this fix:
> > >> https://issues.apache.org/jira/browse/IMPALA-8299
> > >>
> > >> On Thu, Mar 14, 2019 at 5:38 AM Gabor Kaszab 
> > >> wrote:
> > >>
> > >> > (Sorry, accidentally sent out the mail too early)
> > >> > - Alex sent in a number of doc commits covering changes that are
> > already
> > >> > in, so I'll include them as well:
> > >> > - 76286bf2c0320cce0eb1bf0c269d344255b4dd0e IMPALA-7974
> > >> > - 9cc75c59d5a09bd898bdf05cc64a98a70ffd IMPALA-8133
> > >> > - 535d286ee16721a35bda1861e4d011ea99a8c02f IMPALA-8067
> > >> > - 0d3d3258d2762887c61bf8e64e5df33dc2419817 IMPALA-8298
> > >> > - 668bb73e1153afcb50ffc8b45267232b926a2258 IMPALA-7974
> > >> > - ca98d6649d700517e90359d39c94d3356c6c092e IMPALA-7718
> > >> >
> > >> > Alex, I see 2 doc changes remaining according to your list. Do you
> > feel
> > >> > they can get to the repo in the next 1-2 days? I can wait for them,
> > >> then.
> > >> > IMPALA-8296
> > >> > IMPALA-8297
> > >> >
> > >> > If no other request comes in until then, then I'll take the above
> > >> mentioned
> > >> > as the content of the 3.2.0 release and advance with the next steps of
> > >> > release creation.
> > >> >
> > >> > Cheers,
> > >> > Gabor
> > >> >
> > >> >
> > >> > On Thu, Mar 14, 2019 at 1:18 PM Gabor Kaszab <
> > gaborkas...@cloudera.com>
> > >> > wrote:
> > >> >
> > >> > > Hey,
> > >> > >
> > >> > > Just for the record

Re: New PMC member: Quanlong Huang

2019-03-11 Thread Jeszy
Congrats Quanlong!

On Mon, Mar 11, 2019 at 7:57 AM Jim Apple  wrote:
>
> The Project Management Committee (PMC) for Apache Impala has invited
> Quanlong Huang to become a PMC member and we are pleased to announce
> that they have accepted.
>
> Congratulations and welcome, Quanlong!


Re: Next round of the Impala community meeting

2019-03-06 Thread Jeszy
Out of interest, I'd try to make it if the meeting were at a
Europe-friendly time, but I don't feel I have much to contribute - probably
not worth a reschedule by itself.

On Wed, 6 Mar 2019 at 00:21, Lars Volker  wrote:
>
> I'm good with that time too. However, I have not seen interest from folks
> in Europe, neither in this thread nor in the one where we planned the
> previous meeting.
>
> Is anyone in Europe interested in joining us?
>
> Cheers, Lars
>
> On Mon, Mar 4, 2019 at 3:56 PM Quanlong Huang 
> wrote:
>
> > +1 for Tim's suggestion since our majority are in America and Europe. What
> > about this time:
> >
> >  - Budapest (Hungary) Friday, March 15, 2019 at 4:00:00 pm CET UTC+1 hour
> >  - Beijing (China - Beijing Municipality) Friday, March 15, 2019 at
> > 11:00:00 pm CST UTC+8 hours
> >  - San Jose (USA - California) Friday, March 15, 2019 at 8:00:00 am PDT
> >
> > On Tue, Mar 5, 2019 at 6:23 AM Tim Armstrong 
> > wrote:
> >
> > > WFM. I think we should also consider a future meeting time (not this one)
> > > that will make it easy for people in Europe to join as well.
> > >
> > > On Mon, Mar 4, 2019 at 11:08 AM Lars Volker  wrote:
> > >
> > > > Hi All,
> > > >
> > > > I propose to have the next Impala community meetings on
> > > >
> > > > Thursday, March 14th, 4pm PST/San Francisco  -  Friday, March 15th, 8am
> > > > CST/Beijing.
> > > >
> > > > If that time doesn't work for you but you would like to join, please
> > > reply
> > > > to this email with a new time proposal. If there are no objections I
> > will
> > > > send out an invite in the next few days.
> > > >
> > > > Cheers, Lars
> > > >
> > >
> >
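
As a side note, the three local times proposed above describe a single
instant; this can be verified with Python's zoneinfo (a sketch, not part of
the thread; requires Python 3.9+ with the IANA tz database available):

```python
# Check that the three proposed local times are the same instant
# (a sketch; requires Python 3.9+ with the IANA tz database available).
from datetime import datetime
from zoneinfo import ZoneInfo

# Friday, March 15, 2019 at 4:00 pm in Budapest (CET, UTC+1 on that date).
budapest = datetime(2019, 3, 15, 16, 0, tzinfo=ZoneInfo("Europe/Budapest"))

beijing = budapest.astimezone(ZoneInfo("Asia/Shanghai"))         # CST, UTC+8
san_jose = budapest.astimezone(ZoneInfo("America/Los_Angeles"))  # PDT, UTC-7

print(beijing.strftime("%H:%M"))   # 23:00
print(san_jose.strftime("%H:%M"))  # 08:00
```

Mid-March falls after the US DST switch (March 10, 2019) but before the EU
one (March 31), which is why the offsets are UTC-7 and UTC+1 respectively.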


Re: Impalad JVM OOM minutes after restart

2018-08-21 Thread Jeszy
Hm, that's interesting because:
- I haven't yet seen query planning itself cause OOM
- if it were catalog-related (tied to the tables involved in the query),
the following initial topic size would be bigger

Can you share diagnostic data, like the query text, definitions and
stats for tables involved, hs_err_pid written on crash, etc?
On Tue, 21 Aug 2018 at 20:32, Brock Noland  wrote:
>
> Hi Jeezy,
>
> Thanks, good tip.
>
> The MS is quite small. Even the mysqldump format is only 12MB. The largest
> catalog-update I could find is only 1.5MB, which should be easy to
> process with 32GB of heap. Lastly, it's possible we can reproduce
> by running the query the impalad was processing during the issue;
> I'm going to wait until the users head home to try, but it doesn't
> appear reproducible by the method you describe. When we restarted, it
> did not reproduce until users started running queries.
>
> I0820 19:45:25.106437 25474 statestore.cc:568] Preparing initial
> catalog-update topic update for impalad@XXX:22000. Size = 1.45 MB
>
> Brock
>
> On Tue, Aug 21, 2018 at 1:18 PM, Jeszy  wrote:
> > Hey,
> >
> > If it happens shortly after a restart, there is a fair chance you're
> > crashing while processing the initial catalog topic update. Statestore
> > logs will tell you how big that was (it takes more memory to process
> > it than the actual size of the update).
> > If this is the case, it should also be reproducible, ie. the daemon
> > will keep restarting and running OOM on initial update until you clear
> > the metadata cache either by restarting catalog or via a (global)
> > invalidate metadata.
> >
> > HTH
> > On Tue, 21 Aug 2018 at 20:13, Brock Noland  wrote:
> >>
> >> Hi folks,
> >>
> >> I've got an Impala CDH 5.14.2 cluster with a handful of users, 2-3, at
> >> any one time. All of a sudden the JVM inside the Impalad started
> >> running out of memory.
> >>
> >> I got a heap dump, but the heap was 32GB, host is 240GB, so it's very
> >> large. Thus I wasn't able to get Memory Analyzer Tool (MAT) to open
> >> it. I was able to get JHAT to opening it when setting JHAT's heap to
> >> 160GB. It's pretty unwieldy so much of the JHAT functionality doesn't
> >> work.
> >>
> >> I am spelunking around, but really curious if there are some places I
> >> should check.
> >>
> >> I am only an occasional reader of Impala source so I am just pointing
> >> out things which felt interesting:
> >>
> >> * Impalad was restarted shortly before the JVM OOM
> >> * Joining Parquet on S3 with Kudu
> >> * Only 13  instances of org.apache.impala.catalog.HdfsTable
> >> * 176836 instances of org.apache.impala.analysis.Analyzer - this feels
> >> odd to me. I remember one bug a while back in Hive when it would clone
> >> the query tree until it ran OOM.
> >> * 176796 of those _user fields point at the same user
> >> * org.apache.impala.thrift.TQueryCtx@0x7f90975297f8 has 11048
> >> org.apache.impala.analysis.Analyzer@GlobalState objects pointing at
> >> it.
> >> *  There is only a single instance of
> >> org.apache.impala.thrift.TQueryCtx alive in the JVM which appears to
> >> indicate there is only a single query running. I've tracked that query
> >> down in CM. The users need to compute stats, but I don't feel that is
> >> relevant to this JVM OOM condition.
> >>
> >> Any pointers on what I might look for?
> >>
> >> Cheers,
> >> Brock
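
For reference, statestore logs can be scanned for the largest initial
catalog-update size, as in the 1.45 MB line quoted above, with a short
script (a sketch; the line format is assumed from the sample line, not from
any documented Impala log specification):

```python
# Scan statestore log lines for the largest reported initial catalog-update
# size (a sketch; the line format is assumed from the sample quoted above).
import re

LINE_RE = re.compile(
    r"Preparing initial catalog-update topic update for (\S+)\. "
    r"Size = ([\d.]+) (KB|MB|GB)")
UNIT_TO_MB = {"KB": 1.0 / 1024, "MB": 1.0, "GB": 1024.0}

def largest_update(lines):
    """Return (size_in_mb, subscriber) for the biggest initial update seen."""
    best_size, best_who = 0.0, None
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            size_mb = float(m.group(2)) * UNIT_TO_MB[m.group(3)]
            if size_mb > best_size:
                best_size, best_who = size_mb, m.group(1)
    return best_size, best_who

log = ["I0820 19:45:25.106437 25474 statestore.cc:568] Preparing initial "
       "catalog-update topic update for impalad@XXX:22000. Size = 1.45 MB"]
print(largest_update(log))  # → (1.45, 'impalad@XXX:22000')
```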


Re: New Impala committer - Quanlong Huang

2018-08-17 Thread Jeszy
Congrats Quanlong!

On 17 August 2018 at 19:51, Csaba Ringhofer  wrote:
> Congrats!
>
> On Fri, Aug 17, 2018 at 6:32 PM, Philip Zeyliger 
> wrote:
>
>> Congrats!
>>
>> On Fri, Aug 17, 2018 at 9:29 AM Tim Armstrong 
>> wrote:
>>
>> >  The Project Management Committee (PMC) for Apache Impala has invited
>> > Quanlong Huang to become a committer and we are pleased to announce that
>> > they have accepted. Congratulations and welcome, Quanlong Huang!
>> >
>>


Re: Breaking changes after 3.0, versioning, IMPALA-3307

2018-06-11 Thread Jeszy
I think we should include it in 3.1, with the feature disabled by default
(to not break on a minor upgrade), but recommend enabling it in docs and
make it enabled by default in 4.0.

On 11 June 2018 at 10:23, Jim Apple  wrote:

> Any more thoughts? This question is for everyone in the Impala community.
>
> Right now the plan is to fold it into 3.1, with two to one in favor of that
> over bumping to 4.0.
>
> On Mon, Jun 4, 2018 at 8:48 PM Jim Apple  wrote:
>
> > I am more in favor of bumping to 4.0. It is a rapid escalation, but we
> > wouldn’t be the first open source project to switch to a model with short
> > major versions, as both Clang and Firefox have done.
> >
> > I also feel that, both from a semver perspective and as a user of other
> > software, I expect breaking changes to bump the major version number.
> >
> > That said, this is not a hill I’m trying to die on. My focus is on the
> > user experience, and if our users end up well informed of the breakages,
> > then I will feel we have done our job, no matter what version number we
> > stamp on it.
> >
> > On Mon, Jun 4, 2018 at 7:57 PM Philip Zeyliger 
> > wrote:
> >
> >> Hi Csaba!
> >>
> >> I would be fine with both proposals, with a slight preference to B. My
> >> understanding is that you're going to expose a way to define overrides
> for
> >> time zone definitions, so there will be pretty workable workarounds too.
> >>
> >> -- Philip
> >>
> >> On Mon, Jun 4, 2018 at 1:45 PM, Csaba Ringhofer <
> csringho...@cloudera.com
> >> >
> >> wrote:
> >>
> >> > Hi Folks!
> >> >
> >> >  We had a discussion with a few people about the versioning of Impala
> >> after
> >> > 3.0. The motivation was that IMPALA-3307 (which replaces the timezone
> >> > implementation in Impala, and contains some breaking changes) missed
> 3.0
> >> > and we are not sure about the version in which it can be released - is
> >> it
> >> > 3.1 or 4.0?
> >> >
> >> > A. jumping to 4.0 would communicate clearly that the release contains
> >> > breaking changes - if the plan for Impala is to follow semantic
> >> versioning,
> >> > then this is the way to go
> >> >
> >> > B. releasing it in 3.1 would communicate that the change is too small
> >> for a
> >> > major version bump, and major versions are kept for BIG changes in
> >> Impala
> >> >
> >> > My personal preference is for B - if a breaking change is relatively
> >> small
> >> > and workarounds are possible + the community agrees, then it should be
> >> > possible to release it in a minor version, while major versions could
> be
> >> > kept for changes where switching Impala version needs large effort on
> >> the
> >> > user's side (for example 2->3 jump needs new Java and Hadoop major
> >> > version), or when a huge improvement is added to Impala which deserves
> >> > extra attention. This is more of an aesthetic than a rational choice
> on
> >> my
> >> > side, so I am totally ok with semantic versioning too, if the
> community
> >> > prefers it.
> >> >
> >>
> >
>


Re: [jira] [Created] (IMPALA-6620) Compute incremental stats for groups of partitions does not update stats correctly

2018-04-02 Thread Jeszy
It's addressed in IMPALA-5615.

On 2018. Apr 2., Mon at 7:56, Jim Apple  wrote:

> I feel like I saw a similar JIRA and patch recently. Is this addressed in
> another ticket?
>
> If not, it feels like a P2 to me: it’s not exactly incorrect, but I expect
> it means that some calls to COMPUTE STATS would decrease query performance
> in a very avoidable way.
>
> -- Forwarded message -
> From: H Milyakov (JIRA) 
> Date: Wed, Mar 7, 2018 at 4:57 AM
> Subject: [jira] [Created] (IMPALA-6620) Compute incremental stats for
> groups of partitions does not update stats correctly
> To: 
>
>
> H Milyakov created IMPALA-6620:
> --
>
>  Summary: Compute incremental stats for groups of partitions
> does not update stats correctly
>  Key: IMPALA-6620
>  URL: https://issues.apache.org/jira/browse/IMPALA-6620
>  Project: IMPALA
>   Issue Type: Bug
>   Components: Catalog
> Affects Versions: Impala 2.8.0
>  Environment: Impala - v2.8.0-cdh5.11.1
> We are using Hive Metastore Database embedded (by cloudera)
> It's postgres 8.4.20
> OS: Centos
> Reporter: H Milyakov
>
>
> Executing COMPUTE INCREMENTAL STATS `table` PARTITION (`partition clause`)
> does not compute statistics correctly (computes 0) when `partition clause`
> matches more than one partition.
>
> Executing the same command when `partition clause` matches just a single
> partition
> results in statistics being computed correctly (non 0 and non -1).
>
> The issue was observed on our production cluster for a table with 40 000
> partitions and 20 columns.
> I have copied the table to separate isolated cluster and observed the same
> behaviour.
> We use Impala 2.8.0 in Cloudera CDH 5.11
>
> The issue could be simulated with the following:
>  1. CREATE TABLE my_test_table ( some_ints BIGINT )
>  PARTITIONED BY ( part_1 BIGINT, part_2 STRING )
>  STORED AS PARQUET;
>
>  2. The only column 'some_ints' is populated so that there are 10 000
> different partitions (part_1, part_2).
>  Total number of records in the table does not matter and could be same as
> the number of different partitions.
>
>  3. Then running the compute incremental as described above simulates the
> issue.
>
>
> Has anybody faced a similar issue, or does anyone have more info on the case?
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>


PHJ node assignment

2018-02-12 Thread Jeszy
IIUC, every row scanned in a partitioned hash join (both sides) is sent
across the network (an exchange on HASH(key)). The targets of this exchange
are nodes that have data locality with the left side of the join. Why does
Impala do it that way?

Since all rows are sent across the network anyway, Impala could just use
all the nodes in the cluster. The upside would be better parallelism for
the join itself as well as for all the operators sitting on top of it. Is
there a downside I'm forgetting?
If not, is there a jira tracking this already? Haven't found one.

Thanks!
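
The routing described above can be sketched with a toy model (not Impala
code; the names and two-node setup are illustrative): rows from both sides
are sent to a node chosen by hashing the join key modulo the number of
participating nodes, so matching keys always meet on the same node, and the
join result does not depend on how many nodes participate.

```python
# Toy model of a partitioned hash join's exchange step (not Impala code).
# Rows from both inputs are routed to a node by hash(join key) % num_nodes,
# so rows with equal keys always meet on the same node.

def route(rows, key_index, num_nodes):
    """Partition rows across num_nodes by hashing the join key."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row[key_index]) % num_nodes].append(row)
    return partitions

def partitioned_hash_join(left, right, num_nodes):
    """Join left and right on column 0 after hash-partitioning both sides."""
    left_parts = route(left, 0, num_nodes)
    right_parts = route(right, 0, num_nodes)
    results = []
    for node in range(num_nodes):        # each "node" joins only its partition
        built = {}
        for row in right_parts[node]:    # build side
            built.setdefault(row[0], []).append(row)
        for row in left_parts[node]:     # probe side
            for match in built.get(row[0], []):
                results.append(row + match[1:])
    return sorted(results)

left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, "x"), (3, "y"), (4, "z")]
print(partitioned_hash_join(left, right, num_nodes=2))
# → [(1, 'a', 'x'), (3, 'c', 'y')]
```

Raising num_nodes only spreads the partitions thinner; the result is
unchanged, which is why widening the target set would affect parallelism
but not correctness.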