Re: ColumnLineageGraph.java Compile Error in Frontend

2017-10-03 Thread huangquanl...@gmail.com
Got it. Thanks!

On 2017-10-04 07:55, Jeszy <jes...@gmail.com> wrote: 
> Hello Quanlong,
> 
> This is https://issues.apache.org/jira/browse/IMPALA-6009; there's
> already a fix (but see the follow-up discussion on the JIRA).
> 
> HTH
> 
> On 4 October 2017 at 01:53, 黄权隆 <huangquanl...@gmail.com> wrote:
> > Hi all,
> >
> > I encountered a compile error when I tried to recompile Impala yesterday.
> > The error is in the Frontend:
> >
> > [ERROR] COMPILATION ERROR :
> >
> > [INFO] -
> >
> > [ERROR]
> > /mnt/volume1/impala-orc/incubator-impala/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java:[593,11]
> > no suitable method found for putString(java.lang.String)
> >
> > method
> > com.google.common.hash.Hasher.putString(java.lang.CharSequence,java.nio.charset.Charset)
> > is not applicable
> >
> >   (actual and formal argument lists differ in length)
> >
> > method
> > com.google.common.hash.PrimitiveSink.putString(java.lang.CharSequence,java.nio.charset.Charset)
> > is not applicable
> >
> >   (actual and formal argument lists differ in length)
> >
> >
> > I also see this in the Jenkins builds. It seems that
> > com.google.common.hash.Hasher exists in both guava-*.jar and
> > hive-exec-*.jar. Have there been any recent changes in
> > hive-exec-1.1.0-cdh5.14.0-SNAPSHOT.jar? What can I do to recover
> > from this?
> >
> >
> > Thanks,
> >
> > Quanlong
> 
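
One way to check the suspicion above (that the same class ships in two jars) is to ask the JVM where it actually loaded the class from. The following is a minimal sketch using only the standard `Class.getProtectionDomain()` API. The class name `WhichJar` and the helper `locationOf` are hypothetical names chosen for illustration, and the sketch assumes it is run with the same classpath, in the same order, as the failing build.

```java
import java.security.CodeSource;

public class WhichJar {
    /**
     * Returns the URL a class was loaded from, or a placeholder when the JVM
     * records no code source (typical for JDK bootstrap classes).
     */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) {
        // Default to the class named in the error message; pass another name to override.
        String name = args.length > 0 ? args[0] : "com.google.common.hash.Hasher";
        try {
            // initialize=false: we only want to locate the class, not run its static init.
            Class<?> cls = Class.forName(name, false, WhichJar.class.getClassLoader());
            System.out.println(name + " loaded from " + locationOf(cls));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

If the printed location is a hive-exec jar rather than a guava jar, that would be consistent with the two-argument `putString(CharSequence, Charset)` signature shown in the error output coming from a different Guava version than the code expects.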


Re: Question about the multi-thread scan node model

2017-09-03 Thread huangquanl...@gmail.com
Got it. Thanks Tim!

On 2017-09-01 00:53, Tim Armstrong <tarmstr...@cloudera.com> wrote: 
> I spoke to Alex Behm off-list about that JIRA a while ago. I don't think
> it's a true ramp-up task. The code change is easy but I think we would want
> to do performance validation and testing to make sure that the new
> multithreaded scanners have similar performance and stability before making
> them the default.
> 
> On Thu, Aug 31, 2017 at 12:34 AM, huangquanl...@gmail.com <
> huangquanl...@gmail.com> wrote:
> 
> > Yeah, "compute stats" is really CPU bound. That sounds great!
> >
> > I noticed that one of the subtasks of the multithreading work is labeled
> > "ramp up": https://issues.apache.org/jira/browse/IMPALA-5802
> > Is this in progress? If not, could you reassign it to me so that I can get
> > familiar with the latest framework?
> >
> > Thanks,
> > Quanlong
> >
> > On 2017-08-31 07:16, Tim Armstrong <tarmstr...@cloudera.com> wrote:
> > > Hi,
> > >   The new scanner model is part of the multithreading work to support
> > > running multiple instances of each fragment on each Impala daemon. The
> > > idea there is that parallelisation is done at the fragment level so that
> > > all execution including aggregations, sorts, joins is parallelised - not
> > > just scans. This is enabled by setting mt_dop > 0. Currently it doesn't
> > > work for plans including joins and HDFS inserts.
> > >
> > > We find that a lot of queries are compute bound, particularly by
> > > aggregations and joins. In those cases we get big speedups from the newer
> > > multithreading model. E.g. "compute stats" is a lot faster.
> > >
> > > On Wed, Aug 30, 2017 at 3:50 PM, 黄权隆 <huangquanl...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > >
> > > > > I’m working on applying our orc-support patch to the latest code base
> > > > > (IMPALA-5717 <https://issues.apache.org/jira/browse/IMPALA-5717>).
> > > > > Since our patch is based on cdh-5.7.3-release, which was released one
> > > > > year ago, there is a lot of work to merge it.
> > > >
> > > >
> > > > > One of the biggest changes since cdh-5.7.3-release that I noticed is the
> > > > > new scan node & scanner model introduced in IMPALA-3902
> > > > > <https://issues.apache.org/jira/browse/IMPALA-3902>. I think it was
> > > > > inspired by the investigation task in IMPALA-2849
> > > > > <https://issues.apache.org/jira/browse/IMPALA-2849>, but I cannot find
> > > > > any performance report in that issue. Could you share a report on this
> > > > > multi-thread refactor?
> > > >
> > > >
> > > > > I’m wondering how much this can improve performance, since the old
> > > > > model (a single-threaded scan node with multi-threaded scanners) already
> > > > > provides concurrent IO for reading, and most OLAP queries are IO bound.
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Quanlong
> > > >
> > >
> >
> 


Re: Question about the multi-thread scan node model

2017-08-31 Thread huangquanl...@gmail.com
Yeah, "compute stats" is really CPU bound. That sounds great!

I noticed that one of the subtasks of the multithreading work is labeled
"ramp up": https://issues.apache.org/jira/browse/IMPALA-5802
Is this in progress? If not, could you reassign it to me so that I can get
familiar with the latest framework?

Thanks,
Quanlong

On 2017-08-31 07:16, Tim Armstrong <tarmstr...@cloudera.com> wrote: 
> Hi,
>   The new scanner model is part of the multithreading work to support
> running multiple instances of each fragment on each Impala daemon. The idea
> there is that parallelisation is done at the fragment level so that all
> execution including aggregations, sorts, joins is parallelised - not just
> scans. This is enabled by setting mt_dop > 0. Currently it doesn't work for
> plans including joins and HDFS inserts.
> 
> We find that a lot of queries are compute bound, particularly by
> aggregations and joins. In those cases we get big speedups from the newer
> multithreading model. E.g. "compute stats" is a lot faster.
> 
> On Wed, Aug 30, 2017 at 3:50 PM, 黄权隆 <huangquanl...@gmail.com> wrote:
> 
> > Hi all,
> >
> >
> > I’m working on applying our orc-support patch to the latest code base
> > (IMPALA-5717 <https://issues.apache.org/jira/browse/IMPALA-5717>). Since
> > our patch is based on cdh-5.7.3-release, which was released one year ago,
> > there is a lot of work to merge it.
> >
> >
> > One of the biggest changes since cdh-5.7.3-release that I noticed is the
> > new scan node & scanner model introduced in IMPALA-3902
> > <https://issues.apache.org/jira/browse/IMPALA-3902>. I think it was inspired
> > by the investigation task in IMPALA-2849
> > <https://issues.apache.org/jira/browse/IMPALA-2849>, but I cannot find any
> > performance report in that issue. Could you share a report on this
> > multi-thread refactor?
> >
> >
> > I’m wondering how much this can improve performance, since the old
> > model (a single-threaded scan node with multi-threaded scanners) already
> > provides concurrent IO for reading, and most OLAP queries are IO bound.
> >
> >
> > Thanks,
> >
> > Quanlong
> >
>
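
To make the fragment-level idea described by Tim concrete, here is a toy sketch, not Impala code. It assumes a hypothetical degree of parallelism `dop` (loosely analogous to `mt_dop`) in which each worker both scans its own slice of the data and aggregates it locally, and the partial results are merged at the end. All class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FragmentInstancesSketch {
    /** One hypothetical "fragment instance": scans its slice and aggregates it locally. */
    static long scanAndSum(int[] rows, int from, int to) {
        long partial = 0;
        for (int i = from; i < to; i++) {
            partial += rows[i];
        }
        return partial;
    }

    /** Runs dop instances in parallel (dop must be >= 1), then merges the partial sums. */
    static long parallelSum(int[] rows, int dop) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(dop);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            int chunk = (rows.length + dop - 1) / dop; // ceiling division
            for (int i = 0; i < dop; i++) {
                final int from = Math.min(rows.length, i * chunk);
                final int to = Math.min(rows.length, from + chunk);
                partials.add(pool.submit(() -> scanAndSum(rows, from, to)));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get(); // merge step: combine per-instance aggregates
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] rows = new int[1000];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = i + 1;
        }
        System.out.println(parallelSum(rows, 4)); // prints 500500
    }
}
```

The point of the sketch is only that the aggregation work itself is spread across workers, rather than parallel scanners feeding a single aggregating thread, which is why compute-bound queries would benefit.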