Re: ColumnLineageGraph.java Compile Error in Frontend
Got it. Thanks!

On 2017-10-04 07:55, Jeszy <jes...@gmail.com> wrote:
> Hello Quanlong,
>
> This is https://issues.apache.org/jira/browse/IMPALA-6009, there's
> already a fix (but see the follow-up discussion on the JIRA).
>
> HTH
>
> On 4 October 2017 at 01:53, Quanlong Huang <huangquanl...@gmail.com> wrote:
> > Hi all,
> >
> > I encountered a compile error when I tried to recompile Impala yesterday.
> > The error is in the Frontend:
> >
> > [ERROR] COMPILATION ERROR :
> > [INFO] -
> > [ERROR]
> > /mnt/volume1/impala-orc/incubator-impala/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java:[593,11]
> > no suitable method found for putString(java.lang.String)
> >     method
> > com.google.common.hash.Hasher.putString(java.lang.CharSequence,java.nio.charset.Charset)
> > is not applicable
> >       (actual and formal argument lists differ in length)
> >     method
> > com.google.common.hash.PrimitiveSink.putString(java.lang.CharSequence,java.nio.charset.Charset)
> > is not applicable
> >       (actual and formal argument lists differ in length)
> >
> > I also found this in the Jenkins builds. It seems that
> > com.google.common.hash.Hasher exists in both guava-*.jar and
> > hive-exec-*.jar. Have there been any recent changes in
> > hive-exec-1.1.0-cdh5.14.0-SNAPSHOT.jar? What can I do to recover
> > from this?
> >
> > Thanks,
> > Quanlong
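When the same class (here com.google.common.hash.Hasher) ships in more than one jar, it can help to check which jar it is actually loaded from. The following is a minimal sketch using only the standard Java API; the class name WhichJar and the helper locate are hypothetical and not part of Impala. It reports what the runtime class loader resolves, so it has to be run with the same classpath as the frontend build to be meaningful, and the compiler's resolution order may still differ.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Impala): reports where a class is loaded from.
public class WhichJar {
    /** Returns the location a class was loaded from, or a short note if unavailable. */
    static String locate(String className) {
        try {
            // Load without initializing the class, so no static initializers run.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the bootstrap loader (e.g. java.lang.String) usually have no code source.
            return src == null ? "(bootstrap or unknown source)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the compile error above.
        String name = args.length > 0 ? args[0] : "com.google.common.hash.Hasher";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the printed location points at hive-exec-*.jar rather than guava-*.jar, the shaded copy is the one being picked up first on that classpath.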
Re: Question about the multi-thread scan node model
Got it. Thanks Tim!

On 2017-09-01 00:53, Tim Armstrong <tarmstr...@cloudera.com> wrote:
> I spoke to Alex Behm off-list about that JIRA a while ago. I don't think
> it's a true ramp-up task. The code change is easy but I think we would want
> to do performance validation and testing to make sure that the new
> multithreaded scanners have similar performance and stability before making
> them the default.
>
> On Thu, Aug 31, 2017 at 12:34 AM, huangquanl...@gmail.com <
> huangquanl...@gmail.com> wrote:
>
> > Yeah, "compute stats" is really CPU bound. That sounds great!
> >
> > I noticed that one of the sub-tasks of the multithreading work is labeled
> > "ramp up": https://issues.apache.org/jira/browse/IMPALA-5802
> > Is this in progress? If not, could you reassign it to me so I can get
> > familiar with the latest framework?
> >
> > Thanks,
> > Quanlong
> >
> > On 2017-08-31 07:16, Tim Armstrong <tarmstr...@cloudera.com> wrote:
> > > Hi,
> > > The new scanner model is part of the multithreading work to support
> > > running multiple instances of each fragment on each Impala daemon. The
> > > idea there is that parallelisation is done at the fragment level so that
> > > all execution including aggregations, sorts, joins is parallelised - not
> > > just scans. This is enabled by setting mt_dop > 0. Currently it doesn't
> > > work for plans including joins and HDFS inserts.
> > >
> > > We find that a lot of queries are compute bound, particularly by
> > > aggregations and joins. In those cases we get big speedups from the newer
> > > multithreading model. E.g. "compute stats" is a lot faster.
> > >
> > > On Wed, Aug 30, 2017 at 3:50 PM, Quanlong Huang <huangquanl...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'm working on applying our ORC-support patch to the latest code
> > > > base (IMPALA-5717
> > > > <https://issues.apache.org/jira/browse/IMPALA-5717>). Since our
> > > > patch is based on cdh-5.7.3-release, which was released one year ago,
> > > > there is a lot of work to merge it.
> > > >
> > > > One of the biggest changes since cdh-5.7.3-release that I notice is
> > > > the new scan node & scanner model introduced in IMPALA-3902
> > > > <https://issues.apache.org/jira/browse/IMPALA-3902>. I think it's
> > > > inspired by the investigation task in IMPALA-2849
> > > > <https://issues.apache.org/jira/browse/IMPALA-2849>, but I cannot
> > > > find any performance report in that issue. Could you share a report
> > > > on this multi-threaded refactor?
> > > >
> > > > I'm wondering how much this can improve performance, since the old
> > > > single-threaded scan node & multi-threaded scanners model already
> > > > provided concurrent IO for reading, and most OLAP queries are IO
> > > > bound.
> > > >
> > > > Thanks,
> > > >
> > > > Quanlong
Re: Question about the multi-thread scan node model
Yeah, "compute stats" is really CPU bound. That sounds great!

I noticed that one of the sub-tasks of the multithreading work is labeled
"ramp up": https://issues.apache.org/jira/browse/IMPALA-5802
Is this in progress? If not, could you reassign it to me so I can get
familiar with the latest framework?

Thanks,
Quanlong

On 2017-08-31 07:16, Tim Armstrong <tarmstr...@cloudera.com> wrote:
> Hi,
> The new scanner model is part of the multithreading work to support
> running multiple instances of each fragment on each Impala daemon. The idea
> there is that parallelisation is done at the fragment level so that all
> execution including aggregations, sorts, joins is parallelised - not just
> scans. This is enabled by setting mt_dop > 0. Currently it doesn't work for
> plans including joins and HDFS inserts.
>
> We find that a lot of queries are compute bound, particularly by
> aggregations and joins. In those cases we get big speedups from the newer
> multithreading model. E.g. "compute stats" is a lot faster.
>
> On Wed, Aug 30, 2017 at 3:50 PM, Quanlong Huang <huangquanl...@gmail.com> wrote:
>
> > Hi all,
> >
> > I'm working on applying our ORC-support patch to the latest code base
> > (IMPALA-5717 <https://issues.apache.org/jira/browse/IMPALA-5717>). Since
> > our patch is based on cdh-5.7.3-release, which was released one year ago,
> > there is a lot of work to merge it.
> >
> > One of the biggest changes since cdh-5.7.3-release that I notice is the
> > new scan node & scanner model introduced in IMPALA-3902
> > <https://issues.apache.org/jira/browse/IMPALA-3902>. I think it's inspired
> > by the investigation task in IMPALA-2849
> > <https://issues.apache.org/jira/browse/IMPALA-2849>, but I cannot find any
> > performance report in that issue. Could you share a report on this
> > multi-threaded refactor?
> >
> > I'm wondering how much this can improve performance, since the old
> > single-threaded scan node & multi-threaded scanners model already provided
> > concurrent IO for reading, and most OLAP queries are IO bound.
> >
> > Thanks,
> >
> > Quanlong