There should be another stack trace in drillbit.out when this happens. Could you please check that file?
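
If it helps to pull the traces out of a long log, here is a minimal Python sketch (not part of Drill) that collects Java stack traces from the text of a file such as drillbit.out. The function name and the regular expressions are my own assumptions about how the traces are laid out: an exception header line followed by indented "at ..." frames.

```python
import re
from typing import List

# Assumed layout of a Java stack trace in the log (an assumption, not a Drill spec):
#   java.lang.NullPointerException: null
#       at org.example.Foo.bar(Foo.java:1)
HEADER = re.compile(
    r'^(?:Exception in thread "[^"]*" )?[\w.$]+(?:Exception|Error)(?::.*)?$'
)
FRAME = re.compile(r"^\s+(?:at |\.\.\. \d+ more)")


def extract_stack_traces(log_text: str) -> List[str]:
    """Return each stack trace found in log_text as one multi-line string."""
    traces: List[str] = []
    current: List[str] = []

    def flush() -> None:
        # Keep only headers that were followed by at least one more line.
        if len(current) > 1:
            traces.append("\n".join(current))
        current.clear()

    for line in log_text.splitlines():
        if current and (FRAME.match(line) or line.startswith("Caused by: ")):
            # Continuation of the trace that is currently being collected.
            current.append(line)
        elif HEADER.match(line):
            # A new exception header starts a new trace.
            flush()
            current.append(line)
        else:
            # Any other log line ends the current trace.
            flush()
    flush()
    return traces
```

You would call it on the file contents, for example `extract_stack_traces(open(path).read())`, where `path` is wherever your installation writes drillbit.out.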
On Mon, Jun 20, 2016 at 8:14 PM, qiang li <[email protected]> wrote:
> Another issue is that sometimes when I restart the node, the node cannot
> start up.
>
> Here is the exception.
> ache-drill-1.7.0/jars/drill-gis-1.7.0-SNAPSHOT.jar!/, jar:file:/usr/lib/apache-drill-1.7.0/jars/drill-memory-base-1.7.0-SNAPSHOT.jar!/] took 2800ms
> 2016-06-20 19:10:18,313 [main] INFO o.a.d.e.s.s.PersistentStoreRegistry - Using the configured PStoreProvider class: 'org.apache.drill.exec.store.sys.store.provider.ZookeeperPersistentStoreProvider'.
> 2016-06-20 19:10:19,221 [main] INFO o.apache.drill.exec.server.Drillbit - Construction completed (1529 ms).
> 2016-06-20 19:10:31,136 [main] WARN o.apache.drill.exec.server.Drillbit - Failure on close()
> java.lang.NullPointerException: null
>     at org.apache.drill.exec.work.WorkManager.close(WorkManager.java:153) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>     at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>     at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>     at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:159) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>     at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:293) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>     at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:271) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>     at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:267) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 2016-06-20 19:10:31,137 [main] INFO o.apache.drill.exec.server.Drillbit - Shutdown completed (1914 ms).
>
> I did nothing, started it the next day, and then it could start up.
>
> 2016-06-21 9:48 GMT+08:00 qiang li <[email protected]>:
>
>> Hi Aman,
>>
>> I did not fully test with the old version.
>>
>> Could you please help me create the JIRA issue? I think my account does not
>> have the privilege: my account is griffinli, and I cannot find the place to
>> create a new issue. Below are the EXPLAIN details for the same SQL with
>> different numbers of nodes in the cluster.
>>
>>
>> This is the correct plan, with only two nodes:
>> 0: jdbc:drill:zk=xxx:> explain plan for select
>> CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
>> convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` as
>> `nation` join hbase.offers_ref0 as `ref0` on
>> BYTE_SUBSTR(`ref0`.row_key,-8,8) = nation.`v`.`v`
>> where `nation`.row_key > '0br' and `nation`.row_key < '0bs' limit 10;
>> +------+------+
>> | text | json |
>> +------+------+
>> | 00-00    Screen
>> 00-01      Project(uid=[$0], v=[$1])
>> 00-02        SelectionVectorRemover
>> 00-03          Limit(fetch=[10])
>> 00-04            UnionExchange
>> 01-01              SelectionVectorRemover
>> 01-02                Limit(fetch=[10])
>> 01-03                  Project(uid=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($3, -8, 8))], v=[CONVERT_FROMUTF8(ITEM($4, 'v'))])
>> 01-04                    Project(row_key=[$3], v=[$4], ITEM=[$5], row_key0=[$0], v0=[$1], $f2=[$2])
>> 01-05                      HashJoin(condition=[=($2, $5)], joinType=[inner])
>> 01-07                        Project(row_key=[$0], v=[$1], $f2=[BYTE_SUBSTR($0, -8, 8)])
>> 01-09                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_ref0, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
>> 01-06                        Project(row_key0=[$0], v0=[$1], ITEM=[$2])
>> 01-08                          *BroadcastExchange*
>> 02-01                            Project(row_key=[$0], v=[$1], ITEM=[ITEM($1, 'v')])
>> 02-02                              Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_nation_idx, startRow=0br\x00, stopRow=0bs, filter=FilterList AND (2/2): [RowFilter (GREATER, 0br), RowFilter (LESS, 0bs)]], columns=[`row_key`, `v`, `v`.`v`]]])
>>
>>
>> This is the plan that fails, with more than 5 nodes:
>> 0: jdbc:drill:zk=xxx:> explain plan for select
>> CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
>> convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` as
>> `nation` join hbase.offers_ref0 as `ref0` on
>> BYTE_SUBSTR(`ref0`.row_key,-8,8) = nation.`v`.`v`
>> where `nation`.row_key > '0br' and `nation`.row_key < '0bs' limit 10;
>> +------+------+
>> | text | json |
>> +------+------+
>> | 00-00    Screen
>> 00-01      Project(uid=[$0], v=[$1])
>> 00-02        SelectionVectorRemover
>> 00-03          Limit(fetch=[10])
>> 00-04            UnionExchange
>> 01-01              SelectionVectorRemover
>> 01-02                Limit(fetch=[10])
>> 01-03                  Project(uid=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($3, -8, 8))], v=[CONVERT_FROMUTF8(ITEM($4, 'v'))])
>> 01-04                    Project(row_key=[$3], v=[$4], ITEM=[$5], row_key0=[$0], v0=[$1], $f2=[$2])
>> 01-05                      HashJoin(condition=[=($2, $5)], joinType=[inner])
>> 01-07                        Project(row_key=[$0], v=[$1], $f2=[$2])
>> 01-09                          *HashToRandomExchange*(dist0=[[$2]])
>> 02-01                            UnorderedMuxExchange
>> 04-01                              Project(row_key=[$0], v=[$1], $f2=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2)])
>> 04-02                                Project(row_key=[$0], v=[$1], $f2=[BYTE_SUBSTR($0, -8, 8)])
>> 04-03                                  Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_ref0, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
>> 01-06                        Project(row_key0=[$0], v0=[$1], ITEM=[$2])
>> 01-08                          Project(row_key=[$0], v=[$1], ITEM=[$2])
>> 01-10                            *HashToRandomExchange*(dist0=[[$2]])
>> 03-01                              UnorderedMuxExchange
>> 05-01                                Project(row_key=[$0], v=[$1], ITEM=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2)])
>> 05-02                                  Project(row_key=[$0], v=[$1], ITEM=[ITEM($1, 'v')])
>> 05-03                                    Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_nation_idx, startRow=0br\x00, stopRow=0bs, filter=FilterList AND (2/2): [RowFilter (GREATER, 0br), RowFilter (LESS, 0bs)]], columns=[`row_key`, `v`, `v`.`v`]]])
>>
>> The difference is the use of *BroadcastExchange* versus *HashToRandomExchange*.
>>
>> You can create the JIRA and
>> send me the link.
>>
>> Thanks.
>>
>>
>> 2016-06-20 23:44 GMT+08:00 Aman Sinha <[email protected]>:
>>
>>> Hi Qiang,
>>> were you seeing this same issue with the prior HBase version also? (I
>>> would think this is not a regression.) It would be best to create a new
>>> JIRA and attach the EXPLAIN plans for the successful and failed runs.
>>> With more nodes some minor fragments of the hash join may be getting empty
>>> input batches and I am guessing that has something to do with the
>>> SchemaChangeException. Someone would need to debug once you create the
>>> JIRA with relevant details.
>>>
>>> -Aman
>>>
>>> On Mon, Jun 20, 2016 at 5:13 AM, qiang li <[email protected]> wrote:
>>>
>>> > Thanks Aditya.
>>> >
>>> > By the way, I found another issue.
>>> >
>>> > Let's say I have two tables.
>>> >
>>> > offers_ref0: rowkey salt (1 byte) + long uid (8 bytes), family: v, qualifier: v (string)
>>> > offers_nation_idx: rowkey salt (1 byte) + string, family: v, qualifier: v (long, 8 bytes)
>>> >
>>> > Here is the SQL:
>>> >
>>> > select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
>>> > convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` as
>>> > `nation` join hbase.offers_ref0 as `ref0` on
>>> > CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') =
>>> > CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE')
>>> > where `nation`.row_key > '0br' and `nation`.row_key < '0bs' limit 10
>>> >
>>> > When I execute the query on a single node or on fewer than 5 nodes, it
>>> > works well.
>>> > But when I execute it on a cluster which has about 14 nodes, it throws
>>> > an exception:
>>> >
>>> > The first time, it throws this exception:
>>> > *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException:
>>> > Hash join does not support schema changes*
>>> >
>>> > Then if I query again, it always throws the exception below:
>>> > *Query Failed: An Error Occurred*
>>> > *org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>>> > IllegalStateException: Failure while reading vector. Expected vector class
>>> > of org.apache.drill.exec.vector.NullableIntVector but was holding vector
>>> > class org.apache.drill.exec.vector.complex.MapVector, field=
>>> > v(MAP:REQUIRED)[v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED),
>>> > v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4 [Error Id:
>>> > 06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]*
>>> >
>>> > It's very strange, and I do not know how to solve it.
>>> > I tried adding nodes to the cluster one by one, and it reproduces once I
>>> > have added 5 nodes. Can anyone help me solve this issue?
>>> >
>>> >
>>> >
>>> >
>>> > 2016-06-17 4:39 GMT+08:00 Aditya <[email protected]>:
>>> >
>>> > > https://issues.apache.org/jira/browse/DRILL-4727
>>> > >
>>> > > On Thu, Jun 16, 2016 at 11:39 AM, Aman Sinha <[email protected]> wrote:
>>> > >
>>> > >> Qiang/Aditya, can you create a JIRA for this and mark it for 1.7?
>>> > >> Thanks.
>>> > >>
>>> > >> On Thu, Jun 16, 2016 at 11:25 AM, Aditya <[email protected]> wrote:
>>> > >>
>>> > >> > Thanks for reporting, I'm looking into it and will post a patch soon.
>>> > >> >
>>> > >> > On Wed, Jun 15, 2016 at 7:27 PM, qiang li <[email protected]> wrote:
>>> > >> >
>>> > >> > > Hi Aditya,
>>> > >> > >
>>> > >> > > I tested the latest version and got this exception, and the drillbit
>>> > >> > > fails to start up.
>>> > >> > >
>>> > >> > > Exception in thread "main" java.lang.NoSuchMethodError: io.netty.util.UniqueName.<init>(Ljava/lang/String;)V
>>> > >> > >     at io.netty.channel.ChannelOption.<init>(ChannelOption.java:136)
>>> > >> > >     at io.netty.channel.ChannelOption.valueOf(ChannelOption.java:99)
>>> > >> > >     at io.netty.channel.ChannelOption.<clinit>(ChannelOption.java:42)
>>> > >> > >     at org.apache.drill.exec.rpc.BasicServer.<init>(BasicServer.java:63)
>>> > >> > >     at org.apache.drill.exec.rpc.user.UserServer.<init>(UserServer.java:74)
>>> > >> > >     at org.apache.drill.exec.service.ServiceEngine.<init>(ServiceEngine.java:78)
>>> > >> > >     at org.apache.drill.exec.server.Drillbit.<init>(Drillbit.java:108)
>>> > >> > >     at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:285)
>>> > >> > >     at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:271)
>>> > >> > >     at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:267)
>>> > >> > >
>>> > >> > > If I remove jars/3rdparty/netty-all-4.0.23.Final.jar, it works and
>>> > >> > > Drill can start up. I think there is a package dependency version
>>> > >> > > issue, do you think so?
>>> > >> > >
>>> > >> > >
>>> > >> > >
>>> > >> > > 2016-06-15 8:14 GMT+08:00 Aditya <[email protected]>:
>>> > >> > >
>>> > >> > >> HBase 1.x support has been merged and is available in latest
>>> > >> > >> 1.7.0-SNAPSHOT builds.
>>> > >> > >>
>>> > >> > >> On Wed, Jun 1, 2016 at 1:23 PM, Aditya <[email protected]> wrote:
>>> > >> > >>
>>> > >> > >> > Thanks Jacques for promptly reviewing my long series of patches!
>>> > >> > >> >
>>> > >> > >> > I'm planning to merge the HBase 1.x support some time in next 48
>>> > >> > >> > hours.
>>> > >> > >> >
>>> > >> > >> > If anyone else is interested and willing, please review the latest
>>> > >> > >> > patch here[1].
>>> > >> > >> >
>>> > >> > >> > aditya...
>>> > >> > >> >
>>> > >> > >> > [1] https://github.com/apache/drill/pull/443/files
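
For comparing the two EXPLAIN plans quoted in this thread, a small Python sketch like the following can list which exchange operators appear in only one of them. It is an illustration only: the function names are hypothetical, and it assumes each operator appears as a fragment id such as 01-08 followed by the operator name, optionally wrapped in the * emphasis markers seen in the quoted mail.

```python
import re
from typing import Set, Tuple

# Assumed operator layout: "<major>-<minor> <Name>", e.g. "01-08 *BroadcastExchange*".
# The optional asterisks are mail emphasis, not part of Drill's output.
OPERATOR = re.compile(r"\b(\d{2}-\d{2})\s+\*?(\w+)\*?")


def exchange_names(plan_text: str) -> Set[str]:
    """Return the names of all operators in the plan that end with 'Exchange'."""
    return {
        name
        for _op_id, name in OPERATOR.findall(plan_text)
        if name.endswith("Exchange")
    }


def diff_exchanges(plan_a: str, plan_b: str) -> Tuple[Set[str], Set[str]]:
    """Return (exchanges only in plan_a, exchanges only in plan_b)."""
    names_a = exchange_names(plan_a)
    names_b = exchange_names(plan_b)
    return names_a - names_b, names_b - names_a
```

Passing the text of the working plan and the failing plan to `diff_exchanges` gives the exchange operators unique to each, which is a quick way to confirm a difference like the one pointed out above before attaching both plans to the JIRA.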
