Re: Drill's Future
Hi,

I appreciate the work that this community has put into Drill. I am using it regularly and promoting its use. I can help with testing and documentation.

Regards,
Herman Tan

On Sun, 17 May 2020, 9:56 pm Charles Givre, wrote:
> Hey Bob,
> Great to hear from you and thanks for volunteering. Between you and me,
> I've been considering forking Drill, moving it to the OWASP Foundation, and
> integrating a lot of security-related stuff that either hasn't been
> accepted or I haven't submitted. I suspect this would help increase Drill's
> user base and hence get better features etc.
>
> What do you think?
> -- C
>
>> On May 15, 2020, at 6:31 PM, Bob Rudis wrote:
>>
>> I can carve out work-time to review PRs.
>>
>> -boB
>>
>> On Fri, May 15, 2020 at 11:25 AM Prabhakar Bhosaale wrote:
>>>
>>> Hi Charles,
>>> I have never contributed to any open source community before this. Also I
>>> am not an expert in Java, but I am willing to contribute for sure. Please let
>>> me know in what capacity I can be helpful here.
>>>
>>> Also, I would like to know whether there are any efforts going on to identify
>>> a primary supporter for Apache Drill, and what its future is in case there is
>>> no primary supporter. My organization is planning to build some tools using
>>> Drill and this news will definitely impact that. Thanks.
>>>
>>> Regards,
>>> Prabhakar
>>>
>>> On Fri, May 15, 2020 at 8:18 PM Charles Givre wrote:
>>>
>>>> Dear Drill Users,
>>>> I hope everyone is weathering the COVID storm and staying safe.
>>>>
>>>> As you may or may not be aware, I found out this week that MapR's
>>>> successor, HPE, has for all intents and purposes ended its support for
>>>> Drill. Whilst they will be contributing minimal bug fixes, they will not
>>>> be adding any new functionality or new features. This leaves open-source
>>>> Drill without a primary backer. Under the Apache rules, to commit code to
>>>> the code base it must be reviewed and voted upon. While we have
>>>> committers, we do not currently have enough code reviewers to review new
>>>> commits.
>>>>
>>>> My question to you as a Drill user is whether or not Drill is offering
>>>> value to you and your organization. If it is, and you would like to see
>>>> Drill development continue, then we need additional volunteers to step up
>>>> and review pull requests.
>>>>
>>>> If you're interested in reviewing code for Drill, please let me know.
>>>>
>>>> Thank you for your continued support, and stay safe!
>>>> -- Charles
Re: Need help with Drill SQL for a json file
Dear Drill,

My mistake, I missed "Response" in the first field. Apologies for this careless mistake.

Regards,
Herman

On Mon, May 18, 2020 at 10:35 AM Herman Tan wrote:
> Hi Drill community,
>
> I need help in writing a SQL query to extract data from the attached JSON file.
>
> How do I create a SQL SELECT to get the Location/Address/Label and the
> Location/DisplayPosition/Latitude and Longitude fields in 3 columns?
>
> I have been trying all different ways but I keep getting NULLs.
>
> Your help will be appreciated.
>
> Regards,
> Herman Tan
>
> One record:
> ----
> {
>   "MetaInfo": {"Timestamp": "2020-05-17T08:18:09.892+"},
>   "View": [{
>     "_type": "SearchResultsViewType",
>     "ViewId": 0,
>     "Result": [{
>       "Relevance": 0.8,
>       "MatchLevel": "street",
>       "MatchQuality": {
>         "Country": 1.0,
>         "City": 1.0,
>         "District": 1.0,
>         "Street": [0.85],
>         "PostalCode": 1.0
>       },
>       "Location": {
>         "LocationId": "NT_PSJYsiXVY9dOfsNAh-g6bC",
>         "LocationType": "point",
>         "DisplayPosition": {"Latitude": -8.14916, "Longitude": 113.71817},
>         "NavigationPosition": [{"Latitude": -8.14916, "Longitude": 113.71817}],
>         "MapView": {
>           "TopLeft": {"Latitude": -8.14823, "Longitude": 113.71793},
>           "BottomRight": {"Latitude": -8.14943, "Longitude": 113.71926}
>         },
>         "Address": {
>           "Label": "Jalan Perumahan Pesona Regency, Jember 68111, Indonesia",
>           "Country": "IDN",
>           "County": "Jawa Timur",
>           "City": "Jember",
>           "District": "Patrang",
>           "Subdistrict": "Patrang Kel.",
>           "Street": "Jalan Perumahan Pesona Regency",
>           "PostalCode": "68111",
>           "AdditionalData": [
>             {"value": "Indonesia", "key": "CountryName"},
>             {"value": "Jawa Timur", "key": "CountyName"}
>           ]
>         }
>       },
>       "AdditionalData": []
>     }]
>   }]
> }
> ----
Need help with Drill SQL for a json file
Hi Drill community,

I need help in writing a SQL query to extract data from the attached JSON file.

How do I create a SQL SELECT to get the Location/Address/Label and the Location/DisplayPosition/Latitude and Longitude fields in 3 columns?

I have been trying all different ways but I keep getting NULLs.

Your help will be appreciated.

Regards,
Herman Tan

One record:
----
{
  "MetaInfo": {"Timestamp": "2020-05-17T08:18:09.892+"},
  "View": [{
    "_type": "SearchResultsViewType",
    "ViewId": 0,
    "Result": [{
      "Relevance": 0.8,
      "MatchLevel": "street",
      "MatchQuality": {
        "Country": 1.0,
        "City": 1.0,
        "District": 1.0,
        "Street": [0.85],
        "PostalCode": 1.0
      },
      "Location": {
        "LocationId": "NT_PSJYsiXVY9dOfsNAh-g6bC",
        "LocationType": "point",
        "DisplayPosition": {"Latitude": -8.14916, "Longitude": 113.71817},
        "NavigationPosition": [{"Latitude": -8.14916, "Longitude": 113.71817}],
        "MapView": {
          "TopLeft": {"Latitude": -8.14823, "Longitude": 113.71793},
          "BottomRight": {"Latitude": -8.14943, "Longitude": 113.71926}
        },
        "Address": {
          "Label": "Jalan Perumahan Pesona Regency, Jember 68111, Indonesia",
          "Country": "IDN",
          "County": "Jawa Timur",
          "City": "Jember",
          "District": "Patrang",
          "Subdistrict": "Patrang Kel.",
          "Street": "Jalan Perumahan Pesona Regency",
          "PostalCode": "68111",
          "AdditionalData": [
            {"value": "Indonesia", "key": "CountryName"},
            {"value": "Jawa Timur", "key": "CountyName"}
          ]
        }
      },
      "AdditionalData": []
    }]
  }]
}
----

Attachment: here_api_20200517_162435.json
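One possible shape for this query in Drill (a sketch, untested against the actual file; it assumes the file really has a top-level `Response` wrapper, as Herman's follow-up message suggests, and uses Drill's FLATTEN function to unnest the `View` and `Result` arrays):

```sql
SELECT r.res.Location.Address.Label             AS label,
       r.res.Location.DisplayPosition.Latitude  AS latitude,
       r.res.Location.DisplayPosition.Longitude AS longitude
FROM (
  SELECT FLATTEN(v.vw.Result) AS res
  FROM (
    SELECT FLATTEN(t.Response.`View`) AS vw
    FROM dfs.`here_api_20200517_162435.json` t
  ) v
) r;
```

Each FLATTEN turns one array element into one row, which is usually why naive dotted paths into an array return NULL: the array level has to be unnested first. `View` is backquoted because it is a reserved word.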
Re: JDBC datasource on Websphere server 8.5.5.9
Sorry, please ignore.

On Mon, Mar 30, 2020 at 1:55 PM Herman Tan wrote:
> Igor,
>
> You are a Drill user?
> That's great!
> Our paths have crossed again.
>
> Herman
>
> On Tue, Mar 24, 2020 at 9:28 PM Igor Guzenko wrote:
>
>> Hello Prabhakar,
>>
>> There seems to be a similar question at
>> https://stackoverflow.com/questions/1677722/websphere-application-server-data-source
>> and the easiest path is to uncheck enable2Phase somewhere in the WebSphere
>> driver settings.
>> The harder way is to make a small project that wraps the Drill driver with
>> one of the JDBC connection pools, such as HikariCP.
>>
>> Thanks,
>> Igor
>>
>> On Tue, Mar 24, 2020 at 2:49 PM Charles Givre wrote:
>>
>>> Hi Prabhakar,
>>> Thanks for your interest in Drill. Can you share your config info as well
>>> as the versions of Drill and the JDBC driver that you are using?
>>> Thanks,
>>> -- C
>>>
>>>> On Mar 24, 2020, at 7:07 AM, Prabhakar Bhosaale <bhosale@gmail.com> wrote:
>>>>
>>>> Hi Team,
>>>>
>>>> We are trying to connect to Apache Drill from WebSphere 8.5.5.9. We
>>>> created the data provider and data source as per the standard WAS process.
>>>> But when we try to test the connection, it gives the following error:
>>>>
>>>> "Test connection operation failed for data source retrievalds on server
>>>> ARCHIVE_SERVER at node ARCHIVALPROFILENode1 with the following exception:
>>>> java.lang.Exception: DSRA8101E: DataSource class cannot be used as
>>>> one-phase: ClassCastException: org.apache.drill.jdbc.Driver incompatible
>>>> with javax.sql.ConnectionPoolDataSource"
>>>>
>>>> We are using SDK version 1.8.
>>>> Attaching the JVM log also for your reference. Thanks.
>>>>
>>>> Any pointers or documentation in this regard would be appreciated.
>>>> Please help. Thanks.
>>>>
>>>> Regards,
>>>> Prabhakar
Re: JDBC datasource on Websphere server 8.5.5.9
Igor,

You are a Drill user?
That's great!
Our paths have crossed again.

Herman

On Tue, Mar 24, 2020 at 9:28 PM Igor Guzenko wrote:
> Hello Prabhakar,
>
> There seems to be a similar question at
> https://stackoverflow.com/questions/1677722/websphere-application-server-data-source
> and the easiest path is to uncheck enable2Phase somewhere in the WebSphere
> driver settings.
> The harder way is to make a small project that wraps the Drill driver with
> one of the JDBC connection pools, such as HikariCP.
>
> Thanks,
> Igor
>
> On Tue, Mar 24, 2020 at 2:49 PM Charles Givre wrote:
>
>> Hi Prabhakar,
>> Thanks for your interest in Drill. Can you share your config info as well
>> as the versions of Drill and the JDBC driver that you are using?
>> Thanks,
>> -- C
>>
>>> On Mar 24, 2020, at 7:07 AM, Prabhakar Bhosaale wrote:
>>>
>>> Hi Team,
>>>
>>> We are trying to connect to Apache Drill from WebSphere 8.5.5.9. We
>>> created the data provider and data source as per the standard WAS process.
>>> But when we try to test the connection, it gives the following error:
>>>
>>> "Test connection operation failed for data source retrievalds on server
>>> ARCHIVE_SERVER at node ARCHIVALPROFILENode1 with the following exception:
>>> java.lang.Exception: DSRA8101E: DataSource class cannot be used as
>>> one-phase: ClassCastException: org.apache.drill.jdbc.Driver incompatible
>>> with javax.sql.ConnectionPoolDataSource"
>>>
>>> We are using SDK version 1.8.
>>> Attaching the JVM log also for your reference. Thanks.
>>>
>>> Any pointers or documentation in this regard would be appreciated.
>>> Please help. Thanks.
>>>
>>> Regards,
>>> Prabhakar
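The wrapper approach Igor mentions could look roughly like this (a sketch only, untested; it assumes HikariCP is on the classpath, and the ZooKeeper host/port and pool size are placeholders to adjust for your cluster):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class DrillPool {
    // Wraps the Drill JDBC driver in a plain pooled DataSource, so the
    // application server never needs a javax.sql.ConnectionPoolDataSource
    // (which org.apache.drill.jdbc.Driver does not implement).
    public static HikariDataSource create() {
        HikariConfig cfg = new HikariConfig();
        cfg.setDriverClassName("org.apache.drill.jdbc.Driver");
        // Placeholder ZooKeeper quorum and cluster-id:
        cfg.setJdbcUrl("jdbc:drill:zk=localhost:2181/drill/drillbits1");
        cfg.setMaximumPoolSize(4); // placeholder
        return new HikariDataSource(cfg);
    }
}
```

The application would then obtain connections from `DrillPool.create()` instead of a WAS-managed data source, sidestepping the DSRA8101E one-phase check entirely.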
Re: Caused by: org.apache.drill.exec.exception.DrillbitStartupException: Drillbit is disallowed to bind to loopback address in distributed mode.
On Sun, 9 Dec 2018, 2:39 AM Arina Yelchiyeva wrote:
> Yes, this option helps but it is not recommended for production, only for
> testing purposes.
>
> Kind regards,
> Arina
>
>> On Dec 8, 2018, at 8:08 PM, Alaa Salmo wrote:
>>
>> I found the solution. When we run Drill and ZooKeeper on the same machine
>> (with Drill in distributed mode), we need to add
>> allow_loopback_address_binding under drill.exec in the Drill configuration
>> file (drill-override.conf). We need to set allow_loopback_address_binding
>> to true.
>>
>> On Sat, Dec 8, 2018 at 12:22 PM Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
>>
>>> Please refer to this answer on Stack Overflow:
>>>
>>> https://stackoverflow.com/questions/40506221/how-to-start-drillbit-locally-in-distributed-mode
>>>
>>>> On Dec 8, 2018, at 2:10 AM, Alaa Salmo wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I am trying to install Hadoop/ZooKeeper with Apache Drill (distributed
>>>> mode). I am facing the following error:
>>>> "org.apache.drill.exec.exception.DrillbitStartupException:
>>>> Drillbit is disallowed to bind to loopback address in distributed mode."
>>>> I tried to Google it; the answer was to change drill-override.conf, but
>>>> I am still facing the same problem. Did anybody have the same issue?
>>>> Does Drill work on the local machine with ZooKeeper?
>>>>
>>>> drill.exec: {
>>>>   cluster-id: "drillbits1",
>>>>   zk.connect: "mgelbana-machine:2181"
>>>> }
>>>>
>>>> OR
>>>>
>>>> drill.exec: {
>>>>   cluster-id: "drillbits1",
>>>>   zk.connect: "127.0.0.1:2181"
>>>> }
>>>>
>>>>> On Sat, Dec 1, 2018 at 9:09 AM Alaa Salmo wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am installing Drill on my virtual machine (CentOS). I am facing one
>>>>> issue. I tried to Google it but I could not solve it.
>>>>>
>>>>> bin/drillbit.sh start
>>>>> cat /usr/local/apache-drill-1.14.0/log/drillbit.out
>>>>>
>>>>> ***
>>>>> Exception in thread "main"
>>>>> org.apache.drill.exec.exception.DrillbitStartupException: Failure during
>>>>> initial startup of Drillbit.
>>>>>     at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:428)
>>>>>     at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:394)
>>>>>     at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:390)
>>>>> Caused by: org.apache.drill.exec.exception.DrillbitStartupException:
>>>>> Drillbit is disallowed to bind to loopback address in distributed mode.
>>>>>     at org.apache.drill.exec.service.ServiceEngine.start(ServiceEngine.java:97)
>>>>>     at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:191)
>>>>>     at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:424)
>>>>> ***
>>>>>
>>>>> - I tried to change the hostname but I am still facing the same issue.
>>>>> - I tried to change the Java 8 JDK from OpenJDK to Oracle but I am still
>>>>>   facing the same issue.
>>>>>
>>>>> Please can you provide me the right answer?
>>>>>
>>>>> Regards,
>>>>> Alaa
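Putting the pieces of this thread together, the working drill-override.conf would look something like this (a sketch combining the cluster settings quoted above with the loopback option; as Arina notes, suitable for testing only, not production):

```
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "127.0.0.1:2181",
  # Allows a Drillbit in distributed mode to bind to 127.0.0.1,
  # e.g. when Drill and ZooKeeper share a single test machine.
  allow_loopback_address_binding: true
}
```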
Re: ERROR is reading parquet data after create table
Hi,

I am running into this problem again. The solution is to define a view that is a "union all" of individual select queries of single CSV files. Can I request that the hardcoded timeout values be parametrized, please?

Herman

On Tue, Oct 2, 2018 at 7:29 PM Herman Tan wrote:
> Hi,
>
> I found the hard-coded parameter in the source code:
>
> https://github.com/apache/drill/blob/8edeb49873d1a1710cfe28e0b49364d07eb1aef4/exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java
>
> LINE 52:  private static long TIMEOUT_PER_RUNNABLE_IN_MSECS = 15000;
> LINE 210: timeout = TIMEOUT_PER_RUNNABLE_IN_MSECS * ((tasks.size() - 1)/parallelism + 1);
>
> The parallelism param is also hardcoded:
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata.java
> LINE 343: 16
>
> In my case the task size is 29: 15000 * ((29-1)/16 + 1) = 15000 * 2 = 30000.
> 16 runnables execute round-robin on 29 tasks, given 30000 ms in total before timeout.
>
> This is the error message:
> Waited for 30000 ms, but only 11 tasks for 'Fetch parquet metadata' are
> complete. Total number of tasks 29, parallelism 16.
>
> TimedCallable.java:
> LINE 248: final String errMsg = String.format("Waited for %d ms, but only %d tasks for '%s' are complete." +
>     " Total number of tasks %d, parallelism %d.", timeout, futureMapper.count, activity, tasks.size(), parallelism);
>
> Shouldn't these be parameterized in "options" based on the infrastructure?
>
> Regards,
> Herman
>
> On Tue, Oct 2, 2018 at 6:47 PM Herman Tan wrote:
>
>> Hi,
>>
>> I have restarted Drill and run the script again.
>>
>> select * from dfs.tmp.`load_pos_sales_detail_tbl`;
>> -- SQL Error: RESOURCE ERROR: Waited for 30000 ms, but only 11 tasks for
>> 'Fetch parquet metadata' are complete. Total number of tasks 29,
>> parallelism 16.
>>
>> The 29 tasks are related to the 29 parquet files in the folder.
>> To check whether any of the parquet files has an error, I ran the following
>> SQL on each parquet file in the folder. ALL PASSED. (SQL below.)
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_0_0.parquet`;
>>
>> So it seems that for this table Drill can only get the metadata for 11
>> parquet files before it times out.
>> The timeout is a calculation and it varies with the size of the table.
>> I checked the source code but I could not find the calculation of the
>> timeout of "30000 ms".
>> When I am lucky, Drill can resolve the metadata for 29 files within
>> 30000 ms and it passes.
>>
>> I plan to use Drill for production, but it bothers me that there is a
>> limit on the number of parquet files and the timeout parameter cannot
>> be tuned.
>>
>> Does anyone have any ideas?
>>
>> Regards,
>> Herman
>> -- SQL BELOW --
>>
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_0_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_10_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_10_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_11_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_11_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_12_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_12_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_13_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_13_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_14_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_15_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_15_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_16_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_16_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_1_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_2_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_3_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_4_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_4_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_5_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_5_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_6_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_6_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_7_0.parquet`;
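The timeout arithmetic described above can be sketched in Python (constants copied from the quoted TimedCallable.java and Metadata.java lines; note that Java integer division is what puts 29 tasks into exactly two 15000 ms buckets):

```python
# Sketch of Drill's 'Fetch parquet metadata' timeout, per the quoted source:
# LINE 52:  TIMEOUT_PER_RUNNABLE_IN_MSECS = 15000
# LINE 210: timeout = TIMEOUT_PER_RUNNABLE_IN_MSECS * ((tasks.size() - 1)/parallelism + 1)
TIMEOUT_PER_RUNNABLE_IN_MSECS = 15000

def metadata_timeout(num_tasks: int, parallelism: int = 16) -> int:
    # // mirrors Java integer division: (29 - 1) // 16 == 1, so the total
    # budget for 29 files is 15000 * (1 + 1) = 30000 ms.
    return TIMEOUT_PER_RUNNABLE_IN_MSECS * ((num_tasks - 1) // parallelism + 1)

print(metadata_timeout(29))  # 30000, matching the figure in the error message
```

Because both the per-runnable constant and the parallelism of 16 are compile-time literals, the only way to raise the budget in this version is the workaround Herman describes (or a source patch), which is exactly why parametrizing them as options would help.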
Re: ERROR is reading parquet data after create table
I am running in single-instance embedded mode under Windows.

On Thu, 11 Oct 2018, 3:26 PM Divya Gehlot, wrote:
> Hi,
> Can somebody clarify the number of tasks? What I understood from Herman is
> that if you have 29 parquet files then Drill actually creates 29 tasks.
>
> Herman, can I know whether you are running Drill in embedded mode or
> distributed mode?
>
> I am running Drill in production for multiple sources and I do have many
> (like 2 years' worth of data) parquet files and have never encountered this
> issue. At times, when I have parquet in a multi-directory hierarchy or the
> parquet files are small, I do get either a timeout or the query is too slow.
>
> Thoughts please?
>
> Thanks,
> Divya
>
> On Tue, 2 Oct 2018 at 18:48, Herman Tan wrote:
>
>> Hi,
>>
>> I have restarted Drill and run the script again.
>>
>> select * from dfs.tmp.`load_pos_sales_detail_tbl`;
>> -- SQL Error: RESOURCE ERROR: Waited for 30000 ms, but only 11 tasks for
>> 'Fetch parquet metadata' are complete. Total number of tasks 29,
>> parallelism 16.
>>
>> The 29 tasks are related to the 29 parquet files in the folder.
>> To check whether any of the parquet files has an error, I ran the following
>> SQL on each parquet file in the folder. ALL PASSED. (SQL below.)
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_0_0.parquet`;
>>
>> So it seems that for this table Drill can only get the metadata for 11
>> parquet files before it times out.
>> The timeout is a calculation and it varies with the size of the table.
>> I checked the source code but I could not find the calculation of the
>> timeout of "30000 ms".
>> When I am lucky, Drill can resolve the metadata for 29 files within
>> 30000 ms and it passes.
>>
>> I plan to use Drill for production, but it bothers me that there is a
>> limit on the number of parquet files and the timeout parameter cannot
>> be tuned.
>>
>> Does anyone have any ideas?
>>
>> Regards,
>> Herman
>> -- SQL BELOW --
>>
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_0_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_10_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_10_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_11_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_11_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_12_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_12_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_13_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_13_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_14_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_15_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_15_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_16_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_16_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_1_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_2_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_3_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_4_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_4_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_5_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_5_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_6_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_6_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_7_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_7_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_8_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_8_1.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_9_0.parquet`;
>> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_9_1.parquet`;
>>
>> On Tue, Oct 2, 2018 at 4:44 PM Herman Tan wrote:
>>
>>> Hi Divya and everyone,
>>>
>>> The problem has disappeared.
>>> Drill was not restarted.
>>> This appears to be intermittent.
>>> Before I submitted the error report, I ran the script several times and
>>> it failed all the time.
>>> Today I ran it again and it succeeded.
>>> I will restart and test again.
>>>
>>> Regards,
Re: to_date() string to date conversion ERROR
By the way, I was looking for how to set the user.timezone option.

https://mapr.com/support/s/article/Understanding-Drill-s-timestamp-and-timezone?language=en_US
It says to set the sys.boot option user.timezone to UTC, but that option does not exist anymore.

https://mapr.com/support/s/article/MapR-DB-JSON-application-Error-Illegal-instant-due-to-time-zone-offset-transition?language=en_US
I followed the instructions here: add -Duser.timezone=UTC to DRILL_JAVA_OPTS in sqlline.bat.

Regards,
Herman

On Wed, Oct 3, 2018 at 11:34 PM Herman Tan wrote:
> Hi,
>
> I ran the SQL
> SELECT TIMEOFDAY() FROM (VALUES(1));
> and I got:
> 2018-10-03 15:27:16.586 Asia/Singapore
>
> I added -Duser.timezone=UTC to DRILL_JAVA_OPTS in sqlline.bat and
> restarted Drill.
>
> I ran the SQL again
> SELECT TIMEOFDAY() FROM (VALUES(1));
> and I got:
> 2018-10-03 15:27:16.586 UTC
>
> I ran the SQL that failed before:
> select to_date('1982/01/01 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- fail
> from (values(1))
> and it passes!
>
> Thank you for the help.
>
> Herman
>
> On Wed, Oct 3, 2018 at 5:35 PM Vova Vysotskyi wrote:
>
>> Hello Herman,
>>
>> I tried to reproduce this error, but all queries passed on my machine.
>> Could you please add more details about your environment? Which version
>> of Drill is used, and which timezone is set?
>> Is it reproduced with the UTC timezone?
>>
>> Kind regards,
>> Volodymyr Vysotskyi
>>
>> On Mon, Oct 1, 2018 at 10:58 AM Herman Tan wrote:
>>
>>> Hi,
>>>
>>> I have a very puzzling error.
>>> Try the following SQL statements.
>>>
>>> What is the problem with '1982/01/01 00:01:00.0'?
>>> Error message: Illegal instant due to time zone offset transition
>>>
>>> select to_date('1981/12/31 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
>>> from (values(1))
>>>
>>> select to_date('1981/12/31 11:59:59.0','yyyy/MM/dd HH:mm:ss.S') -- pass
>>> from (values(1))
>>>
>>> select to_date('1982/01/01 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- fail
>>> from (values(1))
>>>
>>> select to_date('1982/01/01 00:00:01.0','yyyy/MM/dd HH:mm:ss.S') -- fail
>>> from (values(1))
>>>
>>> select to_date('1982/01/01 00:01:00.0','yyyy/MM/dd HH:mm:ss.S') -- fail
>>> from (values(1))
>>>
>>> select to_date('1982/01/01 01:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
>>> from (values(1))
>>>
>>> select to_date('1982/01/02 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
>>> from (values(1))
>>>
>>> select to_date('1983/01/01 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
>>> from (values(1))
>>>
>>> Herman
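The pass/fail pattern in this thread lines up with a real offset transition: Singapore advanced its clock from UTC+7:30 to UTC+8 at midnight on 1 January 1982, so the local times 00:00:00 through 00:29:59 on that date never existed, which is exactly what "Illegal instant due to time zone offset transition" means. A quick stdlib check (illustration only, using Python's zoneinfo rather than Drill itself):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Singapore's UTC offset before and after the 1 Jan 1982 transition.
tz = ZoneInfo("Asia/Singapore")
print(datetime(1981, 12, 31, 12, 0, tzinfo=tz).utcoffset())  # 7:30:00
print(datetime(1982, 1, 1, 1, 0, tzinfo=tz).utcoffset())     # 8:00:00
```

Running Drill with -Duser.timezone=UTC sidesteps the problem because UTC has no such transitions, which is why the previously failing to_date call succeeds afterward.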
Re: to_date() string to date conversion ERROR
Hi,

I ran the SQL
SELECT TIMEOFDAY() FROM (VALUES(1));
and I got:
2018-10-03 15:27:16.586 Asia/Singapore

I added -Duser.timezone=UTC to DRILL_JAVA_OPTS in sqlline.bat and restarted Drill.

I ran the SQL again
SELECT TIMEOFDAY() FROM (VALUES(1));
and I got:
2018-10-03 15:27:16.586 UTC

I ran the SQL that failed before:
select to_date('1982/01/01 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- fail
from (values(1))
and it passes!

Thank you for the help.

Herman

On Wed, Oct 3, 2018 at 5:35 PM Vova Vysotskyi wrote:
> Hello Herman,
>
> I tried to reproduce this error, but all queries passed on my machine.
> Could you please add more details about your environment? Which version of
> Drill is used, and which timezone is set?
> Is it reproduced with the UTC timezone?
>
> Kind regards,
> Volodymyr Vysotskyi
>
> On Mon, Oct 1, 2018 at 10:58 AM Herman Tan wrote:
>
>> Hi,
>>
>> I have a very puzzling error.
>> Try the following SQL statements.
>>
>> What is the problem with '1982/01/01 00:01:00.0'?
>> Error message: Illegal instant due to time zone offset transition
>>
>> select to_date('1981/12/31 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
>> from (values(1))
>>
>> select to_date('1981/12/31 11:59:59.0','yyyy/MM/dd HH:mm:ss.S') -- pass
>> from (values(1))
>>
>> select to_date('1982/01/01 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- fail
>> from (values(1))
>>
>> select to_date('1982/01/01 00:00:01.0','yyyy/MM/dd HH:mm:ss.S') -- fail
>> from (values(1))
>>
>> select to_date('1982/01/01 00:01:00.0','yyyy/MM/dd HH:mm:ss.S') -- fail
>> from (values(1))
>>
>> select to_date('1982/01/01 01:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
>> from (values(1))
>>
>> select to_date('1982/01/02 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
>> from (values(1))
>>
>> select to_date('1983/01/01 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
>> from (values(1))
>>
>> Herman
Re: ERROR is reading parquet data after create table
Hi,

I found the hard-coded parameter in the source code:

https://github.com/apache/drill/blob/8edeb49873d1a1710cfe28e0b49364d07eb1aef4/exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java

LINE 52:  private static long TIMEOUT_PER_RUNNABLE_IN_MSECS = 15000;
LINE 210: timeout = TIMEOUT_PER_RUNNABLE_IN_MSECS * ((tasks.size() - 1)/parallelism + 1);

The parallelism param is also hardcoded:
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata.java
LINE 343: 16

In my case the task size is 29: 15000 * ((29-1)/16 + 1) = 15000 * 2 = 30000.
16 runnables execute round-robin on 29 tasks, given 30000 ms in total before timeout.

This is the error message:
Waited for 30000 ms, but only 11 tasks for 'Fetch parquet metadata' are complete. Total number of tasks 29, parallelism 16.

TimedCallable.java:
LINE 248: final String errMsg = String.format("Waited for %d ms, but only %d tasks for '%s' are complete." +
    " Total number of tasks %d, parallelism %d.", timeout, futureMapper.count, activity, tasks.size(), parallelism);

Shouldn't these be parameterized in "options" based on the infrastructure?

Regards,
Herman

On Tue, Oct 2, 2018 at 6:47 PM Herman Tan wrote:
> Hi,
>
> I have restarted Drill and run the script again.
>
> select * from dfs.tmp.`load_pos_sales_detail_tbl`;
> -- SQL Error: RESOURCE ERROR: Waited for 30000 ms, but only 11 tasks for
> 'Fetch parquet metadata' are complete. Total number of tasks 29,
> parallelism 16.
>
> The 29 tasks are related to the 29 parquet files in the folder.
> To check whether any of the parquet files has an error, I ran the following
> SQL on each parquet file in the folder. ALL PASSED. (SQL below.)
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_0_0.parquet`;
>
> So it seems that for this table Drill can only get the metadata for 11
> parquet files before it times out.
> The timeout is a calculation and it varies with the size of the table.
> I checked the source code but I could not find the calculation of the
> timeout of "30000 ms".
> When I am lucky, Drill can resolve the metadata for 29 files within
> 30000 ms and it passes.
>
> I plan to use Drill for production, but it bothers me that there is a
> limit on the number of parquet files and the timeout parameter cannot
> be tuned.
>
> Does anyone have any ideas?
>
> Regards,
> Herman
> -- SQL BELOW --
>
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_0_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_10_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_10_1.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_11_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_11_1.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_12_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_12_1.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_13_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_13_1.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_14_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_15_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_15_1.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_16_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_16_1.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_1_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_2_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_3_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_4_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_4_1.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_5_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_5_1.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_6_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_6_1.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_7_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_7_1.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_8_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_8_1.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_9_0.parquet`;
> select * from dfs.tmp.`load_pos_sales_detail_tbl/1_9_1.parquet`;
>
> On Tue, Oct 2, 2018 at 4:44 PM Herman Tan wrote:
>
>> Hi Divya and everyone,
>>
>> The problem has disappeared.
>> Drill was not restarted.
>> This appears to be intermittent.
>> Before I submitted the error report, I ran the script several times and
>> it failed all the time.
>> Today I ran it again and it succeeded.
>> I will restart and test again.
Re: ERROR is reading parquet data after create table
Hi,

I have restarted Drill and run the script again.

select * from dfs.tmp.`load_pos_sales_detail_tbl`;
-- SQL Error: RESOURCE ERROR: Waited for 30000 ms, but only 11 tasks for
'Fetch parquet metadata' are complete. Total number of tasks 29, parallelism 16.

The 29 tasks are related to the 29 parquet files in the folder.
To check whether any of the parquet files has an error, I ran the following SQL on each parquet file in the folder. ALL PASSED. (SQL below.)
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_0_0.parquet`;

So it seems that for this table Drill can only get the metadata for 11 parquet files before it times out.
The timeout is a calculation and it varies with the size of the table.
I checked the source code but I could not find the calculation of the timeout of "30000 ms".
When I am lucky, Drill can resolve the metadata for 29 files within 30000 ms and it passes.

I plan to use Drill for production, but it bothers me that there is a limit on the number of parquet files and the timeout parameter cannot be tuned.

Does anyone have any ideas?

Regards,
Herman
-- SQL BELOW --

select * from dfs.tmp.`load_pos_sales_detail_tbl/1_0_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_10_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_10_1.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_11_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_11_1.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_12_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_12_1.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_13_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_13_1.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_14_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_15_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_15_1.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_16_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_16_1.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_1_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_2_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_3_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_4_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_4_1.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_5_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_5_1.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_6_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_6_1.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_7_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_7_1.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_8_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_8_1.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_9_0.parquet`;
select * from dfs.tmp.`load_pos_sales_detail_tbl/1_9_1.parquet`;

On Tue, Oct 2, 2018 at 4:44 PM Herman Tan wrote:
> Hi Divya and everyone,
>
> The problem has disappeared.
> Drill was not restarted.
> This appears to be intermittent.
> Before I submitted the error report, I ran the script several times and it
> failed all the time.
> Today I ran it again and it succeeded.
> I will restart and test again.
>
> Regards,
> Herman
>
> On Thu, Sep 27, 2018 at 11:50 AM Divya Gehlot wrote:
>
>> Hi Herman,
>> Just to ensure that your parquet file format is not corrupted, can you
>> please query a folder, like just 2001, or some of the files underneath,
>> instead of querying the whole data set at once?
>>
>> Thanks,
>> Divya
>>
>> On Wed, 26 Sep 2018 at 15:35, Herman Tan wrote:
>>
>>> Hi Kunal,
>>>
>>> That said, could you provide some details about the parquet data you've
>>> created, like the schema, parquet version and the tool used to generate?
>>> Usually, the schema (and meta) provides most of these details for any
>>> parquet file.
>>>
>>> 1. The schema is under dfs.tmp; the queries to generate it are all
>>>    documented below.
>>> 2. I don't know how to find the parquet version of the data file.
>>> 3. The tool used to generate the parquet is Apache Drill. The CTAS is
>>>    detailed below.
>>>
>>> Regards,
>>> Herman
>>>
>>> *This is the text data*
>>>
>>> These are the folders of the files.
>>> Total # of lines: about 50 million rows
>>> --
>>> show files from dfs.`D:\retail_sandbox\pos\sales_pos_detail\pos_details_20180825`
>>> ;
>>> show files
Re: ERROR in reading parquet data after create table
Hi Divya and everyone,

The problem has disappeared. Drill was not restarted, so this appears to be intermittent. Before I submitted the error report, I ran the script several times and it failed every time. Today I ran it again and it succeeded. I will restart and test again.

Regards,
Herman

On Thu, Sep 27, 2018 at 11:50 AM Divya Gehlot wrote:
> Hi Herman,
> Just to ensure that your parquet file format is not corrupted, can you
> please query a folder like just 2001, or some of the files underneath,
> instead of querying the whole data set at once.
>
> Thanks,
> Divya
>
> On Wed, 26 Sep 2018 at 15:35, Herman Tan wrote:
> > Hi Kunal,
> >
> > That said, could you provide some details about the parquet data you've
> > created, like the schema, parquet version and the tool used to generate.
> > Usually, the schema (and meta) provides most of these details for any
> > parquet file.
> >
> > 1. The schema is under dfs.tmp; the queries to generate it are all
> > documented below.
> > 2. I don't know how to find the parquet version of the data file.
> > 3. The tool used to generate the parquet is Apache Drill. The CTAS is
> > detailed below.
> > > > Regards, > > Herman > > > > > > *This is the Text data* > > > > This is the folders of the files > > Total # of lines about 50 million rows > > -- > > show files from dfs.`D:\retail_sandbox\pos\sales_pos_detail\pos_details_ > > 20180825` > > ; > > show files from dfs.`D:\retail_sandbox\pos\sales_pos_detail\pos_details_ > > 20180825\2011` > > ; > > - > > sales_pos_detail > > \pos_details_20180825 > > \2007 > > \2008 > > \2009 > > \2010 > > \2011 > > \pos_details_0.csv > > \pos_details_1.csv > > \pos_details_2.csv > > \pos_details_3.csv > > \pos_details_4.csv > > \pos_details_5.csv > > \pos_details_6.csv > > \pos_details_7.csv > > \pos_details_8.csv > > \2012 > > \2013 > > \2014 > > \2015 > > \2016 > > \2017 > > \2018 > > \others > > - > > > > *This is the view with the metadata defined:* > > > > create or replace view dfs.tmp.load_pos_sales_detail as > > SELECT > > -- dimension keys > > cast(dim_date_key as int) dim_date_key > > ,cast(dim_site_key as int) dim_site_key > > ,cast(dim_pos_header_key as bigint) dim_pos_header_key > > ,cast(dim_pos_cashier_key as int) dim_pos_cashier_key > > ,cast(dim_card_number_key as int) dim_card_number_key > > ,cast(dim_hour_minute_key as int) dim_hour_minute_key > > ,cast(dim_pos_clerk_key as int) dim_pos_clerk_key > > ,cast(dim_product_key as int) dim_product_key > > ,cast(dim_pos_employee_purchase_key as int) dim_pos_employee_purchase_key > > ,cast(dim_pos_terminal_key as int) dim_pos_terminal_key > > ,cast(dim_campaign_key as int) dim_campaign_key > > ,cast(dim_promo_key as int) dim_promo_key > > ,cast( case when dim_site_lfl_key = '' then 0 else dim_site_lfl_key end > as > > int) dim_site_lfl_key > > -- derived from keys > > ,dim_date_str > > ,`year` as `trx_year` > > -- Measures > > ,Product_Sales_Qty > > ,Product_Sales_Price > > ,Product_Cost_Price > > ,Product_Cost_Amt > > ,Product_Sales_Gross_Amt > > ,Product_Sales_Promo_Disc_Amt > > ,Product_Sales_Add_Promo_Disc_Amt > > ,Product_Sales_Total_Promo_Disc_Amt > > 
,Product_Sales_Retail_Promo_Amt > > ,Product_Sales_Retail_Amt > > ,Product_Sales_VAT_Amt > > ,Product_Sales_Product_Margin_Amt > > ,Product_Sales_Initial_Margin_Amt > > from dfs.`D:\retail_sandbox\pos\sales_pos_detail\pos_details_20180825` > > ; > > > > > > *This is the CTAS that generates the parquet from the view above:* > > > > drop table if exists dfs.tmp.load_pos_sales_detail_tbl > > ; > > > > create table dfs.tmp.load_pos_sales_detail_tbl AS > > SELECT > > -- dimension keys > > dim_date_key > > ,dim_site_key > > ,dim_pos_header_key > > ,dim_pos_cashier_key > > ,dim_card_number_key > > ,dim_hour_minute_key > > ,dim_pos_clerk_key > > ,dim_product_key > > ,dim_pos_employee_purchase_key > > ,dim_pos_terminal_key > > ,dim_campaign_key > > ,dim_promo_key > > ,dim_site_lfl_key > > -- derived from keys > > ,dim_date_str > > ,`trx_year` > > -- Measures > > ,Product_Sales_Qty Sales_Qty > > ,Product_Sales_Price Sales_Price >
to_date() string to date conversion ERROR
Hi,

I have a very puzzling error. Try the following SQL statements. What is the problem with '1982/01/01 00:01:00.0'?

Error message: Illegal instant due to time zone offset transition

select to_date('1981/12/31 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
from (values(1))

select to_date('1981/12/31 11:59:59.0','yyyy/MM/dd HH:mm:ss.S') -- pass
from (values(1))

select to_date('1982/01/01 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- fail
from (values(1))

select to_date('1982/01/01 00:00:01.0','yyyy/MM/dd HH:mm:ss.S') -- fail
from (values(1))

select to_date('1982/01/01 00:01:00.0','yyyy/MM/dd HH:mm:ss.S') -- fail
from (values(1))

select to_date('1982/01/01 01:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
from (values(1))

select to_date('1982/01/02 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
from (values(1))

select to_date('1983/01/01 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
from (values(1))

Herman
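The failing literals all fall in the first half hour of 1982-01-01, and the Drill logs elsewhere in this digest show a nus.edu.sg host, so the server is presumably running in Asia/Singapore local time. Singapore moved its clocks from UTC+7:30 to UTC+8:00 at midnight on 1 January 1982, so wall-clock times 00:00:00 through 00:29:59 on that date never existed, and Joda-Time (which Drill's date functions are built on) throws "Illegal instant due to time zone offset transition" for nonexistent local times. That this is the cause is an inference, not something confirmed from the Drill code, but the pass/fail pattern matches the 30-minute gap exactly. A quick check of the offset change using Python's zoneinfo:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Singapore jumped from UTC+7:30 to UTC+8:00 at midnight, 1 Jan 1982,
# so local times 00:00:00-00:29:59 on that date do not exist.
sg = ZoneInfo("Asia/Singapore")
before = datetime(1981, 12, 31, 23, 0, tzinfo=sg).utcoffset()
after = datetime(1982, 1, 1, 1, 0, tzinfo=sg).utcoffset()
print(before, after)  # 7:30:00 8:00:00
```

A workaround often suggested for Joda-based engines is to run the JVM in UTC (for Drill, setting user.timezone in drill-env.sh), since UTC has no such transitions; whether that is acceptable depends on how the rest of the pipeline interprets timestamps.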
Re: ERROR in reading parquet data after create table
Hi Kunal,

That said, could you provide some details about the parquet data you've created, like the schema, parquet version and the tool used to generate. Usually, the schema (and meta) provides most of these details for any parquet file.

1. The schema is under dfs.tmp; the queries used to generate it are all documented below.
2. I don't know how to find the parquet version of the data file.
3. The tool used to generate the parquet is Apache Drill. The CTAS is detailed below.

Regards,
Herman

*This is the Text data*

This is the folder layout of the files.
Total # of lines: about 50 million rows
--
show files from dfs.`D:\retail_sandbox\pos\sales_pos_detail\pos_details_20180825`
;
show files from dfs.`D:\retail_sandbox\pos\sales_pos_detail\pos_details_20180825\2011`
;
-
sales_pos_detail
\pos_details_20180825
\2007
\2008
\2009
\2010
\2011
\pos_details_0.csv
\pos_details_1.csv
\pos_details_2.csv
\pos_details_3.csv
\pos_details_4.csv
\pos_details_5.csv
\pos_details_6.csv
\pos_details_7.csv
\pos_details_8.csv
\2012
\2013
\2014
\2015
\2016
\2017
\2018
\others
-

*This is the view with the metadata defined:*

create or replace view dfs.tmp.load_pos_sales_detail as
SELECT
-- dimension keys
cast(dim_date_key as int) dim_date_key
,cast(dim_site_key as int) dim_site_key
,cast(dim_pos_header_key as bigint) dim_pos_header_key
,cast(dim_pos_cashier_key as int) dim_pos_cashier_key
,cast(dim_card_number_key as int) dim_card_number_key
,cast(dim_hour_minute_key as int) dim_hour_minute_key
,cast(dim_pos_clerk_key as int) dim_pos_clerk_key
,cast(dim_product_key as int) dim_product_key
,cast(dim_pos_employee_purchase_key as int) dim_pos_employee_purchase_key
,cast(dim_pos_terminal_key as int) dim_pos_terminal_key
,cast(dim_campaign_key as int) dim_campaign_key
,cast(dim_promo_key as int) dim_promo_key
,cast( case when dim_site_lfl_key = '' then 0 else dim_site_lfl_key end as int) dim_site_lfl_key
-- derived from keys
,dim_date_str
,`year` as `trx_year`
-- Measures
,Product_Sales_Qty
,Product_Sales_Price
,Product_Cost_Price
,Product_Cost_Amt
,Product_Sales_Gross_Amt
,Product_Sales_Promo_Disc_Amt
,Product_Sales_Add_Promo_Disc_Amt
,Product_Sales_Total_Promo_Disc_Amt
,Product_Sales_Retail_Promo_Amt
,Product_Sales_Retail_Amt
,Product_Sales_VAT_Amt
,Product_Sales_Product_Margin_Amt
,Product_Sales_Initial_Margin_Amt
from dfs.`D:\retail_sandbox\pos\sales_pos_detail\pos_details_20180825`
;

*This is the CTAS that generates the parquet from the view above:*

drop table if exists dfs.tmp.load_pos_sales_detail_tbl
;

create table dfs.tmp.load_pos_sales_detail_tbl AS
SELECT
-- dimension keys
dim_date_key
,dim_site_key
,dim_pos_header_key
,dim_pos_cashier_key
,dim_card_number_key
,dim_hour_minute_key
,dim_pos_clerk_key
,dim_product_key
,dim_pos_employee_purchase_key
,dim_pos_terminal_key
,dim_campaign_key
,dim_promo_key
,dim_site_lfl_key
-- derived from keys
,dim_date_str
,`trx_year`
-- Measures
,Product_Sales_Qty Sales_Qty
,Product_Sales_Price Sales_Price
,Product_Cost_Price Cost_Price
,Product_Cost_Amt Cost_Amt
,Product_Sales_Gross_Amt Sales_Gross_Amt
,Product_Sales_Promo_Disc_Amt Sales_Promo_Disc_Amt
,Product_Sales_Add_Promo_Disc_Amt Add_Promo_Disc_Amt
,Product_Sales_Total_Promo_Disc_Amt Total_Promo_Disc_Amt
,Product_Sales_Retail_Promo_Amt Retail_Promo_Amt
,Product_Sales_Retail_Amt Retail_Amt
,Product_Sales_VAT_Amt VAT_Amt
,Product_Sales_Product_Margin_Amt Product_Margin_Amt
,Product_Sales_Initial_Margin_Amt Initial_Margin_Amt
from dfs.tmp.load_pos_sales_detail
;

*This is the select query that generated the error:*

select * from dfs.tmp.load_pos_sales_detail_tbl
;
- ERROR
SQL Error: RESOURCE ERROR: Waited for 3 ms, but only 10 tasks for 'Fetch parquet metadata' are complete. Total number of tasks 29, parallelism 16.

On Mon, Sep 24, 2018 at 9:08 AM, Kunal Khatua wrote:
> Hi Herman
>
> Assuming that you're doing analytics on your data. If that's the case,
> parquet format is the way to go.
> > That said, could you provide some details about the parquet data you've > created, like the schema, parquet version and the tool used to generate. > Usually, the schema (and meta) provides most of these details for any > parquet file. > > It'll be useful to know if there is a pattern in the failure because of > which there might be corruption occurring. > > Kunal > > > On 9/22/2018 11:49:36 PM, Herman Tan wrote: > Hi Karthik, > > Thank you for pointing me to the mail archive in May 2018. > That is exactly the same problem I am facing. > > I thought of using Drill as an ETL where I load the warehouse parquet > tables from text source files. > Then I query the parquet tables. > It works on some parquet tables but am having problems with large ones that > consist of several files. (I think) > Still investigating.
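On the point raised in the thread above about not knowing how to find the parquet version, and about validating each file with a hand-written SELECT per file: a cheap alternative is to check the 4-byte "PAR1" magic that the Parquet format spec requires at both the head and tail of every file. The sketch below is a hypothetical helper, not part of Drill; it only catches truncated or mis-framed files, and reading the actual writer version (created_by) still needs a real reader such as parquet-tools or pyarrow.

```python
import os
import pathlib

PARQUET_MAGIC = b"PAR1"  # framing bytes required by the Parquet format spec

def looks_like_parquet(path: pathlib.Path) -> bool:
    """Cheap sanity check: a parquet file starts AND ends with 'PAR1'.

    Only detects truncation or bad framing; it does not validate pages or
    metadata, which needs a real reader (parquet-tools, pyarrow, Drill).
    """
    if path.stat().st_size < 12:  # too small for magic + footer length + magic
        return False
    with open(path, "rb") as fh:
        head = fh.read(4)
        fh.seek(-4, os.SEEK_END)
        tail = fh.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC

# Scan a whole table directory instead of one SELECT per file; the path
# below is illustrative -- point it at wherever dfs.tmp resolves to.
for f in sorted(pathlib.Path("/tmp/load_pos_sales_detail_tbl").glob("*.parquet")):
    print(f.name, "OK" if looks_like_parquet(f) else "SUSPECT")
```

This would have answered the "check each file" step in one pass, though a file can pass the magic check and still hold corrupt pages, so it complements rather than replaces the per-file SELECT.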
Re: ERROR in reading parquet data after create table
Hi Karthik,

Thank you for pointing me to the mail archive from May 2018. That is exactly the same problem I am facing.

I thought of using Drill as an ETL tool, loading the warehouse parquet tables from text source files and then querying the parquet tables. It works on some parquet tables, but I am having problems with the large ones that consist of several files. (I think.) Still investigating.

Does anyone in the community have other experience? Should I work with all text files instead of parquet?

Herman

On Fri, Sep 21, 2018 at 2:15 AM, Karthikeyan Manivannan < kmanivan...@mapr.com> wrote:
> Hi Herman,
>
> I am not sure what the exact problem here is but can you check to see if
> you are not hitting the problem described here:
>
> http://mail-archives.apache.org/mod_mbox/drill-user/201805.mbox/%3ccacwrgnexlxop2vcyugsa4gwd1jgs8f+rcpzq8rhuatfw5fm...@mail.gmail.com%3E
>
> Thanks
>
> Karthik
>
> On Wed, Sep 19, 2018 at 7:02 PM Herman Tan wrote:
> > Hi,
> >
> > I encountered the following error.
> > The steps I did are as follows:
> > 1. Create a view to fix the data type of fields with cast
> > 2. Create table (parquet) using the view
> > 3. Query select * from table (querying a single field also does not work)
> >
> > The error:
> > SQL Error: RESOURCE ERROR: Waited for 3 ms, but only 10 tasks for
> > 'Fetch parquet metadata' are complete. Total number of tasks 29,
> > parallelism 16.
> >
> > When I re-run this, the number of tasks will vary.
> >
> > What could be the problem?
> > > > Regards, > > Herman Tan > > > > More info below: > > > > This is the folders of the files > > Total # of lines, 50 million > > -- > > show files from > > dfs.`D:\retail_sandbox\pos\sales_pos_detail\pos_details_20180825` > > ; > > show files from > > dfs.`D:\retail_sandbox\pos\sales_pos_detail\pos_details_20180825\2011` > > ; > > - > > sales_pos_detail > > \pos_details_20180825 > > \2007 > > \2008 > > \2009 > > \2010 > > \2011 > > \pos_details_0.csv > > \pos_details_1.csv > > \pos_details_2.csv > > \pos_details_3.csv > > \pos_details_4.csv > > \pos_details_5.csv > > \pos_details_6.csv > > \pos_details_7.csv > > \pos_details_8.csv > > \2012 > > \2013 > > \2014 > > \2015 > > \2016 > > \2017 > > \2018 > > \others > > - > > > > create or replace view dfs.tmp.load_pos_sales_detail as > > SELECT > > -- dimension keys > > cast(dim_date_key as int) dim_date_key > > ,cast(dim_site_key as int) dim_site_key > > ,cast(dim_pos_header_key as bigint) dim_pos_header_key > > ,cast(dim_pos_cashier_key as int) dim_pos_cashier_key > > ,cast(dim_card_number_key as int) dim_card_number_key > > ,cast(dim_hour_minute_key as int) dim_hour_minute_key > > ,cast(dim_pos_clerk_key as int) dim_pos_clerk_key > > ,cast(dim_product_key as int) dim_product_key > > ,cast(dim_pos_employee_purchase_key as int) > dim_pos_employee_purchase_key > > ,cast(dim_pos_terminal_key as int) dim_pos_terminal_key > > ,cast(dim_campaign_key as int) dim_campaign_key > > ,cast(dim_promo_key as int) dim_promo_key > > ,cast( case when dim_site_lfl_key = '' then 0 else dim_site_lfl_key end > as > > int) dim_site_lfl_key > > -- derived from keys > > ,dim_date_str > > ,`year` as `trx_year` > > -- Measures > > ,Product_Sales_Qty > > ,Product_Sales_Price > > ,Product_Cost_Price > > ,Product_Cost_Amt > > ,Product_Sales_Gross_Amt > > ,Product_Sales_Promo_Disc_Amt > > ,Product_Sales_Add_Promo_Disc_Amt > > ,Product_Sales_Total_Promo_Disc_Amt > > ,Product_Sales_Retail_Promo_Amt > > ,Product_Sales_Retail_Amt > > 
,Product_Sales_VAT_Amt > > ,Product_Sales_Product_Margin_Amt > > ,Product_Sales_Initial_Margin_Amt > > from dfs.`D:\retail_sandbox\pos\sales_pos_detail\pos_details_20180825` > > ; > > > > drop table if exists dfs.tmp.load_pos_sales_detail_tbl > > ; > > > > create table dfs.tmp.load_pos_sales_detail_tbl AS > > SELECT > > -- dimension keys > > dim_date_key > > ,dim_site_key > > ,dim_pos_header_key > > ,dim_pos_cashier_key > > ,dim_card_number_key > > ,dim_hour_minute_key > > ,dim_pos_clerk_key > > ,dim_product_key > > ,dim_pos_employee_purchase_key > > ,dim_pos_terminal_key > > ,dim_campaign_key > > ,dim_promo_key > > ,dim_site_lfl_key > > -- derived from keys
ERROR in reading parquet data after create table
Hi,

I encountered the following error. The steps I did are as follows:
1. Create a view to fix the data type of fields with cast
2. Create table (parquet) using the view
3. Query select * from table (querying a single field also does not work)

The error:
SQL Error: RESOURCE ERROR: Waited for 3 ms, but only 10 tasks for 'Fetch parquet metadata' are complete. Total number of tasks 29, parallelism 16.

When I re-run this, the number of tasks will vary.

What could be the problem?

Regards,
Herman Tan

More info below:

This is the folder layout of the files.
Total # of lines: 50 million
--
show files from dfs.`D:\retail_sandbox\pos\sales_pos_detail\pos_details_20180825`
;
show files from dfs.`D:\retail_sandbox\pos\sales_pos_detail\pos_details_20180825\2011`
;
-
sales_pos_detail
\pos_details_20180825
\2007
\2008
\2009
\2010
\2011
\pos_details_0.csv
\pos_details_1.csv
\pos_details_2.csv
\pos_details_3.csv
\pos_details_4.csv
\pos_details_5.csv
\pos_details_6.csv
\pos_details_7.csv
\pos_details_8.csv
\2012
\2013
\2014
\2015
\2016
\2017
\2018
\others
-

create or replace view dfs.tmp.load_pos_sales_detail as
SELECT
-- dimension keys
cast(dim_date_key as int) dim_date_key
,cast(dim_site_key as int) dim_site_key
,cast(dim_pos_header_key as bigint) dim_pos_header_key
,cast(dim_pos_cashier_key as int) dim_pos_cashier_key
,cast(dim_card_number_key as int) dim_card_number_key
,cast(dim_hour_minute_key as int) dim_hour_minute_key
,cast(dim_pos_clerk_key as int) dim_pos_clerk_key
,cast(dim_product_key as int) dim_product_key
,cast(dim_pos_employee_purchase_key as int) dim_pos_employee_purchase_key
,cast(dim_pos_terminal_key as int) dim_pos_terminal_key
,cast(dim_campaign_key as int) dim_campaign_key
,cast(dim_promo_key as int) dim_promo_key
,cast( case when dim_site_lfl_key = '' then 0 else dim_site_lfl_key end as int) dim_site_lfl_key
-- derived from keys
,dim_date_str
,`year` as `trx_year`
-- Measures
,Product_Sales_Qty
,Product_Sales_Price
,Product_Cost_Price
,Product_Cost_Amt
,Product_Sales_Gross_Amt
,Product_Sales_Promo_Disc_Amt
,Product_Sales_Add_Promo_Disc_Amt
,Product_Sales_Total_Promo_Disc_Amt
,Product_Sales_Retail_Promo_Amt
,Product_Sales_Retail_Amt
,Product_Sales_VAT_Amt
,Product_Sales_Product_Margin_Amt
,Product_Sales_Initial_Margin_Amt
from dfs.`D:\retail_sandbox\pos\sales_pos_detail\pos_details_20180825`
;

drop table if exists dfs.tmp.load_pos_sales_detail_tbl
;

create table dfs.tmp.load_pos_sales_detail_tbl AS
SELECT
-- dimension keys
dim_date_key
,dim_site_key
,dim_pos_header_key
,dim_pos_cashier_key
,dim_card_number_key
,dim_hour_minute_key
,dim_pos_clerk_key
,dim_product_key
,dim_pos_employee_purchase_key
,dim_pos_terminal_key
,dim_campaign_key
,dim_promo_key
,dim_site_lfl_key
-- derived from keys
,dim_date_str
,`trx_year`
-- Measures
,Product_Sales_Qty Sales_Qty
,Product_Sales_Price Sales_Price
,Product_Cost_Price Cost_Price
,Product_Cost_Amt Cost_Amt
,Product_Sales_Gross_Amt Sales_Gross_Amt
,Product_Sales_Promo_Disc_Amt Sales_Promo_Disc_Amt
,Product_Sales_Add_Promo_Disc_Amt Add_Promo_Disc_Amt
,Product_Sales_Total_Promo_Disc_Amt Total_Promo_Disc_Amt
,Product_Sales_Retail_Promo_Amt Retail_Promo_Amt
,Product_Sales_Retail_Amt Retail_Amt
,Product_Sales_VAT_Amt VAT_Amt
,Product_Sales_Product_Margin_Amt Product_Margin_Amt
,Product_Sales_Initial_Margin_Amt Initial_Margin_Amt
from dfs.tmp.load_pos_sales_detail
;

select * from dfs.tmp.load_pos_sales_detail_tbl
;

- ERROR
SQL Error: RESOURCE ERROR: Waited for 3 ms, but only 10 tasks for 'Fetch parquet metadata' are complete. Total number of tasks 29, parallelism 16.
[Error Id: 3b079174-f5d0-4313-8097-25a0b3070854 on IORA-G9KY9P2.stf.nus.edu.sg:31010]

From Drill log:
2018-09-20 08:58:12,035 [245d0f5a-ae5f-bfa2-ff04-40f7bdd1c2bf:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 245d0f5a-ae5f-bfa2-ff04-40f7bdd1c2bf: select * from dfs.tmp.load_pos_sales_detail_tbl
2018-09-20 08:58:53,068 [245d0f5a-ae5f-bfa2-ff04-40f7bdd1c2bf:foreman] ERROR o.a.d.e.s.parquet.metadata.Metadata - Waited for 3 ms, but only 10 tasks for 'Fetch parquet metadata' are complete. Total number of tasks 29, parallel