Re: drill tests not passing
Hi Mike,

A quick glance at the log suggests a failure in the tests for the JSON reader, in the Mongo extended types. Drill's date/time support has historically been fragile. Some tests only work if your machine is set to use the UTC time zone (or Java is told to pretend that the time is UTC). The Mongo types test failure seems to be around a date/time test, so maybe this is the issue?

There are also failures indicating that the Drillbit (Drill server) died. Not sure how this can happen, as tests run Drill embedded (or used to). Looking earlier in the logs, it seems that the Drillbit didn't start due to UDF (user-defined function) failures:

Found duplicated function in drill-custom-lower.jar: custom_lower(VARCHAR-REQUIRED)
Found duplicated function in built-in: lower(VARCHAR-REQUIRED)

Not sure how this could occur: it should have failed in all builds. Also:

File /opt/drill/exec/java-exec/target/org.apache.drill.exec.udf.dynamic.TestDynamicUDFSupport/home/drill/happy/udf/staging/drill-custom-lower-sources.jar does not exist on file system file:///

This is complaining that Drill needs the source code (not just the class file) for its built-in functions. Again, this should not fail in a standard build, because if it did, it would fail in all builds.

There are other odd errors as well. Perhaps we should ask: is this a "stock" build? Check out Drill and run tests? Or, have you already started making changes for your project?

- Paul

On Tue, Jul 11, 2023 at 9:07 AM Mike Beckerle wrote:
>
> I have drill building and running its tests. Some tests fail: [ERROR]
> Tests run: 4366, Failures: 2, Errors: 1, Skipped: 133
>
> I am wondering if there is perhaps some setup step that I missed in the
> instructions.
>
> I have attached the output from the 'mvn clean install -DskipTests=false'
> execution. (zipped)
> I am running on Ubuntu 20.04, definitely have Java 8 setup.
>
> I'm hoping someone can skim it and spot the issue(s).
>
> Thanks for any help
>
> Mike Beckerle
> Apache Daffodil PMC | daffodil.apache.org
> OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> Owl Cyber Defense | www.owlcyberdefense.com
Re: Drill and Highly Hierarchical Data from Daffodil
Drill can internally handle scalars, arrays (AKA vectors), and maps (AKA tuples, structs). SQL, however, prefers to work with scalars: there is no good syntax to reach inside a complex object for, say, a WHERE condition without also projecting that item as a top-level scalar. The cool thing, for ML use cases, is that Drill's arrays can also be structured: a vector of input values, each of which is a vector of data points along with a class label.

That said, if you have a record with a field "obj" that is a map (struct, object) that contains a field "coord" that is an array of two (or three) doubles, you can project it as:

SELECT obj.coord FROM something

The value you get back will be an array. Drill's native API handles this just fine. JDBC does not really speak "vector". So, in that case, you could project the elements:

SELECT obj.coord[0] AS x, obj.coord[1] AS y FROM something

I find it helpful to first think about how Drill's internal data vectors will look, then work from there to the SQL that will do what needs doing.

- Paul

On Tue, Jul 11, 2023 at 11:46 AM Charles Givre wrote:
> Hi Mike,
> When you say "you want all of them", can you clarify a bit about what
> you'd want the data to look like?
> Best,
> -- C
>
> > On Jul 11, 2023, at 12:33 PM, Mike Beckerle wrote:
> >
> > In designing the integration of Apache Daffodil into Drill, I'm trying to
> > figure out how queries would look operating on deeply nested data.
> >
> > Here's an example. This is the path to many geo-location latLong field
> > pairs in some "messageSet" data:
> >
> > messageSet/noc_message[*]/message_content/content/vmf/payload/message/K05_17/overlay_message/r1_group/item[*]/points_group/item[*]/latLong
> >
> > This is sort of like XPath, except in the above I have put "[*]" to
> > indicate the child elements that are vectors. You can see there are 3
> > nested vectors here.
> >
> > Beneath that path are these two fields, which are what I would want out
> > of my query, along with some fields from higher up in the nest:
> >
> > entity_latitude_1/degrees
> > entity_longitude_1/degrees
> >
> > The tutorial information here
> >
> > https://drill.apache.org/docs/selecting-nested-data-for-a-column/
> >
> > describes how to index into JSON arrays with specific integer values,
> > but I don't want specific integers, I want all values of them.
> >
> > Can someone show me what a hypothetical Drill query would look like that
> > pulls out all the values of this latLong pair?
> >
> > My stab is:
> >
> > SELECT pairs.entity_latitude_1.degrees AS lat,
> > pairs.entity_longitude_1.degrees AS lon FROM
> > messageSet.noc_message[*].message_content.content.vmf.payload.message.K05_17.overlay_message.r1_group.item[*].points_group.item[*].latLong
> > AS pairs
> >
> > I'm not at all sure about the vectors in that, though.
> >
> > The other idea was this quasi-notation (that I'm making up on the fly
> > here) which treats each vector as a table:
> >
> > SELECT pairs.entity_latitude_1.degrees AS lat,
> > pairs.entity_longitude_1.degrees AS lon FROM
> > messageSet.noc_message AS messages,
> > messages.message_content.content.vmf.payload.message.K05_17.overlay_message.r1_group.item
> > AS parents
> > parents.points_group.item AS items
> > items.latLong AS pairs
> >
> > I have no idea if that makes any sense at all for Drill.
> >
> > Any help greatly appreciated.
> >
> > -Mike Beckerle
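[Editor's note on the "all values" question in this thread: Drill's documented FLATTEN function expands one array level per use, so a query that walks several nested vectors is usually written as nested subqueries, one FLATTEN per "[*]" in the path. The following is only a sketch against the thread's hypothetical schema: the table name `messageSet` and every field name come from Mike's path, not from a tested deployment.]

```sql
-- Sketch only: one FLATTEN per nested vector in the path
-- messageSet/noc_message[*]/.../r1_group/item[*]/points_group/item[*]/latLong
SELECT t3.pg_item.latLong.entity_latitude_1.degrees  AS lat,
       t3.pg_item.latLong.entity_longitude_1.degrees AS lon
FROM (
  SELECT FLATTEN(t2.r1_item.points_group.`item`) AS pg_item
  FROM (
    SELECT FLATTEN(t1.msg.message_content.content.vmf.payload.message.K05_17.overlay_message.r1_group.`item`) AS r1_item
    FROM (
      SELECT FLATTEN(noc_message) AS msg FROM messageSet
    ) t1
  ) t2
) t3
```

[Later Drill releases (the 1.15 era onward) also document LATERAL joins with UNNEST, which reads much like Mike's second "each vector as a table" sketch. Either way, the key point is one array-expansion operator per "[*]" in the path.]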
Re: Drill and Highly Hierarchical Data from Daffodil
Hi Mike,
When you say "you want all of them", can you clarify a bit about what you'd want the data to look like?
Best,
-- C

> On Jul 11, 2023, at 12:33 PM, Mike Beckerle wrote:
>
> In designing the integration of Apache Daffodil into Drill, I'm trying to
> figure out how queries would look operating on deeply nested data.
>
> Here's an example. This is the path to many geo-location latLong field
> pairs in some "messageSet" data:
>
> messageSet/noc_message[*]/message_content/content/vmf/payload/message/K05_17/overlay_message/r1_group/item[*]/points_group/item[*]/latLong
>
> This is sort of like XPath, except in the above I have put "[*]" to
> indicate the child elements that are vectors. You can see there are 3
> nested vectors here.
>
> Beneath that path are these two fields, which are what I would want out of
> my query, along with some fields from higher up in the nest:
>
> entity_latitude_1/degrees
> entity_longitude_1/degrees
>
> The tutorial information here
>
> https://drill.apache.org/docs/selecting-nested-data-for-a-column/
>
> describes how to index into JSON arrays with specific integer values, but I
> don't want specific integers, I want all values of them.
>
> Can someone show me what a hypothetical Drill query would look like that
> pulls out all the values of this latLong pair?
>
> My stab is:
>
> SELECT pairs.entity_latitude_1.degrees AS lat,
> pairs.entity_longitude_1.degrees AS lon FROM
> messageSet.noc_message[*].message_content.content.vmf.payload.message.K05_17.overlay_message.r1_group.item[*].points_group.item[*].latLong
> AS pairs
>
> I'm not at all sure about the vectors in that, though.
>
> The other idea was this quasi-notation (that I'm making up on the fly here)
> which treats each vector as a table:
>
> SELECT pairs.entity_latitude_1.degrees AS lat,
> pairs.entity_longitude_1.degrees AS lon FROM
> messageSet.noc_message AS messages,
> messages.message_content.content.vmf.payload.message.K05_17.overlay_message.r1_group.item
> AS parents
> parents.points_group.item AS items
> items.latLong AS pairs
>
> I have no idea if that makes any sense at all for Drill.
>
> Any help greatly appreciated.
>
> -Mike Beckerle
Drill and Highly Hierarchical Data from Daffodil
In designing the integration of Apache Daffodil into Drill, I'm trying to figure out how queries would look operating on deeply nested data.

Here's an example. This is the path to many geo-location latLong field pairs in some "messageSet" data:

messageSet/noc_message[*]/message_content/content/vmf/payload/message/K05_17/overlay_message/r1_group/item[*]/points_group/item[*]/latLong

This is sort of like XPath, except in the above I have put "[*]" to indicate the child elements that are vectors. You can see there are 3 nested vectors here.

Beneath that path are these two fields, which are what I would want out of my query, along with some fields from higher up in the nest:

entity_latitude_1/degrees
entity_longitude_1/degrees

The tutorial information here

https://drill.apache.org/docs/selecting-nested-data-for-a-column/

describes how to index into JSON arrays with specific integer values, but I don't want specific integers, I want all values of them.

Can someone show me what a hypothetical Drill query would look like that pulls out all the values of this latLong pair?

My stab is:

SELECT pairs.entity_latitude_1.degrees AS lat,
pairs.entity_longitude_1.degrees AS lon FROM
messageSet.noc_message[*].message_content.content.vmf.payload.message.K05_17.overlay_message.r1_group.item[*].points_group.item[*].latLong
AS pairs

I'm not at all sure about the vectors in that, though.

The other idea was this quasi-notation (that I'm making up on the fly here) which treats each vector as a table:

SELECT pairs.entity_latitude_1.degrees AS lat,
pairs.entity_longitude_1.degrees AS lon FROM
messageSet.noc_message AS messages,
messages.message_content.content.vmf.payload.message.K05_17.overlay_message.r1_group.item
AS parents
parents.points_group.item AS items
items.latLong AS pairs

I have no idea if that makes any sense at all for Drill.

Any help greatly appreciated.

-Mike Beckerle
Re: [I] NPE on DeltaRowGroupScan (drill)
cgivre closed issue #2810: NPE on DeltaRowGroupScan
URL: https://github.com/apache/drill/issues/2810
Re: Newby: First attempt to build drill - failure
Methinks the Hive plugin could probably use some attention. With that said, I don't know how much use it actually gets. Yes... a ticket would probably be in order.
Best,
-- C

> On Jul 11, 2023, at 10:38 AM, Mike Beckerle wrote:
>
> Should there be a ticket created about this:
>
> /home/mbeckerle/dataiti/opensource/drill/contrib/storage-hive/hive-exec-shade/target/classes/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore$drop_partition_by_name_with_environment_context_args$drop_partition_by_name_with_environment_context_argsTupleSchemeFactory.class
>
> The largest part of that path is the file name part, which has
> "drop_partition_by_name_with_environment_context_args" appearing twice in
> the class file name. This appears to be a generated name, so we should be
> able to shorten it.
>
> On Tue, Jul 11, 2023 at 12:27 AM James Turton wrote:
>
>> Good news and welcome to Drill!
>>
>> I haven't heard of anyone running into this problem before, and I build
>> Drill under the directory /home/james/Development/apache/drill which
>> isn't far off of what you tried in terms of length. I do see the
>> 280-character path cited by Maven below though. Perhaps in your case the
>> drill-hive-exec-shaded was downloaded from the Apache Snapshots repo,
>> rather than built locally, and this issue only presents itself if the
>> maven-dependency-plugin must unpack a very long file path from a
>> downloaded jar.
>>
>> On 2023/07/10 18:23, Mike Beckerle wrote:
>>> Never mind. The file name was > 255 long, so I have installed the drill
>>> build tree in /opt and now the path is shorter than 255.
>>>
>>> On Mon, Jul 10, 2023 at 12:00 PM Mike Beckerle wrote:
>>>
>>>> I'm trying to build the current master branch as of today 2023-07-10.
>>>> It fails due to a file-name too long issue.
>>>>
>>>> The command I issued is just "mvn clean install -DskipTests" per the
>>>> instructions. I'm running on Linux, Ubuntu 20.04. Java 8.
>>>>
>>>> [INFO] --- maven-dependency-plugin:3.4.0:unpack (unpack) @ drill-hive-exec-shaded ---
>>>> [INFO] Configured Artifact: org.apache.drill.contrib.storage-hive:drill-hive-exec-shaded:1.22.0-SNAPSHOT:jar
>>>> [INFO] Unpacking /home/mbeckerle/dataiti/opensource/drill/contrib/storage-hive/hive-exec-shade/target/drill-hive-exec-shaded-1.22.0-SNAPSHOT.jar
>>>> to /home/mbeckerle/dataiti/opensource/drill/contrib/storage-hive/hive-exec-shade/target/classes
>>>> with includes "**/**" and excludes ""
>>>> [INFO]
>>>> [INFO] Reactor Summary for Drill : 1.22.0-SNAPSHOT:
>>>> [INFO]
>>>> [INFO] Drill : SUCCESS [ 3.974 s]
>>>> [INFO] Drill : Tools : SUCCESS [ 0.226 s]
>>>> [INFO] Drill : Tools : Freemarker codegen . SUCCESS [ 3.762 s]
>>>> [INFO] Drill : Protocol ... SUCCESS [ 5.001 s]
>>>> [INFO] Drill : Common . SUCCESS [ 4.944 s]
>>>> [INFO] Drill : Logical Plan ... SUCCESS [ 5.991 s]
>>>> [INFO] Drill : Exec : . SUCCESS [ 0.210 s]
>>>> [INFO] Drill : Exec : Memory : SUCCESS [ 0.179 s]
>>>> [INFO] Drill : Exec : Memory : Base ... SUCCESS [ 2.373 s]
>>>> [INFO] Drill : Exec : RPC . SUCCESS [ 2.436 s]
>>>> [INFO] Drill : Exec : Vectors . SUCCESS [ 54.917 s]
>>>> [INFO] Drill : Contrib : .. SUCCESS [ 0.138 s]
>>>> [INFO] Drill : Contrib : Data : ... SUCCESS [ 0.143 s]
>>>> [INFO] Drill : Contrib : Data : TPCH Sample ... SUCCESS [ 1.473 s]
>>>> [INFO] Drill : Metastore : SUCCESS [ 0.144 s]
>>>> [INFO] Drill : Metastore : API SUCCESS [ 4.366 s]
>>>> [INFO] Drill : Metastore : Iceberg SUCCESS [ 3.940 s]
>>>> [INFO] Drill : Exec : Java Execution Engine ... SUCCESS [01:04 min]
>>>> [INFO] Drill : Exec : JDBC Driver using dependencies .. SUCCESS [ 7.332 s]
>>>> [INFO] Drill : Exec : JDBC JAR with all dependencies .. SUCCESS [ 16.304 s]
>>>> [INFO] Drill : On-YARN SUCCESS [ 5.477 s]
>>>> [INFO] Drill : Metastore : RDBMS .. SUCCESS [ 6.704 s]
>>>> [INFO] Drill : Metastore : Mongo .. SUCCESS [ 3.621 s]
>>>> [INFO] Drill : Contrib : Storage : Kudu ... SUCCESS [ 6.693 s]
>>>> [INFO] Drill : Contrib : Format : XML . SUCCESS [ 3.511 s]
>>>> [INFO] Drill : Contrib : Storage : HTTP .
Re: Newby: First attempt to build drill - failure
Should there be a ticket created about this:

/home/mbeckerle/dataiti/opensource/drill/contrib/storage-hive/hive-exec-shade/target/classes/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore$drop_partition_by_name_with_environment_context_args$drop_partition_by_name_with_environment_context_argsTupleSchemeFactory.class

The largest part of that path is the file name part, which has "drop_partition_by_name_with_environment_context_args" appearing twice in the class file name. This appears to be a generated name, so we should be able to shorten it.

On Tue, Jul 11, 2023 at 12:27 AM James Turton wrote:

> Good news and welcome to Drill!
>
> I haven't heard of anyone running into this problem before, and I build
> Drill under the directory /home/james/Development/apache/drill which
> isn't far off of what you tried in terms of length. I do see the
> 280-character path cited by Maven below though. Perhaps in your case the
> drill-hive-exec-shaded was downloaded from the Apache Snapshots repo,
> rather than built locally, and this issue only presents itself if the
> maven-dependency-plugin must unpack a very long file path from a
> downloaded jar.
>
> On 2023/07/10 18:23, Mike Beckerle wrote:
> > Never mind. The file name was > 255 long, so I have installed the drill
> > build tree in /opt and now the path is shorter than 255.
> >
> > On Mon, Jul 10, 2023 at 12:00 PM Mike Beckerle wrote:
> >
> >> I'm trying to build the current master branch as of today 2023-07-10.
> >>
> >> It fails due to a file-name too long issue.
> >>
> >> The command I issued is just "mvn clean install -DskipTests" per the
> >> instructions.
> >>
> >> I'm running on Linux, Ubuntu 20.04. Java 8.
> >>
> >> [INFO] --- maven-dependency-plugin:3.4.0:unpack (unpack) @ drill-hive-exec-shaded ---
> >> [INFO] Configured Artifact: org.apache.drill.contrib.storage-hive:drill-hive-exec-shaded:1.22.0-SNAPSHOT:jar
> >> [INFO] Unpacking /home/mbeckerle/dataiti/opensource/drill/contrib/storage-hive/hive-exec-shade/target/drill-hive-exec-shaded-1.22.0-SNAPSHOT.jar
> >> to /home/mbeckerle/dataiti/opensource/drill/contrib/storage-hive/hive-exec-shade/target/classes
> >> with includes "**/**" and excludes ""
> >> [INFO]
> >> [INFO] Reactor Summary for Drill : 1.22.0-SNAPSHOT:
> >> [INFO]
> >> [INFO] Drill : SUCCESS [ 3.974 s]
> >> [INFO] Drill : Tools : SUCCESS [ 0.226 s]
> >> [INFO] Drill : Tools : Freemarker codegen . SUCCESS [ 3.762 s]
> >> [INFO] Drill : Protocol ... SUCCESS [ 5.001 s]
> >> [INFO] Drill : Common . SUCCESS [ 4.944 s]
> >> [INFO] Drill : Logical Plan ... SUCCESS [ 5.991 s]
> >> [INFO] Drill : Exec : . SUCCESS [ 0.210 s]
> >> [INFO] Drill : Exec : Memory : SUCCESS [ 0.179 s]
> >> [INFO] Drill : Exec : Memory : Base ... SUCCESS [ 2.373 s]
> >> [INFO] Drill : Exec : RPC . SUCCESS [ 2.436 s]
> >> [INFO] Drill : Exec : Vectors . SUCCESS [ 54.917 s]
> >> [INFO] Drill : Contrib : .. SUCCESS [ 0.138 s]
> >> [INFO] Drill : Contrib : Data : ... SUCCESS [ 0.143 s]
> >> [INFO] Drill : Contrib : Data : TPCH Sample ... SUCCESS [ 1.473 s]
> >> [INFO] Drill : Metastore : SUCCESS [ 0.144 s]
> >> [INFO] Drill : Metastore : API SUCCESS [ 4.366 s]
> >> [INFO] Drill : Metastore : Iceberg SUCCESS [ 3.940 s]
> >> [INFO] Drill : Exec : Java Execution Engine ... SUCCESS [01:04 min]
> >> [INFO] Drill : Exec : JDBC Driver using dependencies .. SUCCESS [ 7.332 s]
> >> [INFO] Drill : Exec : JDBC JAR with all dependencies .. SUCCESS [ 16.304 s]
> >> [INFO] Drill : On-YARN SUCCESS [ 5.477 s]
> >> [INFO] Drill : Metastore : RDBMS .. SUCCESS [ 6.704 s]
> >> [INFO] Drill : Metastore : Mongo .. SUCCESS [ 3.621 s]
> >> [INFO] Drill : Contrib : Storage : Kudu ... SUCCESS [ 6.693 s]
> >> [INFO] Drill : Contrib : Format : XML . SUCCESS [ 3.511 s]
> >> [INFO] Drill : Contrib : Storage : HTTP ... SUCCESS [ 5.195 s]
> >> [INFO] Drill : Contrib : Storage : OpenTSDB ... SUCCESS [ 3.561 s]
> >> [INFO] Drill : Contrib : Storage : MongoDB SUCCESS [ 4.850 s]
> >> [INFO] Drill : Contrib : Storage : HBase ..