[jira] [Comment Edited] (DRILL-7529) Building depends on poorly configured uncommon maven repositories

2020-01-16 Thread Charles Givre (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017540#comment-17017540
 ] 

Charles Givre edited comment on DRILL-7529 at 1/16/20 10:45 PM:


Hi [~nielsbasjes]

I don't know if you saw, but we actually included this UDF in Drill 1.17.   
[Pull Request|https://github.com/apache/drill/pull/1840]

[Jira|https://issues.apache.org/jira/browse/DRILL-7343]

You are correct re: the Calcite fork. This is a long-standing issue with 
Drill. I think there are some PRs which Drill needs that have not been 
accepted by the Calcite community; I don't know the whole history of it. 

 

 


was (Author: cgivre):
Hi [~nielsbasjes]

I don't know if you saw, but we actually included this UDF in Drill 1.17.  
([https://issues.apache.org/jira/browse/DRILL-7343]) 
([https://github.com/apache/drill/pull/1840]).

You are correct re: the Calcite fork. This is a long-standing issue with 
Drill. I think there are some PRs which Drill needs that have not been 
accepted by the Calcite community; I don't know the whole history of it. 

 

 


[jira] [Commented] (DRILL-7529) Building depends on poorly configured uncommon maven repositories

2020-01-16 Thread Charles Givre (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017540#comment-17017540
 ] 

Charles Givre commented on DRILL-7529:
--

Hi [~nielsbasjes]

I don't know if you saw, but we actually included this UDF in Drill 1.17.  
([https://issues.apache.org/jira/browse/DRILL-7343]) 
([https://github.com/apache/drill/pull/1840]).

You are correct re: the Calcite fork. This is a long-standing issue with 
Drill. I think there are some PRs which Drill needs that have not been 
accepted by the Calcite community; I don't know the whole history of it. 

 

 




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7529) Building depends on poorly configured uncommon maven repositories

2020-01-16 Thread Niels Basjes (Jira)
Niels Basjes created DRILL-7529:
---

 Summary: Building depends on poorly configured uncommon maven 
repositories
 Key: DRILL-7529
 URL: https://issues.apache.org/jira/browse/DRILL-7529
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Affects Versions: 1.17.0
Reporter: Niels Basjes


*Summary: Apache Drill depends on modified copies of open-source software that 
are hosted on non-standard (company-owned) Maven repositories. In addition, due 
to a poor security configuration, Maven simply refuses to download the 
artifacts from one of those.* 
That is why I tagged this as a blocker.

 

I have an open source project where I include a Drill UDF so others can use 
this in Drill ([https://yauaa.basjes.nl/UDF-ApacheDrill.html]).

Today I tried to update the Drill dependency from 1.16 to 1.17, resulting in:

{{[ERROR] Failed to execute goal on project yauaa-drill: Could not resolve 
dependencies for project 
nl.basjes.parse.useragent:yauaa-drill:jar:5.15-SNAPSHOT: The following 
artifacts could not be resolved: 
com.github.vvysotskyi.drill-calcite:calcite-core:jar:1.20.0-drill-r2, 
org.kohsuke:libpam4j:jar:1.8-rev2: Failure to find 
com.github.vvysotskyi.drill-calcite:calcite-core:jar:1.20.0-drill-r2 in 
https://oss.sonatype.org/content/repositories/snapshots/ was cached in the 
local repository, resolution will not be reattempted until the update interval 
of Sonatype snapshots has elapsed or updates are forced -> [Help 1]}}

It turns out that 
{{com.github.vvysotskyi.drill-calcite:calcite-core:jar:1.20.0-drill-r2}} is 
(most likely) based on [https://github.com/vvysotskyi/drill-calcite/].

Apparently this is a patched version of Calcite that is hosted under a personal 
account, yet it IS an important dependency of a released version of Drill. 
(Side question: why not simply improve Calcite with these changes?)

It took some digging, but I found this artifact on a non-standard Maven 
repository operated by a commercial company:

[https://repository.mulesoft.org/nexus/content/repositories/public/]

 

The second dependency it failed on was even worse: 

 {{org.kohsuke:libpam4j:jar:1.8-rev2}}

This project IS present in Maven Central, but NOT this specific version:

[https://search.maven.org/artifact/org.kohsuke/libpam4j]

The only place I have found it is here:

[https://repository.mapr.com/nexus/content/groups/mapr-public/]

I have not yet found the source code (e.g. a GitHub repository) for this 
modified version.

So effectively I was forced to include two "company" repos in my project to get 
it to build... or so you would think.
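For anyone trying the same build, a minimal sketch of what adding those two 
extra repositories to the consuming project's pom.xml would look like (the 
`<id>` values are arbitrary labels chosen for this example, not anything Drill 
prescribes):

```xml
<!-- Extra repositories needed to resolve Drill 1.17's forked dependencies.
     The <id> values are arbitrary example labels. -->
<repositories>
  <repository>
    <id>mulesoft-public</id>
    <url>https://repository.mulesoft.org/nexus/content/repositories/public/</url>
  </repository>
  <repository>
    <id>mapr-public</id>
    <url>https://repository.mapr.com/nexus/content/groups/mapr-public/</url>
  </repository>
</repositories>
```

If Maven has already cached the failed lookup ("resolution will not be 
reattempted"), run Maven with `-U` (or remove the cached entry under 
`~/.m2/repository`) to force re-resolution.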

 

With these two repositories in my pom.xml I got a new error, which was much 
more disturbing:

{{[ERROR] Failed to execute goal on project yauaa-drill: Could not resolve 
dependencies for project 
nl.basjes.parse.useragent:yauaa-drill:jar:5.15-SNAPSHOT: Failed to collect 
dependencies at org.apache.drill.exec:drill-java-exec:jar:1.17.0 -> 
org.kohsuke:libpam4j:jar:1.8-rev2: Failed to read artifact descriptor for 
org.kohsuke:libpam4j:jar:1.8-rev2: Could not transfer artifact 
org.kohsuke:libpam4j:pom:1.8-rev2 from/to MapR 
(https://repository.mapr.com/nexus/content/groups/mapr-public/): 
sun.security.validator.ValidatorException: PKIX path building failed: 
sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
valid certification path to requested target -> [Help 1]}}

This error essentially means: Maven does not trust the certificate chain 
presented by the HTTPS server, so it refuses to download the artifact.

Why? This MapR site uses a wildcard certificate issued by GoDaddy.

Apparently this is a well-known problem with these GoDaddy certificates: 
[https://tozny.com/blog/godaddys-ssl-certs-dont-work-in-java-the-right-solution/]
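A common workaround for this PKIX failure (sketched below on the assumption 
that a JDK `keytool` is on the PATH; the proper fix is for the repository to 
serve a certificate chain Java already trusts) is to import the server's 
certificate into a custom truststore and point Maven at it. Here a freshly 
generated self-signed certificate stands in for the real one, which you would 
normally export with something like `openssl s_client -showcerts`:

```shell
# Generate a throwaway self-signed certificate as a stand-in for the
# repository's real certificate.
keytool -genkeypair -alias demo -keyalg RSA -keysize 2048 \
  -dname "CN=repository.mapr.com" -validity 1 \
  -keystore demo.jks -storepass changeit -keypass changeit

# Export the certificate...
keytool -exportcert -alias demo -file demo.crt \
  -keystore demo.jks -storepass changeit

# ...and import it into a custom truststore for Maven to use.
keytool -importcert -noprompt -alias mapr-repo -file demo.crt \
  -keystore maven-truststore.jks -storepass changeit

# Tell the Maven JVM to trust that store for HTTPS downloads.
export MAVEN_OPTS="-Djavax.net.ssl.trustStore=$PWD/maven-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit"
```

Note that this only papers over the misconfigured server; it does not fix the 
underlying problem described in the linked blog post.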

 

 

 

 





[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017038#comment-17017038
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on issue #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#issuecomment-575205541
 
 
   Got it. `TestHDF5Utils` now extends `BaseTest` and I removed the extra lines 
from the other class.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> * type: This should be set to hdf5.
> * extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> * defaultPath:
> h3. Example Configuration
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
>   ],
>   "defaultPath": null
> }}}
> h3. Usage
> Since HDF5 can be viewed as a file system within a file, a single file can 
> contain many datasets. For instance, if you have a simple HDF5 file, a star 
> query will produce the following result:
> {{apache drill> select * from dfs.test.`dset.h5`;
> +-------+-----------+-----------+---------------------------------------------------------------------------+
> | path  | data_type | file_name | int_data                                                                  |
> +-------+-----------+-----------+---------------------------------------------------------------------------+
> | /dset | DATASET   | dset.h5   | [[1,2,3,4,5,6],[7,8,9,10,11,12],[13,14,15,16,17,18],[19,20,21,22,23,24]]  |
> +-------+-----------+-----------+---------------------------------------------------------------------------+}}
> The actual data in this file is mapped to a column called int_data. In order 
> to effectively access the data, you should use Drill's FLATTEN() function on 
> the int_data column, which produces the following result.
> {{apache drill> select flatten(int_data) as int_data from dfs.test.`dset.h5`;
> +---------------------+
> |      int_data       |
> +---------------------+
> | [1,2,3,4,5,6]       |
> | [7,8,9,10,11,12]    |
> | [13,14,15,16,17,18] |
> | [19,20,21,22,23,24] |
> +---------------------+}}
> Once you have the data in this form, you can access it similarly to how you 
> might access nested data in JSON or other files.
> {{apache drill> SELECT int_data[0] as col_0,
> . .semicolon> int_data[1] as col_1,
> . .semicolon> int_data[2] as col_2
> . .semicolon> FROM ( SELECT flatten(int_data) AS int_data
> . . . . . .)> FROM dfs.test.`dset.h5`
> . . . . . .)> );
> +-------+-------+-------+
> | col_0 | col_1 | col_2 |
> +-------+-------+-------+
> | 1     | 2     | 3     |
> | 7     | 8     | 9     |
> | 13    | 14    | 15    |
> | 19    | 20    | 21    |
> +-------+-------+-------+}}
> Alternatively, a better way to query the actual data in an HDF5 file is to 
> use the defaultPath field in your query. If the defaultPath field is defined 
> in the query, or via the plugin configuration, Drill will only return the 
> data, rather than the file metadata.
> *Note: Once you have determined which data set you are querying, it is 
> advisable to use this method to query HDF5 data.*
> You can set the defaultPath variable in either the plugin configuration, or 
> at query time using the table() function as shown in the example below:
> {{SELECT * 
> FROM table(dfs.test.`dset.h5` (type => 'hdf5', defaultPath => '/dset'))}}
> This query will return the result below:
> {{apache drill> SELECT * FROM table(dfs.test.`dset.h5` (type => 'hdf5', 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017036#comment-17017036
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

arina-ielchiieva commented on issue #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#issuecomment-575205083
 
 
   @cgivre you only need to add extends for `TestHDF5Utils`; `BaseTestQuery` 
and `ClusterTest` already incorporate this change.
   By the way, `BaseTestQuery` is deprecated; we should always use 
`ClusterTest`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017037#comment-17017037
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

arina-ielchiieva commented on issue #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#issuecomment-575205083
 
 
   @cgivre you only need to add extends for `TestHDF5Utils`; `BaseTestQuery` 
and `ClusterTest` already incorporate this change.
   By the way, `BaseTestQuery` is deprecated; we should always use 
`ClusterTest`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017035#comment-17017035
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

arina-ielchiieva commented on issue #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#issuecomment-575205083
 
 
   @cgivre you only need to add extends for `TestHDF5Utils`; `BaseTestQuery` 
and `ClusterTest` already incorporate this change.
   By the way, `BaseTestQuery` is deprecated; we should always use 
`ClusterTest`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017033#comment-17017033
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

arina-ielchiieva commented on pull request #1778: DRILL-7233: Format Plugin for 
HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r367482096
 
 

 ##
 File path: 
contrib/format-hdf5/src/test/java/org/apache/drill/exec/store/hdf5/TestHDF5Format.java
 ##
 @@ -0,0 +1,943 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.util.GuavaPatcher;
+import org.apache.drill.common.util.ProtobufPatcher;
+import org.apache.drill.exec.ExecTest;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.rpc.RpcException;
+import org.apache.drill.exec.store.dfs.ZipCodec;
+import org.apache.drill.test.ClusterFixtureBuilder;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+
+@Category(RowSetTests.class)
+public class TestHDF5Format extends ClusterTest {
+
+  @BeforeClass
+  public static void setup() throws Exception {
+
+ProtobufPatcher.patch();
 
 Review comment:
   No need to add this stuff here; it's already covered by `ClusterTest`.
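   A minimal sketch of the simplified `setup()` once the patcher calls are dropped, assuming `ClusterTest`'s usual `startCluster` / `dirTestWatcher` helpers from Drill's test framework (names not verified against this PR):

   ```java
   // Sketch only: ClusterTest's own initialization already applies the
   // Guava/Protobuf patches, so setup() can just start the cluster.
   @BeforeClass
   public static void setup() throws Exception {
     startCluster(ClusterFixture.builder(dirTestWatcher));
   }
   ```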
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
> 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017030#comment-17017030
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on issue #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#issuecomment-575202878
 
 
   I think I misunderstood you, and I forgot about the second test class.  For 
`TestHDF5Utils` it now extends `BaseTest` so we're good there. 
   
   For the `TestHDF5Format` I added the two lines below to the `setup()` method.
   
   ```
   ProtobufPatcher.patch();
   GuavaPatcher.patch();
   ```
   
   Is this what you were looking for?  In the future, should classes like this 
extend `BaseTestQuery` rather than `ClusterTest`?
 



> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
>   ],
>   "defaultPath": null
> }}}
> h3. Usage
> Since HDF5 can be viewed as a file system within a file, a single file can 
> contain many datasets. For instance, if you have a simple HDF5 file, a star 
> query will produce the following result:
> {{apache drill> select * from dfs.test.`dset.h5`;
> +-------+-----------+-----------+--------------------------------------------------------------------------+
> | path  | data_type | file_name | int_data                                                                 |
> +-------+-----------+-----------+--------------------------------------------------------------------------+
> | /dset | DATASET   | dset.h5   | [[1,2,3,4,5,6],[7,8,9,10,11,12],[13,14,15,16,17,18],[19,20,21,22,23,24]] |
> +-------+-----------+-----------+--------------------------------------------------------------------------+}}
> The actual data in this file is mapped to a column called int_data. In order 
> to effectively access the data, you should use Drill's FLATTEN() function on 
> the int_data column, which produces the following result.
> {{apache drill> select flatten(int_data) as int_data from dfs.test.`dset.h5`;
> +---------------------+
> |      int_data       |
> +---------------------+
> | [1,2,3,4,5,6]       |
> | [7,8,9,10,11,12]    |
> | [13,14,15,16,17,18] |
> | [19,20,21,22,23,24] |
> +---------------------+}}
> Once you have the data in this form, you can access it similarly to how you 
> might access nested data in JSON or other files.
> {{apache drill> SELECT int_data[0] as col_0,
> . .semicolon> int_data[1] as col_1,
> . .semicolon> int_data[2] as col_2
> . .semicolon> FROM ( SELECT flatten(int_data) AS int_data
> . . . . . .)> FROM dfs.test.`dset.h5`
> . . . . . .)> );
> +-------+-------+-------+
> | col_0 | col_1 | col_2 |
> +-------+-------+-------+
> | 1     | 2     | 3     |
> | 7     | 8     | 9     |
> | 13    | 14    | 15    |
> | 19    | 20    | 21    |
> +-------+-------+-------+}}
> Alternatively, a better way to query the actual data in an HDF5 file is to 
> use the defaultPath field in your query. If the defaultPath field is defined 
> in the query, or via the plugin configuration, Drill will only return the 
> data, rather than the file metadata.
> ** Note: Once you have determined which data set you are querying, it is 
> advisable to use this method to query HDF5 data. **
> You can set the defaultPath 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017027#comment-17017027
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

arina-ielchiieva commented on pull request #1778: DRILL-7233: Format Plugin for 
HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r367477395
 
 

 ##
 File path: 
contrib/format-hdf5/src/test/java/org/apache/drill/exec/store/hdf5/TestHDF5Utils.java
 ##
 @@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNull;
+
+public class TestHDF5Utils {
 
 Review comment:
   Please check one more time, I don't see extends here.
 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017023#comment-17017023
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r367475317
 
 

 ##
 File path: 
contrib/format-hdf5/src/test/java/org/apache/drill/exec/store/hdf5/TestHDF5Utils.java
 ##
 @@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNull;
+
+public class TestHDF5Utils {
 
 Review comment:
   Done.  I dug into the unsigned int issue and now remember that there is an 
issue with the underlying C and Java HDF5 libraries. 
   
   Commits squashed.
 


[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016938#comment-17016938
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

arina-ielchiieva commented on pull request #1778: DRILL-7233: Format Plugin for 
HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r367422285
 
 

 ##
 File path: 
contrib/format-hdf5/src/test/java/org/apache/drill/exec/store/hdf5/TestHDF5Utils.java
 ##
 @@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNull;
+
+public class TestHDF5Utils {
 
 Review comment:
   @cgivre forgot one thing.
   All Drill tests must extend `BaseTest`, since it provides Guava / Protobuf 
patching to avoid premature class loading when running tests in parallel.
   For tests that extend `ClusterTest` this is already done in one of the 
class's parents, but for `true` unit tests we need to add it manually. Could 
you please fix? 
 


[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016892#comment-17016892
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on issue #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#issuecomment-575140857
 
 
   Hi Arina
   It’s been a while since I originally wrote all that, but there was a reason 
which I can't recall now.  I will create a separate PR for that if I can 
figure out how to add the functionality.  
   
   Sent from my iPhone
   
   > On Jan 16, 2020, at 07:23, Arina Ielchiieva  
wrote:
   > 
   > 
   > @cgivre looking at the limitations, I am just wondering what is the 
problem with Drill that it cannot read unsigned 64 bit integers?
   > 
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub, or unsubscribe.
   
 


[jira] [Assigned] (DRILL-5405) Add missing operator types without dependency on protobuf enum

2020-01-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5405:
---

Assignee: (was: Arina Ielchiieva)

> Add missing operator types without dependency on protobuf enum
> --
>
> Key: DRILL-5405
> URL: https://issues.apache.org/jira/browse/DRILL-5405
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG
>
>
> Source - https://github.com/apache/drill/pull/214#discussion_r42687575
> When operators are not added into CoreOperatorType enum, they are displayed 
> on Web UI as UNKNOWN_OPERATOR.
> Screenshots:
> now -> unknown_operator.JPG
> should be -> maprdb_sub_scan.JPG
> Though plugins work fine without these changes, it would still be nice to 
> display the operator type in the Web UI.
> But dependency on protobuf enum removes the ability for format plugins to 
> truly be drop-in pluggable.
> In this case, as an option, we may need to refactor this interface to just 
> return a String and not use the protobuf enum, or add a separate "notes" 
> field if we want a generic file-scan operator ID in this enum with extra 
> data saying what kind of scan was used.
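The refactoring idea above can be sketched in a few lines (all names are hypothetical; this is not Drill's actual operator interface): the operator reports a plain String instead of a CoreOperatorType enum value, so a drop-in plugin needs no protobuf changes.

```java
public class OperatorNameDemo {
  // Hypothetical replacement for the protobuf-enum dependency: the
  // operator itself supplies a display name plus optional notes.
  interface NamedOperator {
    String operatorTypeName();
    default String notes() { return ""; }
  }

  // A plugin-provided scan can now pick any display name without
  // touching a generated enum in core Drill.
  static class MapRDBSubScan implements NamedOperator {
    @Override public String operatorTypeName() { return "MAPRDB_SUB_SCAN"; }
    @Override public String notes() { return "format-plugin scan"; }
  }

  public static void main(String[] args) {
    NamedOperator op = new MapRDBSubScan();
    System.out.println(op.operatorTypeName());
  }
}
```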



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016877#comment-17016877
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

arina-ielchiieva commented on issue #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#issuecomment-575128260
 
 
   @cgivre looking at the limitations, I am just wondering what is the problem 
with Drill that it cannot read unsigned 64 bit integers? Is this specific to 
this format, or an overall Drill problem?
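   For context, a self-contained JDK sketch of the signed/unsigned mismatch (independent of Drill and HDF5): Java's `long` is signed, so an unsigned 64-bit value above `Long.MAX_VALUE` only survives if the raw bit pattern is kept and rendered with the unsigned helper methods.

   ```java
   public class Unsigned64Demo {
     // Parses an unsigned 64-bit decimal into a signed long bit pattern,
     // then renders it back without losing the high bit.
     static String roundTrip(String unsignedDecimal) {
       long bits = Long.parseUnsignedLong(unsignedDecimal);
       return Long.toUnsignedString(bits);
     }

     public static void main(String[] args) {
       String max = "18446744073709551615";  // 2^64 - 1
       long bits = Long.parseUnsignedLong(max);
       System.out.println(bits);             // signed view wraps negative
       System.out.println(roundTrip(max));   // unsigned view is preserved
     }
   }
   ```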
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
>   ],
>   "defaultPath": null
> }}}
> h3. Usage
> Since HDF5 can be viewed as a file system within a file, a single file can 
> contain many datasets. For instance, if you have a simple HDF5 file, a star 
> query will produce the following result:
> {{apache drill> select * from dfs.test.`dset.h5`;
> +---+---+---+--+
> | path  | data_type | file_name | int_data
>  |
> +---+---+---+--+
> | /dset | DATASET   | dset.h5   | 
> [[1,2,3,4,5,6],[7,8,9,10,11,12],[13,14,15,16,17,18],[19,20,21,22,23,24]] |
> +---+---+---+--+}}
> The actual data in this file is mapped to a column called int_data. In order 
> to effectively access the data, you should use Drill's FLATTEN() function on 
> the int_data column, which produces the following result.
> {{apache drill> select flatten(int_data) as int_data from dfs.test.`dset.h5`;
> +-+
> |  int_data   |
> +-+
> | [1,2,3,4,5,6]   |
> | [7,8,9,10,11,12]|
> | [13,14,15,16,17,18] |
> | [19,20,21,22,23,24] |
> +-+}}
> Once you have the data in this form, you can access it similarly to how you 
> might access nested data in JSON or other files.
> {{apache drill> SELECT int_data[0] as col_0,
> . .semicolon> int_data[1] as col_1,
> . .semicolon> int_data[2] as col_2
> . .semicolon> FROM ( SELECT flatten(int_data) AS int_data
> . . . . . .)> FROM dfs.test.`dset.h5`
> . . . . . .)> );
> +---+---+---+
> | col_0 | col_1 | col_2 |
> +---+---+---+
> | 1 | 2 | 3 |
> | 7 | 8 | 9 |
> | 13| 14| 15|
> | 19| 20| 21|
> +---+---+---+}}
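The FLATTEN-then-index steps shown above behave roughly like this Python sketch (an illustration of the data transformation only, using the same sample array as the queries):

```python
# The nested array Drill reads from /dset in dset.h5.
int_data = [[1, 2, 3, 4, 5, 6],
            [7, 8, 9, 10, 11, 12],
            [13, 14, 15, 16, 17, 18],
            [19, 20, 21, 22, 23, 24]]

# FLATTEN(int_data): each inner list becomes its own row.
flattened = [row for row in int_data]

# SELECT int_data[0], int_data[1], int_data[2] over the flattened rows.
projected = [(row[0], row[1], row[2]) for row in flattened]
print(projected)  # [(1, 2, 3), (7, 8, 9), (13, 14, 15), (19, 20, 21)]
```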
> Alternatively, a better way to query the actual data in an HDF5 file is to 
> use the defaultPath field in your query. If the defaultPath field is defined 
> in the query, or via the plugin configuration, Drill will only return the 
> data, rather than the file metadata.
> ** Note: Once you have determined which data set you are querying, it is 
> advisable to use this method to query HDF5 data. **
> You can set the defaultPath variable in either the plugin configuration, or 
> at query time using the table() function as shown in the example below:
> {{SELECT * 
> FROM table(dfs.test.`dset.h5` (type => 'hdf5', defaultPath => '/dset'))}}
> This query 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016876#comment-17016876
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

arina-ielchiieva commented on issue #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#issuecomment-575128260
 
 
   @cgivre looking at the limitations, I am just wondering what is the problem 
with `Drill cannot read unsigned 64 bit integers.`?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
>   ],
>   "defaultPath": null
> }}}
> h3. Usage
> Since HDF5 can be viewed as a file system within a file, a single file can 
> contain many datasets. For instance, if you have a simple HDF5 file, a star 
> query will produce the following result:
> {{apache drill> select * from dfs.test.`dset.h5`;
> +---+---+---+--+
> | path  | data_type | file_name | int_data
>  |
> +---+---+---+--+
> | /dset | DATASET   | dset.h5   | 
> [[1,2,3,4,5,6],[7,8,9,10,11,12],[13,14,15,16,17,18],[19,20,21,22,23,24]] |
> +---+---+---+--+}}
> The actual data in this file is mapped to a column called int_data. In order 
> to effectively access the data, you should use Drill's FLATTEN() function on 
> the int_data column, which produces the following result.
> {{apache drill> select flatten(int_data) as int_data from dfs.test.`dset.h5`;
> +-+
> |  int_data   |
> +-+
> | [1,2,3,4,5,6]   |
> | [7,8,9,10,11,12]|
> | [13,14,15,16,17,18] |
> | [19,20,21,22,23,24] |
> +-+}}
> Once you have the data in this form, you can access it similarly to how you 
> might access nested data in JSON or other files.
> {{apache drill> SELECT int_data[0] as col_0,
> . .semicolon> int_data[1] as col_1,
> . .semicolon> int_data[2] as col_2
> . .semicolon> FROM ( SELECT flatten(int_data) AS int_data
> . . . . . .)> FROM dfs.test.`dset.h5`
> . . . . . .)> );
> +---+---+---+
> | col_0 | col_1 | col_2 |
> +---+---+---+
> | 1 | 2 | 3 |
> | 7 | 8 | 9 |
> | 13| 14| 15|
> | 19| 20| 21|
> +---+---+---+}}
> Alternatively, a better way to query the actual data in an HDF5 file is to 
> use the defaultPath field in your query. If the defaultPath field is defined 
> in the query, or via the plugin configuration, Drill will only return the 
> data, rather than the file metadata.
> ** Note: Once you have determined which data set you are querying, it is 
> advisable to use this method to query HDF5 data. **
> You can set the defaultPath variable in either the plugin configuration, or 
> at query time using the table() function as shown in the example below:
> {{SELECT * 
> FROM table(dfs.test.`dset.h5` (type => 'hdf5', defaultPath => '/dset'))}}
> This query will return the result below:
> {{apache drill> SELECT * FROM 

[jira] [Updated] (DRILL-7233) Format Plugin for HDF5

2020-01-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7233:

Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
>   ],
>   "defaultPath": null
> }}}
> h3. Usage
> Since HDF5 can be viewed as a file system within a file, a single file can 
> contain many datasets. For instance, if you have a simple HDF5 file, a star 
> query will produce the following result:
> {{apache drill> select * from dfs.test.`dset.h5`;
> +---+---+---+--+
> | path  | data_type | file_name | int_data
>  |
> +---+---+---+--+
> | /dset | DATASET   | dset.h5   | 
> [[1,2,3,4,5,6],[7,8,9,10,11,12],[13,14,15,16,17,18],[19,20,21,22,23,24]] |
> +---+---+---+--+}}
> The actual data in this file is mapped to a column called int_data. In order 
> to effectively access the data, you should use Drill's FLATTEN() function on 
> the int_data column, which produces the following result.
> {{apache drill> select flatten(int_data) as int_data from dfs.test.`dset.h5`;
> +-+
> |  int_data   |
> +-+
> | [1,2,3,4,5,6]   |
> | [7,8,9,10,11,12]|
> | [13,14,15,16,17,18] |
> | [19,20,21,22,23,24] |
> +-+}}
> Once you have the data in this form, you can access it similarly to how you 
> might access nested data in JSON or other files.
> {{apache drill> SELECT int_data[0] as col_0,
> . .semicolon> int_data[1] as col_1,
> . .semicolon> int_data[2] as col_2
> . .semicolon> FROM ( SELECT flatten(int_data) AS int_data
> . . . . . .)> FROM dfs.test.`dset.h5`
> . . . . . .)> );
> +---+---+---+
> | col_0 | col_1 | col_2 |
> +---+---+---+
> | 1 | 2 | 3 |
> | 7 | 8 | 9 |
> | 13| 14| 15|
> | 19| 20| 21|
> +---+---+---+}}
> Alternatively, a better way to query the actual data in an HDF5 file is to 
> use the defaultPath field in your query. If the defaultPath field is defined 
> in the query, or via the plugin configuration, Drill will only return the 
> data, rather than the file metadata.
> ** Note: Once you have determined which data set you are querying, it is 
> advisable to use this method to query HDF5 data. **
> You can set the defaultPath variable in either the plugin configuration, or 
> at query time using the table() function as shown in the example below:
> {{SELECT * 
> FROM table(dfs.test.`dset.h5` (type => 'hdf5', defaultPath => '/dset'))}}
> This query will return the result below:
> {{apache drill> SELECT * FROM table(dfs.test.`dset.h5` (type => 'hdf5', 
> defaultPath => '/dset'));
> +---+---+---+---+---+---+
> | int_col_0 | int_col_1 | int_col_2 | int_col_3 | int_col_4 | int_col_5 |
> +---+---+---+---+---+---+
> | 1 | 2 | 3 | 4 | 5 | 6 |
> | 7 | 8 | 9 | 10| 11| 12|
> | 13| 14| 15| 16| 17| 18|
> | 19| 20| 21| 22| 23 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016864#comment-17016864
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

arina-ielchiieva commented on issue #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#issuecomment-575128260
 
 
   @cgivre looking at the limitations, I am just wondering what is the problem 
with Drill regarding `Drill cannot read unsigned 64 bit integers.`?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
>   ],
>   "defaultPath": null
> }}}
> h3. Usage
> Since HDF5 can be viewed as a file system within a file, a single file can 
> contain many datasets. For instance, if you have a simple HDF5 file, a star 
> query will produce the following result:
> {{apache drill> select * from dfs.test.`dset.h5`;
> +---+---+---+--+
> | path  | data_type | file_name | int_data
>  |
> +---+---+---+--+
> | /dset | DATASET   | dset.h5   | 
> [[1,2,3,4,5,6],[7,8,9,10,11,12],[13,14,15,16,17,18],[19,20,21,22,23,24]] |
> +---+---+---+--+}}
> The actual data in this file is mapped to a column called int_data. In order 
> to effectively access the data, you should use Drill's FLATTEN() function on 
> the int_data column, which produces the following result.
> {{apache drill> select flatten(int_data) as int_data from dfs.test.`dset.h5`;
> +-+
> |  int_data   |
> +-+
> | [1,2,3,4,5,6]   |
> | [7,8,9,10,11,12]|
> | [13,14,15,16,17,18] |
> | [19,20,21,22,23,24] |
> +-+}}
> Once you have the data in this form, you can access it similarly to how you 
> might access nested data in JSON or other files.
> {{apache drill> SELECT int_data[0] as col_0,
> . .semicolon> int_data[1] as col_1,
> . .semicolon> int_data[2] as col_2
> . .semicolon> FROM ( SELECT flatten(int_data) AS int_data
> . . . . . .)> FROM dfs.test.`dset.h5`
> . . . . . .)> );
> +---+---+---+
> | col_0 | col_1 | col_2 |
> +---+---+---+
> | 1 | 2 | 3 |
> | 7 | 8 | 9 |
> | 13| 14| 15|
> | 19| 20| 21|
> +---+---+---+}}
> Alternatively, a better way to query the actual data in an HDF5 file is to 
> use the defaultPath field in your query. If the defaultPath field is defined 
> in the query, or via the plugin configuration, Drill will only return the 
> data, rather than the file metadata.
> ** Note: Once you have determined which data set you are querying, it is 
> advisable to use this method to query HDF5 data. **
> You can set the defaultPath variable in either the plugin configuration, or 
> at query time using the table() function as shown in the example below:
> {{SELECT * 
> FROM table(dfs.test.`dset.h5` (type => 'hdf5', defaultPath => '/dset'))}}
> This query will return the result below:
> {{apache drill> SELECT * 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016859#comment-17016859
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

arina-ielchiieva commented on issue #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#issuecomment-575127958
 
 
   +1, LGTM.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
>   ],
>   "defaultPath": null
> }}}
> h3. Usage
> Since HDF5 can be viewed as a file system within a file, a single file can 
> contain many datasets. For instance, if you have a simple HDF5 file, a star 
> query will produce the following result:
> {{apache drill> select * from dfs.test.`dset.h5`;
> +---+---+---+--+
> | path  | data_type | file_name | int_data
>  |
> +---+---+---+--+
> | /dset | DATASET   | dset.h5   | 
> [[1,2,3,4,5,6],[7,8,9,10,11,12],[13,14,15,16,17,18],[19,20,21,22,23,24]] |
> +---+---+---+--+}}
> The actual data in this file is mapped to a column called int_data. In order 
> to effectively access the data, you should use Drill's FLATTEN() function on 
> the int_data column, which produces the following result.
> {{apache drill> select flatten(int_data) as int_data from dfs.test.`dset.h5`;
> +-+
> |  int_data   |
> +-+
> | [1,2,3,4,5,6]   |
> | [7,8,9,10,11,12]|
> | [13,14,15,16,17,18] |
> | [19,20,21,22,23,24] |
> +-+}}
> Once you have the data in this form, you can access it similarly to how you 
> might access nested data in JSON or other files.
> {{apache drill> SELECT int_data[0] as col_0,
> . .semicolon> int_data[1] as col_1,
> . .semicolon> int_data[2] as col_2
> . .semicolon> FROM ( SELECT flatten(int_data) AS int_data
> . . . . . .)> FROM dfs.test.`dset.h5`
> . . . . . .)> );
> +---+---+---+
> | col_0 | col_1 | col_2 |
> +---+---+---+
> | 1 | 2 | 3 |
> | 7 | 8 | 9 |
> | 13| 14| 15|
> | 19| 20| 21|
> +---+---+---+}}
> Alternatively, a better way to query the actual data in an HDF5 file is to 
> use the defaultPath field in your query. If the defaultPath field is defined 
> in the query, or via the plugin configuration, Drill will only return the 
> data, rather than the file metadata.
> ** Note: Once you have determined which data set you are querying, it is 
> advisable to use this method to query HDF5 data. **
> You can set the defaultPath variable in either the plugin configuration, or 
> at query time using the table() function as shown in the example below:
> {{SELECT * 
> FROM table(dfs.test.`dset.h5` (type => 'hdf5', defaultPath => '/dset'))}}
> This query will return the result below:
> {{apache drill> SELECT * FROM table(dfs.test.`dset.h5` (type => 'hdf5', 
> defaultPath => '/dset'));
> 

[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-16 Thread benj (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016797#comment-17016797
 ] 

benj commented on DRILL-7449:
-

Hi [~IhorHuzenko], I have enabled debug logging and the result is here:  
[^embedded_sqlline_with_enable_debug_logging.log.txt]

> memory leak parse_url function
> --
>
> Key: DRILL-7449
> URL: https://issues.apache.org/jira/browse/DRILL-7449
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.16.0
>Reporter: benj
>Assignee: Igor Guzenko
>Priority: Major
> Attachments: embedded_FullJsonProfile.txt, embedded_sqlline.log.txt, 
> embedded_sqlline_with_enable_debug_logging.log.txt
>
>
> Requests with *parse_url* work well when the number of processed rows is low, 
> but they produce a memory leak when the number of rows grows (roughly between 
> 500,000 and 1 million). For certain row counts the request sometimes succeeds 
> and sometimes fails with a memory leak.
> Extract from dataset tested:
> {noformat}
> {"Attributable":true,"Description":"Website has been identified as malicious 
> by 
> Bing","FirstReportedDateTime":"2018-03-12T18:49:38Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:49:38Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"172.217.8.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/beginilah-cara-orang-jepang-berpacaran.html","Version":1.5}
> {"Attributable":true,"Description":"Website has been identified as malicious 
> by 
> Bing","FirstReportedDateTime":"2018-03-12T18:14:51Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:14:51Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"216.58.192.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/cara-membuat-widget-slideshow-postingan.html","Version":1.5}
> {noformat}
> Request tested:
> {code:sql}
> ALTER SESSION SET `store.format`='parquet';
> ALTER SESSION SET `store.parquet.use_new_reader` = true;
> ALTER SESSION SET `store.parquet.compression` = 'snappy';
> ALTER SESSION SET `drill.exec.functions.cast_empty_string_to_null`= true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> ALTER SESSION SET `exec.enable_union_type` = true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> CREATE TABLE dfs.test.`output_pqt` AS
> (
> SELECT R.parsed.host AS Domain
> FROM ( 
>   SELECT parse_url(T.Url) AS parsed
>   FROM dfs.test.`file.json` AS T
> ) AS R 
> ORDER BY Domain
> );
> {code}
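The failing query's data transformation (parse_url(Url) followed by .host, then ORDER BY) corresponds roughly to this Python sketch using the standard library; this mirrors only the logic, not Drill's buffer allocation where the leak occurs:

```python
from urllib.parse import urlparse

# Two URLs from the sample dataset above.
urls = [
    "http://pasuruanbloggers.blogspot.ru/2012/12/beginilah-cara-orang-jepang-berpacaran.html",
    "http://pasuruanbloggers.blogspot.ru/2012/12/cara-membuat-widget-slideshow-postingan.html",
]

# SELECT parse_url(Url).host AS Domain ... ORDER BY Domain
domains = sorted(urlparse(u).hostname for u in urls)
print(domains)  # ['pasuruanbloggers.blogspot.ru', 'pasuruanbloggers.blogspot.ru']
```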
>  
>  Result when memory leak:
> {noformat}
> Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. 
> Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory 
> leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> org.apache.drill.exec.memory.BaseAllocator.close():520
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
> org.apache.drill.exec.ops.FragmentContextImpl.close():546
> 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():329
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748 (state=,code=0)
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Memory was leaked 
> by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory 
> leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> org.apache.drill.exec.memory.BaseAllocator.close():520
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
> org.apache.drill.exec.ops.FragmentContextImpl.close():546
> 
> 

[jira] [Updated] (DRILL-7449) memory leak parse_url function

2020-01-16 Thread benj (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

benj updated DRILL-7449:

Attachment: embedded_sqlline_with_enable_debug_logging.log.txt

> memory leak parse_url function
> --
>
> Key: DRILL-7449
> URL: https://issues.apache.org/jira/browse/DRILL-7449
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.16.0
>Reporter: benj
>Assignee: Igor Guzenko
>Priority: Major
> Attachments: embedded_FullJsonProfile.txt, embedded_sqlline.log.txt, 
> embedded_sqlline_with_enable_debug_logging.log.txt
>
>
> Requests with *parse_url* work well when the number of processed rows is low, 
> but they produce a memory leak when the number of rows grows (roughly between 
> 500,000 and 1 million). For certain row counts the request sometimes succeeds 
> and sometimes fails with a memory leak.
> Extract from dataset tested:
> {noformat}
> {"Attributable":true,"Description":"Website has been identified as malicious 
> by 
> Bing","FirstReportedDateTime":"2018-03-12T18:49:38Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:49:38Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"172.217.8.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/beginilah-cara-orang-jepang-berpacaran.html","Version":1.5}
> {"Attributable":true,"Description":"Website has been identified as malicious 
> by 
> Bing","FirstReportedDateTime":"2018-03-12T18:14:51Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:14:51Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"216.58.192.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/cara-membuat-widget-slideshow-postingan.html","Version":1.5}
> {noformat}
> Request tested:
> {code:sql}
> ALTER SESSION SET `store.format`='parquet';
> ALTER SESSION SET `store.parquet.use_new_reader` = true;
> ALTER SESSION SET `store.parquet.compression` = 'snappy';
> ALTER SESSION SET `drill.exec.functions.cast_empty_string_to_null`= true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> ALTER SESSION SET `exec.enable_union_type` = true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> CREATE TABLE dfs.test.`output_pqt` AS
> (
> SELECT R.parsed.host AS Domain
> FROM ( 
>   SELECT parse_url(T.Url) AS parsed
>   FROM dfs.test.`file.json` AS T
> ) AS R 
> ORDER BY Domain
> );
> {code}
>  
>  Result when memory leak:
> {noformat}
> Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. 
> Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory 
> leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> org.apache.drill.exec.memory.BaseAllocator.close():520
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
> org.apache.drill.exec.ops.FragmentContextImpl.close():546
> 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():329
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748 (state=,code=0)
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Memory was leaked 
> by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory 
> leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> org.apache.drill.exec.memory.BaseAllocator.close():520
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
> org.apache.drill.exec.ops.FragmentContextImpl.close():546
> 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
>