[jira] [Updated] (DRILL-7705) Update jQuery and Bootstrap libraries

2020-04-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7705:

Reviewer: Vova Vysotskyi

> Update jQuery and Bootstrap libraries
> -
>
> Key: DRILL-7705
> URL: https://issues.apache.org/jira/browse/DRILL-7705
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Anton Gozhiy
>Assignee: Anton Gozhiy
>Priority: Major
> Fix For: 1.18.0
>
>
> There are some vulnerabilities present in jQuery and Bootstrap libraries used 
> in Drill:
> * jQuery before 3.4.0, as used in Drupal, Backdrop CMS, and other products, 
> mishandles jQuery.extend(true, {}, ...) because of Object.prototype 
> pollution. If an unsanitized source object contained an enumerable __proto__ 
> property, it could extend the native Object.prototype.
> * In Bootstrap before 4.1.2, XSS is possible in the collapse data-parent 
> attribute.
> * In Bootstrap before 4.1.2, XSS is possible in the data-container property 
> of tooltip.
> * In Bootstrap before 3.4.0, XSS is possible in the affix configuration 
> target property.
> * In Bootstrap before 3.4.1 and 4.3.x before 4.3.1, XSS is possible in the 
> tooltip or popover data-template attribute.
> The following updates are suggested to fix them:
> * jQuery: 3.2.1 -> 3.5.0
> * Bootstrap: 3.1.1 -> 4.4.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7710) Fix TestMetastoreCommands#testDefaultSegment test

2020-04-21 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7710:

Reviewer: Arina Ielchiieva

> Fix TestMetastoreCommands#testDefaultSegment test
> -
>
> Key: DRILL-7710
> URL: https://issues.apache.org/jira/browse/DRILL-7710
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Vova Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Test {{TestMetastoreCommands#testDefaultSegment}} sometimes fails:
> {noformat}
> [ERROR]   TestMetastoreCommands.testDefaultSegment:1870 
> expected: column=null,
> path=/root/drillAutomation/builds/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1587168639522-0/multilevel/parquet/1994/Q1,
> partitionValues=null,
> locations=[/root/drillAutomation/builds/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1587168639522-0/multilevel/parquet/1994/Q1/orders_94_q1.parquet],
> tableInfo=TableInfo[storagePlugin=dfs, workspace=tmp, 
> name=multilevel/parquet/1994/Q1, type=null, owner=null],
> metadataInfo=MetadataInfo[type=SEGMENT, key=DEFAULT_SEGMENT, identifier=null],
> schema=null,
> columnsStatistics={},
> metadataStatistics={},
> lastModifiedTime=1587167178000]> but was: column=null,
> path=/root/drillAutomation/builds/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1587125640835-0/multilevel/parquet/1994/Q1,
> partitionValues=null,
> locations=[/root/drillAutomation/builds/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1587125640835-0/multilevel/parquet/1994/Q1/orders_94_q1.parquet],
> tableInfo=TableInfo[storagePlugin=dfs, workspace=tmp, 
> name=multilevel/parquet/1994/Q1, type=null, owner=null],
> metadataInfo=MetadataInfo[type=SEGMENT, key=DEFAULT_SEGMENT, 
> identifier=DEFAULT_SEGMENT],
> schema=null,
> columnsStatistics={},
> metadataStatistics={},
> lastModifiedTime=1587124183000]>
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7710) Fix TestMetastoreCommands#testDefaultSegment test

2020-04-21 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7710:

Labels: ready-to-commit  (was: )

> Fix TestMetastoreCommands#testDefaultSegment test
> -
>
> Key: DRILL-7710
> URL: https://issues.apache.org/jira/browse/DRILL-7710
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Vova Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Test {{TestMetastoreCommands#testDefaultSegment}} sometimes fails:
> {noformat}
> [ERROR]   TestMetastoreCommands.testDefaultSegment:1870 
> expected: column=null,
> path=/root/drillAutomation/builds/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1587168639522-0/multilevel/parquet/1994/Q1,
> partitionValues=null,
> locations=[/root/drillAutomation/builds/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1587168639522-0/multilevel/parquet/1994/Q1/orders_94_q1.parquet],
> tableInfo=TableInfo[storagePlugin=dfs, workspace=tmp, 
> name=multilevel/parquet/1994/Q1, type=null, owner=null],
> metadataInfo=MetadataInfo[type=SEGMENT, key=DEFAULT_SEGMENT, identifier=null],
> schema=null,
> columnsStatistics={},
> metadataStatistics={},
> lastModifiedTime=1587167178000]> but was: column=null,
> path=/root/drillAutomation/builds/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1587125640835-0/multilevel/parquet/1994/Q1,
> partitionValues=null,
> locations=[/root/drillAutomation/builds/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1587125640835-0/multilevel/parquet/1994/Q1/orders_94_q1.parquet],
> tableInfo=TableInfo[storagePlugin=dfs, workspace=tmp, 
> name=multilevel/parquet/1994/Q1, type=null, owner=null],
> metadataInfo=MetadataInfo[type=SEGMENT, key=DEFAULT_SEGMENT, 
> identifier=DEFAULT_SEGMENT],
> schema=null,
> columnsStatistics={},
> metadataStatistics={},
> lastModifiedTime=1587124183000]>
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7706) Drill RDBMS Metastore

2020-04-21 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7706:

Labels: ready-to-commit  (was: )

> Drill RDBMS Metastore
> -
>
> Key: DRILL-7706
> URL: https://issues.apache.org/jira/browse/DRILL-7706
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Currently Drill has only one Metastore implementation, based on Iceberg 
> tables. Iceberg tables are file-based storage that supports concurrent 
> writes/reads but must be placed on a distributed file system.
> This Jira aims to implement a Drill RDBMS Metastore which will store Drill 
> Metastore metadata in a database of the user's choice. Currently, PostgreSQL 
> and MySQL are supported; others might work as well but have not been tested. 
> Also, out of the box for demonstration/testing purposes, Drill will set up a 
> SQLite file-based embedded database, but this is only applicable to Drill in 
> embedded mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7713) Upgrade misc libraries whose outdated versions have reported vulnerabilities

2020-04-20 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7713:

Description: 
List of libraries to update:
1. Jackson
2. Retrofit
3. Commons-beanutils
4. Xalan
5. Xerces
6. Commons-codec
7. Snakeyaml
8. Metadata-extractor
9. Protostuff

  was:
List of libraries to update:
|commons-beanutils-1.9.2.jar|
|jackson-databind-2.9.9.jar|
|xalan-2.7.1.jar|
|commons-compress-1.18.jar|
|metadata-extractor-2.11.0.jar|
|xercesImpl-2.11.0.jar|
|retrofit-2.1.0.jar|
|snakeyaml-1.23.jar|
|commons-codec-1.10.jar|


> Upgrade misc libraries whose outdated versions have reported vulnerabilities
> 
>
> Key: DRILL-7713
> URL: https://issues.apache.org/jira/browse/DRILL-7713
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>
> List of libraries to update:
> 1. Jackson
> 2. Retrofit
> 3. Commons-beanutils
> 4. Xalan
> 5. Xerces
> 6. Commons-codec
> 7. Snakeyaml
> 8. Metadata-extractor
> 9. Protostuff



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7713) Upgrade misc libraries whose outdated versions have reported vulnerabilities

2020-04-20 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7713:

Reviewer: Vova Vysotskyi

> Upgrade misc libraries whose outdated versions have reported vulnerabilities
> 
>
> Key: DRILL-7713
> URL: https://issues.apache.org/jira/browse/DRILL-7713
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>
> List of libraries to update:
> |commons-beanutils-1.9.2.jar|
> |jackson-databind-2.9.9.jar|
> |xalan-2.7.1.jar|
> |commons-compress-1.18.jar|
> |metadata-extractor-2.11.0.jar|
> |xercesImpl-2.11.0.jar|
> |retrofit-2.1.0.jar|
> |snakeyaml-1.23.jar|
> |commons-codec-1.10.jar|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7713) Upgrade misc libraries whose outdated versions have reported vulnerabilities

2020-04-20 Thread Arina Ielchiieva (Jira)
Arina Ielchiieva created DRILL-7713:
---

 Summary: Upgrade misc libraries whose outdated versions have 
reported vulnerabilities
 Key: DRILL-7713
 URL: https://issues.apache.org/jira/browse/DRILL-7713
 Project: Apache Drill
  Issue Type: Task
Affects Versions: 1.17.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.18.0


List of libraries to update:
|commons-beanutils-1.9.2.jar|
|jackson-databind-2.9.9.jar|
|xalan-2.7.1.jar|
|commons-compress-1.18.jar|
|metadata-extractor-2.11.0.jar|
|xercesImpl-2.11.0.jar|
|retrofit-2.1.0.jar|
|snakeyaml-1.23.jar|
|commons-codec-1.10.jar|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7711) Add data path, parameter filter pushdown to HTTP plugin

2020-04-20 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7711:

Labels: ready-to-commit  (was: )

> Add data path, parameter filter pushdown to HTTP plugin
> ---
>
> Key: DRILL-7711
> URL: https://issues.apache.org/jira/browse/DRILL-7711
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Add two new features to the new HTTP plugin:
>  * The ability to express a path to the data to avoid having to work with 
> complex message objects in SQL.
>  * The ability to specify HTTP parameters using filter push-downs from SQL.
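> As a hedged illustration of the intent of the two features above (the plugin 
> name, connection name, and parameter below are hypothetical, not the plugin's 
> confirmed configuration): assuming a connection whose data path points at the 
> records array inside the JSON response and whose parameter list includes 
> {{symbol}}, a query like the following would push the WHERE condition to the 
> HTTP request as a query parameter and return only the nested records:
> {code:sql}
> -- Hypothetical connection `stocks` under a hypothetical `http` plugin;
> -- the predicate on `symbol` is expected to become an HTTP query parameter.
> SELECT *
> FROM http.`stocks`
> WHERE symbol = 'AAPL';
> {code}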



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7711) Add data path, parameter filter pushdown to HTTP plugin

2020-04-19 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7711:

Reviewer: Arina Ielchiieva

> Add data path, parameter filter pushdown to HTTP plugin
> ---
>
> Key: DRILL-7711
> URL: https://issues.apache.org/jira/browse/DRILL-7711
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.18.0
>
>
> Add two new features to the new HTTP plugin:
>  * The ability to express a path to the data to avoid having to work with 
> complex message objects in SQL.
>  * The ability to specify HTTP parameters using filter push-downs from SQL.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7701) EVF V2 Scan Framework

2020-04-19 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7701:

Reviewer: Arina Ielchiieva

> EVF V2 Scan Framework
> -
>
> Key: DRILL-7701
> URL: https://issues.apache.org/jira/browse/DRILL-7701
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Scan framework for the "V2" EVF schema resolution committed in DRILL-7696.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7712) Fix issues after ZK upgrade

2020-04-19 Thread Arina Ielchiieva (Jira)
Arina Ielchiieva created DRILL-7712:
---

 Summary: Fix issues after ZK upgrade
 Key: DRILL-7712
 URL: https://issues.apache.org/jira/browse/DRILL-7712
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.18.0
Reporter: Arina Ielchiieva
Assignee: Vova Vysotskyi
 Fix For: 1.18.0


Warnings during the jdbc-all build (absent when building with the MapR profile):
{noformat}
netty-transport-native-epoll-4.1.45.Final.jar, 
netty-transport-native-epoll-4.0.48.Final-linux-x86_64.jar define 46 
overlapping classes: 
  - io.netty.channel.epoll.AbstractEpollStreamChannel$2
  - io.netty.channel.epoll.AbstractEpollServerChannel$EpollServerSocketUnsafe
  - io.netty.channel.epoll.EpollDatagramChannel
  - io.netty.channel.epoll.AbstractEpollStreamChannel$SpliceInChannelTask
  - io.netty.channel.epoll.NativeDatagramPacketArray
  - io.netty.channel.epoll.EpollSocketChannelConfig
  - io.netty.channel.epoll.EpollTcpInfo
  - io.netty.channel.epoll.EpollEventArray
  - io.netty.channel.epoll.EpollEventLoop
  - io.netty.channel.epoll.EpollSocketChannel
  - 36 more...
netty-transport-native-unix-common-4.1.45.Final.jar, 
netty-transport-native-epoll-4.0.48.Final-linux-x86_64.jar define 15 
overlapping classes: 
  - io.netty.channel.unix.Errors$NativeConnectException
  - io.netty.channel.unix.ServerDomainSocketChannel
  - io.netty.channel.unix.DomainSocketAddress
  - io.netty.channel.unix.Socket
  - io.netty.channel.unix.NativeInetAddress
  - io.netty.channel.unix.DomainSocketChannelConfig
  - io.netty.channel.unix.Errors$NativeIoException
  - io.netty.channel.unix.DomainSocketReadMode
  - io.netty.channel.unix.ErrorsStaticallyReferencedJniMethods
  - io.netty.channel.unix.UnixChannel
  - 5 more...
maven-shade-plugin has detected that some class files are
present in two or more JARs. When this happens, only one
single version of the class is copied to the uber jar.
Usually this is not harmful and you can skip these warnings,
otherwise try to manually exclude artifacts based on
mvn dependency:tree -Ddetail=true and the above output.
See http://maven.apache.org/plugins/maven-shade-plugin/
{noformat}


Additional warning when building with the MapR profile:
{noformat}
The following patterns were never triggered in this artifact inclusion filter:
o  'org.apache.zookeeper:zookeeper-jute'
{noformat}

NPEs in tests (though tests do not fail):
{noformat}
[INFO] Running org.apache.drill.exec.coord.zk.TestZookeeperClient
4880
java.lang.NullPointerException
4881
at 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269)
4882
at 
org.apache.zookeeper.server.ZKDatabase.fastForwardDataBase(ZKDatabase.java:251)
4883
at 
org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:583)
4884
at 
org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:546)
4885
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:
{noformat}

{noformat}
[INFO] Running org.apache.drill.exec.coord.zk.TestEphemeralStore
5278
java.lang.NullPointerException
5279
at 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269)
5280
at org.apache.zookeepe
{noformat}

{noformat}
[INFO] Running org.apache.drill.yarn.zk.TestAmRegistration
6767
java.lang.NullPointerException
6768
at 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269)
6769
at 
org.apache.zookeeper.server.ZKDatabase.fastForwardDataBase(ZKDatabase.java:251)
6770
at 
org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:583)
6771
at 
org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:546)
6772
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:929)
6773
at org.apache.curator.t
{noformat}

{noformat}
org.apache.drill.yarn.client.TestCommandLineOptions
6823
java.lang.NullPointerException
6824
at 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269)
6825
at 
org.apache.zookeeper.server.ZKDatabase.fastForwardDataBase(ZKDatabase.java:251)
6826
at 
org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:583)
6827
at 
org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:546)
6828
at org.apac
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-6168) Table functions do not "inherit" default configuration

2020-04-19 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6168:

Labels: ready-to-commit  (was: )

> Table functions do not "inherit" default configuration
> --
>
> Key: DRILL-6168
> URL: https://issues.apache.org/jira/browse/DRILL-6168
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> See DRILL-6167 that describes an attempt to use a table function with a regex 
> format plugin.
> Consider the plugin configuration:
> {code}
> RegexFormatConfig sampleConfig = new RegexFormatConfig();
> sampleConfig.extension = "log1";
> sampleConfig.regex = DATE_ONLY_PATTERN;
> sampleConfig.fields = Lists.newArrayList("year", "month", "day");
> {code}
> (This plugin is defined in code in a test rather than the usual JSON in the 
> Web console.)
> Run a test with the above. Things work fine.
> Now, try the plugin config with a table function as described in DRILL-6167:
> {code}
>   String sql = "SELECT * FROM table(cp.`regex/simple.log2`\n" +
>   "(type => 'regex', regex => 
> '(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*'))";
>   client.queryBuilder().sql(sql).printCsv();
> {code}
> Because we are using a file with suffix "log2", the query will match the 
> format plugin config defined above. A query without the table function does, 
> in fact, work using the defined config. But, with a table function, we get 
> this warning from our regex code:
> {noformat}
> 13307 WARN [257590e1-e846-9d82-61d4-e246a4925ac3:frag:0:0] 
> [org.apache.drill.exec.store.easy.regex.RegexRecordReader] - Column list has 
> fewer
>   names than the pattern has groups, filling extras with Column$n.
> {noformat}
> (The warning is in the custom plugin, not Drill.) This is the plugin saying, 
> "hey! you didn't provide column names!". But, in the format definition, we 
> did provide names. If we run the query without a table function, we do see 
> those names used.
> Result:
> {noformat}
> 3 row(s):
> Column$0,Column$1,Column$2
> 2017,12,17
> 2017,12,18
> 2017,12,19
> Total rows returned : 3.  Returned in 9072ms.
> {noformat}
> Yes, indeed, the table function discarded the defined format config values, 
> filling in blanks, including for the column names.
> The expected behavior is that all properties defined in the config should 
> remain unchanged _except_ for those in the table function. Why? In order to 
> know which format plugin to use, the code has to map from the suffix (".log2" 
> here) to a format plugin _config_. (The config is the only thing that 
> specifies a suffix.) Since we mapped to a config (not the unconfigured 
> plugin), we'd expect the config properties to be used.
> It is highly surprising that all we get to use is the suffix, but all other 
> attributes are ignored. This seems very much in the "bug" category and not at 
> all in the "feature" category.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7701) EVF V2 Scan Framework

2020-04-19 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7701:

Labels: ready-to-commit  (was: )

> EVF V2 Scan Framework
> -
>
> Key: DRILL-7701
> URL: https://issues.apache.org/jira/browse/DRILL-7701
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Scan framework for the "V2" EVF schema resolution committed in DRILL-7696.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7710) Fix TestMetastoreCommands#testDefaultSegment test

2020-04-18 Thread Arina Ielchiieva (Jira)
Arina Ielchiieva created DRILL-7710:
---

 Summary: Fix TestMetastoreCommands#testDefaultSegment test
 Key: DRILL-7710
 URL: https://issues.apache.org/jira/browse/DRILL-7710
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Arina Ielchiieva
Assignee: Vova Vysotskyi
 Fix For: 1.18.0


Test {{TestMetastoreCommands#testDefaultSegment}} sometimes fails:

{noformat}
[ERROR]   TestMetastoreCommands.testDefaultSegment:1870 
expected: but was:
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7706) Drill RDBMS Metastore

2020-04-17 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7706:

Reviewer: Vova Vysotskyi

> Drill RDBMS Metastore
> -
>
> Key: DRILL-7706
> URL: https://issues.apache.org/jira/browse/DRILL-7706
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>
> Currently Drill has only one Metastore implementation, based on Iceberg 
> tables. Iceberg tables are file-based storage that supports concurrent 
> writes/reads but must be placed on a distributed file system.
> This Jira aims to implement a Drill RDBMS Metastore which will store Drill 
> Metastore metadata in a database of the user's choice. Currently, PostgreSQL 
> and MySQL are supported; others might work as well but have not been tested. 
> Also, out of the box for demonstration/testing purposes, Drill will set up a 
> SQLite file-based embedded database, but this is only applicable to Drill in 
> embedded mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7707) Unable to analyze table metadata if it resides in a non-writable workspace

2020-04-17 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085793#comment-17085793
 ] 

Arina Ielchiieva commented on DRILL-7707:
-

A similar issue occurs when trying to execute the analyze command for that 
table without specifying the schema first:
{noformat}
apache drill> analyze table t refresh metadata;
Error: VALIDATION ERROR: Root schema is immutable. Creating or dropping 
tables/views is not allowed in root schema.Select a schema using 'USE schema' 
command.
{noformat}
It should report that the table is absent, though.

> Unable to analyze table metadata if it resides in a non-writable workspace
> 
>
> Key: DRILL-7707
> URL: https://issues.apache.org/jira/browse/DRILL-7707
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Vova Vysotskyi
>Priority: Major
>
> Unable to analyze table metadata if it resides in a non-writable workspace:
> {noformat}
> apache drill> analyze table cp.`employee.json` refresh metadata;
> Error: VALIDATION ERROR: Unable to create or drop objects. Schema [cp] is 
> immutable.
> {noformat}
> Stacktrace:
> {noformat}
> [Error Id: b7f233cd-f090-491e-a487-5fc4c25444a4 ]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
>   at 
> org.apache.drill.exec.planner.sql.SchemaUtilites.resolveToDrillSchemaInternal(SchemaUtilites.java:230)
>   at 
> org.apache.drill.exec.planner.sql.SchemaUtilites.resolveToDrillSchema(SchemaUtilites.java:208)
>   at 
> org.apache.drill.exec.planner.sql.handlers.DrillTableInfo.getTableInfoHolder(DrillTableInfo.java:101)
>   at 
> org.apache.drill.exec.planner.sql.handlers.MetastoreAnalyzeTableHandler.getPlan(MetastoreAnalyzeTableHandler.java:108)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:128)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93)
>   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:593)
>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
> {noformat}
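> While this is unresolved, a minimal sketch of a workaround, assuming the 
> {{dfs.tmp}} workspace is writable (the target table name is hypothetical): 
> copy the classpath table into a writable workspace and analyze it there.
> {code:sql}
> USE dfs.tmp;
> -- Materialize the classpath table into the writable workspace
> CREATE TABLE dfs.tmp.`employee` AS SELECT * FROM cp.`employee.json`;
> -- Metadata collection should now succeed because dfs.tmp is mutable
> ANALYZE TABLE dfs.tmp.`employee` REFRESH METADATA;
> {code}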



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7707) Unable to analyze table metadata if it resides in a non-writable workspace

2020-04-17 Thread Arina Ielchiieva (Jira)
Arina Ielchiieva created DRILL-7707:
---

 Summary: Unable to analyze table metadata if it resides in a 
non-writable workspace
 Key: DRILL-7707
 URL: https://issues.apache.org/jira/browse/DRILL-7707
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Arina Ielchiieva


Unable to analyze table metadata if it resides in a non-writable workspace:

{noformat}
apache drill> analyze table cp.`employee.json` refresh metadata;
Error: VALIDATION ERROR: Unable to create or drop objects. Schema [cp] is 
immutable.
{noformat}

Stacktrace:
{noformat}
[Error Id: b7f233cd-f090-491e-a487-5fc4c25444a4 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
at 
org.apache.drill.exec.planner.sql.SchemaUtilites.resolveToDrillSchemaInternal(SchemaUtilites.java:230)
at 
org.apache.drill.exec.planner.sql.SchemaUtilites.resolveToDrillSchema(SchemaUtilites.java:208)
at 
org.apache.drill.exec.planner.sql.handlers.DrillTableInfo.getTableInfoHolder(DrillTableInfo.java:101)
at 
org.apache.drill.exec.planner.sql.handlers.MetastoreAnalyzeTableHandler.getPlan(MetastoreAnalyzeTableHandler.java:108)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:128)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93)
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:593)
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
{noformat}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7706) Drill RDBMS Metastore

2020-04-17 Thread Arina Ielchiieva (Jira)
Arina Ielchiieva created DRILL-7706:
---

 Summary: Drill RDBMS Metastore
 Key: DRILL-7706
 URL: https://issues.apache.org/jira/browse/DRILL-7706
 Project: Apache Drill
  Issue Type: New Feature
Affects Versions: 1.17.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.18.0


Currently Drill has only one Metastore implementation, based on Iceberg tables. 
Iceberg tables are file-based storage that supports concurrent writes/reads but 
must be placed on a distributed file system.

This Jira aims to implement a Drill RDBMS Metastore which will store Drill 
Metastore metadata in a database of the user's choice. Currently, PostgreSQL 
and MySQL are supported; others might work as well but have not been tested. 
Also, out of the box for demonstration/testing purposes, Drill will set up a 
SQLite file-based embedded database, but this is only applicable to Drill in 
embedded mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7703) Support for 3+D arrays in EVF JSON loader

2020-04-17 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7703:

Labels: ready-to-commit  (was: )

> Support for 3+D arrays in EVF JSON loader
> -
>
> Key: DRILL-7703
> URL: https://issues.apache.org/jira/browse/DRILL-7703
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Add support for multiple levels of repeated list to the new EVF-based JSON 
> reader.
> As work continues on adding the new JSON reader to Drill, running unit tests 
> reveals that some include lists with three (perhaps more) dimensions.
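> As a hedged illustration of the data shape involved (the file name and field 
> are hypothetical): a document such as {"coords": [[[1, 2], [3, 4]], [[5, 6]]]} 
> nests arrays three levels deep, so the reader must handle three levels of 
> repeated lists when a query selects that column:
> {code:sql}
> -- Hypothetical file containing the 3-D array shown above
> SELECT t.coords FROM dfs.`/tmp/three_d.json` AS t;
> {code}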



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7703) Support for 3+D arrays in EVF JSON loader

2020-04-17 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7703:

Reviewer: Arina Ielchiieva

> Support for 3+D arrays in EVF JSON loader
> -
>
> Key: DRILL-7703
> URL: https://issues.apache.org/jira/browse/DRILL-7703
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Add support for multiple levels of repeated list to the new EVF-based JSON 
> reader.
> As work continues on adding the new JSON reader to Drill, running unit tests 
> reveals that some include lists with three (perhaps more) dimensions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-6168) Table functions do not "inherit" default configuration

2020-04-17 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6168:

Reviewer: Arina Ielchiieva

> Table functions do not "inherit" default configuration
> --
>
> Key: DRILL-6168
> URL: https://issues.apache.org/jira/browse/DRILL-6168
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.18.0
>
>
> See DRILL-6167 that describes an attempt to use a table function with a regex 
> format plugin.
> Consider the plugin configuration:
> {code}
> RegexFormatConfig sampleConfig = new RegexFormatConfig();
> sampleConfig.extension = "log1";
> sampleConfig.regex = DATE_ONLY_PATTERN;
> sampleConfig.fields = Lists.newArrayList("year", "month", "day");
> {code}
> (This plugin is defined in code in a test rather than the usual JSON in the 
> Web console.)
> Run a test with the above. Things work fine.
> Now, try the plugin config with a table function as described in DRILL-6167:
> {code}
>   String sql = "SELECT * FROM table(cp.`regex/simple.log2`\n" +
>   "(type => 'regex', regex => 
> '(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*'))";
>   client.queryBuilder().sql(sql).printCsv();
> {code}
> Because we are using a file with suffix "log2", the query will match the 
> format plugin config defined above. A query without the table function does, 
> in fact, work using the defined config. But, with a table function, we get 
> this warning from our regex code:
> {noformat}
> 13307 WARN [257590e1-e846-9d82-61d4-e246a4925ac3:frag:0:0] 
> [org.apache.drill.exec.store.easy.regex.RegexRecordReader] - Column list has 
> fewer
>   names than the pattern has groups, filling extras with Column$n.
> {noformat}
> (The warning is in the custom plugin, not Drill.) This is the plugin saying, 
> "hey! you didn't provide column names!". But, in the format definition, we 
> did provide names. If we run the query without a table function, we do see 
> those names used.
> Result:
> {noformat}
> 3 row(s):
> Column$0,Column$1,Column$2
> 2017,12,17
> 2017,12,18
> 2017,12,19
> Total rows returned : 3.  Returned in 9072ms.
> {noformat}
> Yes, indeed, the table function discarded the defined format config values, 
> filling in blanks, including for the column names.
> The expected behavior is that all properties defined in the config should 
> remain unchanged _except_ for those in the table function. Why? In order to 
> know which format plugin to use, the code has to map from the suffix (".log2" 
> here) to a format plugin _config_. (The config is the only thing that 
> specifies a suffix.) Since we mapped to a config (not the unconfigured 
> plugin), we'd expect the config properties to be used.
> It is highly surprising that all we get to use is the suffix, but all other 
> attributes are ignored. This seems very much in the "bug" category and not at 
> all in the "feature" category.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-6168) Table functions do not "inherit" default configuration

2020-04-17 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6168:

Fix Version/s: 1.18.0

> Table functions do not "inherit" default configuration
> --
>
> Key: DRILL-6168
> URL: https://issues.apache.org/jira/browse/DRILL-6168
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.18.0
>
>
> See DRILL-6167 that describes an attempt to use a table function with a regex 
> format plugin.
> Consider the plugin configuration:
> {code}
> RegexFormatConfig sampleConfig = new RegexFormatConfig();
> sampleConfig.extension = "log1";
> sampleConfig.regex = DATE_ONLY_PATTERN;
> sampleConfig.fields = Lists.newArrayList("year", "month", "day");
> {code}
> (This plugin is defined in code in a test rather than the usual JSON in the 
> Web console.)
> Run a test with the above. Things work fine.
> Now, try the plugin config with a table function as described in DRILL-6167:
> {code}
>   String sql = "SELECT * FROM table(cp.`regex/simple.log2`\n" +
>   "(type => 'regex', regex => 
> '(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*'))";
>   client.queryBuilder().sql(sql).printCsv();
> {code}
> Because we are using a file with suffix "log2", the query will match the 
> format plugin config defined above. A query without the table function does, 
> in fact, work using the defined config. But, with a table function, we get 
> this warning from our regex code:
> {noformat}
> 13307 WARN [257590e1-e846-9d82-61d4-e246a4925ac3:frag:0:0] 
> [org.apache.drill.exec.store.easy.regex.RegexRecordReader] - Column list has 
> fewer
>   names than the pattern has groups, filling extras with Column$n.
> {noformat}
> (The warning is in the custom plugin, not Drill.) This is the plugin saying, 
> "hey! you didn't provide column names!". But, in the format definition, we 
> did provide names. If we run the query without a table function, we do see 
> those names used.
> Result:
> {noformat}
> 3 row(s):
> Column$0,Column$1,Column$2
> 2017,12,17
> 2017,12,18
> 2017,12,19
> Total rows returned : 3.  Returned in 9072ms.
> {noformat}
> Yes, indeed, the table function discarded the defined format config values, 
> filling in blanks, including for the column names.
> The expected behavior is that all properties defined in the config should 
> remain unchanged _except_ for those in the table function. Why? In order to 
> know which format plugin to use, the code has to map from the suffix (".log2" 
> here) to a format plugin _config_. (The config is the only thing that 
> specifies a suffix.) Since we mapped to a config (not the unconfigured 
> plugin), we'd expect the config properties to be used.
> It is highly surprising that all we get to use is the suffix, but all other 
> attributes are ignored. This seems very much in the "bug" category and not at 
> all in the "feature" category.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7704) Update Maven dependency to 3.6.3

2020-04-17 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7704:

Summary: Update Maven dependency to 3.6.3  (was: Update Maven dependency)

> Update Maven dependency to 3.6.3
> 
>
> Key: DRILL-7704
> URL: https://issues.apache.org/jira/browse/DRILL-7704
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Currently, the minimal Maven version in Drill is 3.3.3; it is old and 
> contains a dependency on the plexus-utils-3.0.20 library, which has reported 
> vulnerabilities.
> This Jira aims to update the Maven version to 3.6.3.
> Having the latest Maven version is also crucial when using Maven plugins that 
> require it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7704) Update Maven dependency

2020-04-17 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7704:

Reviewer: Vova Vysotskyi

> Update Maven dependency
> ---
>
> Key: DRILL-7704
> URL: https://issues.apache.org/jira/browse/DRILL-7704
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Currently, the minimal Maven version in Drill is 3.3.3; it is old and 
> contains a dependency on the plexus-utils-3.0.20 library, which has reported 
> vulnerabilities.
> This Jira aims to update the Maven version to 3.6.3.
> Having the latest Maven version is also crucial when using Maven plugins that 
> require it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7704) Update Maven dependency

2020-04-16 Thread Arina Ielchiieva (Jira)
Arina Ielchiieva created DRILL-7704:
---

 Summary: Update Maven dependency
 Key: DRILL-7704
 URL: https://issues.apache.org/jira/browse/DRILL-7704
 Project: Apache Drill
  Issue Type: Task
Affects Versions: 1.17.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.18.0


Currently, the minimal Maven version in Drill is 3.3.3; it is old and contains 
a dependency on the plexus-utils-3.0.20 library, which has reported 
vulnerabilities.

This Jira aims to update the Maven version to 3.6.3.
Having the latest Maven version is also crucial when using Maven plugins that 
require it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7702) Update zookeeper, curator, guava, jetty-server, libthrift, httpclient, commons-compress and httpdlog-parser

2020-04-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7702:

Reviewer: Arina Ielchiieva

> Update zookeeper, curator, guava, jetty-server, libthrift, httpclient, 
> commons-compress and httpdlog-parser
> ---
>
> Key: DRILL-7702
> URL: https://issues.apache.org/jira/browse/DRILL-7702
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.18.0
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Update:
>  - zookeeper to 3.5.7 (3.6 is not supported by the latest curator yet);
>  - curator to 4.3.0;
>  - guava (shaded) to 28.2-jre;
>  - jetty-server to 9.3.28.v20191105;
>  - libthrift to 0.13.0;
>  - httpclient to 4.5.12;
>  - commons-compress to 1.20; 
>  - httpdlog-parser to 5.3;
>  - derby to 10.14.2.0.
> Exclude:
>  - commons-httpclient
>  - log4j-core
>  - jasper-compiler



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7702) Update zookeeper, curator, guava, jetty-server, libthrift, httpclient, commons-compress and httpdlog-parser

2020-04-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7702:

Labels: ready-to-commit  (was: )

> Update zookeeper, curator, guava, jetty-server, libthrift, httpclient, 
> commons-compress and httpdlog-parser
> ---
>
> Key: DRILL-7702
> URL: https://issues.apache.org/jira/browse/DRILL-7702
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.18.0
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Update:
>  - zookeeper to 3.5.7 (3.6 is not supported by the latest curator yet);
>  - curator to 4.3.0;
>  - guava (shaded) to 28.2-jre;
>  - jetty-server to 9.3.28.v20191105;
>  - libthrift to 0.13.0;
>  - httpclient to 4.5.12;
>  - commons-compress to 1.20; 
>  - httpdlog-parser to 5.3;
>  - derby to 10.14.2.0.
> Exclude:
>  - commons-httpclient
>  - log4j-core
>  - jasper-compiler



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7696) EVF v2 Scan Schema Resolution

2020-04-14 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7696:

Reviewer: Arina Ielchiieva

> EVF v2 Scan Schema Resolution
> -
>
> Key: DRILL-7696
> URL: https://issues.apache.org/jira/browse/DRILL-7696
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Revises the mechanism EVF uses to resolve the schema for a scan. See PR for 
> details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7696) EVF v2 Scan Schema Resolution

2020-04-14 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7696:

Labels: ready-to-commit  (was: )

> EVF v2 Scan Schema Resolution
> -
>
> Key: DRILL-7696
> URL: https://issues.apache.org/jira/browse/DRILL-7696
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Revises the mechanism EVF uses to resolve the schema for a scan. See PR for 
> details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7437) Storage Plugin for Generic HTTP REST API

2020-04-13 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7437:

Labels: ready-to-commit  (was: )

> Storage Plugin for Generic HTTP REST API
> 
>
> Key: DRILL-7437
> URL: https://issues.apache.org/jira/browse/DRILL-7437
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> In many data analytic situations there is a need to obtain reference data 
> which is volatile or hosted on a service with a REST API.  
> For instance, consider the case of a financial dataset on which you want to 
> run a currency conversion, or, in the security arena, an organization that 
> has a service returning network information about an IT asset. The goal is to 
> enable Drill to quickly incorporate external data that is only accessible via 
> a REST API.
> This plugin is not intended to be a substitute for dedicated storage plugins 
> for systems that use a REST API, such as Apache Solr or ElasticSearch.
> This plugin is based on several projects that were posted on GitHub but never 
> completed or submitted to Drill. Posted here for attribution:
>  * [https://github.com/kevinlynx/drill-storage-http]
>  * [https://github.com/mayunSaicmotor/drill-storage-http]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7437) Storage Plugin for Generic HTTP REST API

2020-04-13 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7437:

Reviewer: Paul Rogers

> Storage Plugin for Generic HTTP REST API
> 
>
> Key: DRILL-7437
> URL: https://issues.apache.org/jira/browse/DRILL-7437
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> In many data analytic situations there is a need to obtain reference data 
> which is volatile or hosted on a service with a REST API.  
> For instance, consider the case of a financial dataset on which you want to 
> run a currency conversion, or, in the security arena, an organization that 
> has a service returning network information about an IT asset. The goal is to 
> enable Drill to quickly incorporate external data that is only accessible via 
> a REST API.
> This plugin is not intended to be a substitute for dedicated storage plugins 
> for systems that use a REST API, such as Apache Solr or ElasticSearch.
> This plugin is based on several projects that were posted on GitHub but never 
> completed or submitted to Drill. Posted here for attribution:
>  * [https://github.com/kevinlynx/drill-storage-http]
>  * [https://github.com/mayunSaicmotor/drill-storage-http]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7603) Allow setting default schema using REST API / Web UI

2020-04-11 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7603:

Labels:   (was: ready-to-commit)

> Allow setting default schema using REST API / Web UI
> 
>
> Key: DRILL-7603
> URL: https://issues.apache.org/jira/browse/DRILL-7603
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.17.0
>Reporter: Dobes Vandermeer
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Currently, to set the default schema you must run the `USE ` command. 
>  However, the Web UI and REST API do not keep a session open so the `USE` 
> command does not affect the next query sent to the API.
> To support a default schema for the REST API and Web UI, I propose a 
> "defaultSchema" parameter to the API which sets the default schema for that 
> query.
> Example: curl -d '{"query":"SHOW 
> FILES","defaultSchema":"dfs.tmp","queryType":"SQL"}' -H 'Content-Type: 
> application/json' -H 'User-Name: test' localhost:8047/query.json
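> For contrast, a minimal sketch of the session-based behavior that the 
> stateless REST API lacks (schema name chosen for illustration): in an 
> interactive SQLLine session the default schema set by USE persists for later 
> statements, whereas every REST request starts without one.
> {code:sql}
> USE dfs.tmp;
> -- In the same session, the default schema above now applies:
> SHOW FILES;
> {code}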



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7694) Register drill.queries.* counter metrics on Drillbit startup

2020-04-11 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7694:

Reviewer: Arina Ielchiieva

> Register drill.queries.* counter metrics on Drillbit startup 
> -
>
> Key: DRILL-7694
> URL: https://issues.apache.org/jira/browse/DRILL-7694
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7694) Register drill.queries.* counter metrics on Drillbit startup

2020-04-11 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7694:

Affects Version/s: 1.17.0

> Register drill.queries.* counter metrics on Drillbit startup 
> -
>
> Key: DRILL-7694
> URL: https://issues.apache.org/jira/browse/DRILL-7694
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7694) Register drill.queries.* counter metrics on Drillbit startup

2020-04-11 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7694:

Fix Version/s: 1.18.0

> Register drill.queries.* counter metrics on Drillbit startup 
> -
>
> Key: DRILL-7694
> URL: https://issues.apache.org/jira/browse/DRILL-7694
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7694) Register drill.queries.* counter metrics on Drillbit startup

2020-04-11 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7694:

Labels: ready-to-commit  (was: )

> Register drill.queries.* counter metrics on Drillbit startup 
> -
>
> Key: DRILL-7694
> URL: https://issues.apache.org/jira/browse/DRILL-7694
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7692) Select default schema from enabled storage plugins in query page

2020-04-08 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7692:

Description: 
[https://github.com/apache/drill/pull/1996]

The above PR introduced an option to specify a default schema to find table 
names in the query page of the web UI.

Users specify a schema through an input field but this is error prone and 
doesn't restrict options to enabled storage plugins.

I propose that it be changed to a dropdown that lists enabled storages as 
options, and automatically defaults if there is only one available storage.

See attached screenshot for an example:
 

  was:
[https://github.com/apache/drill/pull/1996]

The above PR introduced an option to specify a default schema to find table 
names in the query page of the web UI.

*Assign to me*
Users specify a schema through an input field but this is error prone and 
doesn't restrict options to enabled storage plugins.

 I propose that it be changed to a dropdown that lists enabled storages as 
options, and automatically defaults if there is only one available storage.

See attached screenshot for an example:
 


> Select default schema from enabled storage plugins in query page
> 
>
> Key: DRILL-7692
> URL: https://issues.apache.org/jira/browse/DRILL-7692
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.18.0
>Reporter: Justin Chen
>Priority: Minor
> Attachments: Screenshot 2020-04-07 at 3.33.32 PM.png, Screenshot 
> 2020-04-07 at 3.33.37 PM.png
>
>
> [https://github.com/apache/drill/pull/1996]
> The above PR introduced an option to specify a default schema to find table 
> names in the query page of the web UI.
> Users specify a schema through an input field but this is error prone and 
> doesn't restrict options to enabled storage plugins.
> I propose that it be changed to a dropdown that lists enabled storages as 
> options, and automatically defaults if there is only one available storage.
> See attached screenshot for an example:
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7692) Select default schema from enabled storage plugins in query page

2020-04-08 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7692:

Labels:   (was: ready-to-commit)

> Select default schema from enabled storage plugins in query page
> 
>
> Key: DRILL-7692
> URL: https://issues.apache.org/jira/browse/DRILL-7692
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.18.0
>Reporter: Justin Chen
>Priority: Minor
> Attachments: Screenshot 2020-04-07 at 3.33.32 PM.png, Screenshot 
> 2020-04-07 at 3.33.37 PM.png
>
>
> [https://github.com/apache/drill/pull/1996]
> The above PR introduced an option to specify a default schema to find table 
> names in the query page of the web UI.
> *Assign to me*
> Users specify a schema through an input field but this is error prone and 
> doesn't restrict options to enabled storage plugins.
>  I propose that it be changed to a dropdown that lists enabled storages as 
> options, and automatically defaults if there is only one available storage.
> See attached screenshot for an example:
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7675) Very slow performance and memory exhaustion while querying a very small dataset of parquet files

2020-04-07 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7675:

Labels: ready-to-commit  (was: )

> Very slow performance and memory exhaustion while querying a very small 
> dataset of parquet files
> -
>
> Key: DRILL-7675
> URL: https://issues.apache.org/jira/browse/DRILL-7675
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.18.0
> Environment: [^sample-dataset.zip]
>Reporter: Idan Sheinberg
>Assignee: Paul Rogers
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
> Attachments: sample-dataset.zip
>
>
> Per our discussion in Slack/the dev list, here are all the details and a 
> sample dataset to recreate the problematic query behavior:
>  * We are using Drill 1.18.0-SNAPSHOT built on March 6
>  * We are joining on two small Parquet datasets residing on S3 using the 
> following query:
> {code:java}
> SELECT 
>  CASE
>  WHEN tbl1.`timestamp` IS NULL THEN tbl2.`timestamp`
>  ELSE tbl1.`timestamp`
>  END AS ts, *
>  FROM `s3-store.state`.`/164` AS tbl1
>  FULL OUTER JOIN `s3-store.result`.`/164` AS tbl2
>  ON tbl1.`timestamp`*10 = tbl2.`timestamp`
>  ORDER BY ts ASC
>  LIMIT 500 OFFSET 0 ROWS
> {code}
>  * We are running Drill in a single-node setup on a 16-core, 64 GB RAM 
> machine. The Drill heap size is set to 16 GB, while max direct memory is set 
> to 32 GB.
>  * As the dataset consists of really small files, Drill has been tweaked to 
> parallelize on a small item count by adjusting the following variables (see 
> the sketch after this list for how they are set):
> {code:java}
> planner.slice_target = 25
> planner.width.max_per_node = 16 (to match the core count){code}
>  * Without the above parallelization, query speeds on parquet files are super 
> slow (tens of seconds).
>  * While queries do work, we are seeing non-proportional direct memory/heap 
> utilization (up to 20 GB of direct memory used, a minimum of 12 GB of heap 
> required).
>  * We're still encountering the occasional out-of-memory error (we're also 
> seeing heap exhaustion, but I guess that's another indication of the same 
> problem). Reducing the node parallelization width to, say, 8 reduces memory 
> contention, though it still reaches 8 GB of direct memory:
> {code:java}
> User Error Occurred: One or more nodes ran out of memory while executing the 
> query. (null)
>  org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or 
> more nodes ran out of memory while executing the query.null[Error Id: 
> 67b61fc9-320f-47a1-8718-813843a10ecc ]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:338)
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: org.apache.drill.exec.exception.OutOfMemoryException: null
>  at 
> org.apache.drill.exec.vector.complex.AbstractContainerVector.allocateNew(AbstractContainerVector.java:59)
>  at 
> org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.allocateOutgoingRecordBatch(PartitionerTemplate.java:380)
>  at 
> org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.initializeBatch(PartitionerTemplate.java:400)
>  at 
> org.apache.drill.exec.test.generated.PartitionerGen5.setup(PartitionerTemplate.java:126)
>  at 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createClassInstances(PartitionSenderRootExec.java:263)
>  at 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createPartitioner(PartitionSenderRootExec.java:218)
>  at 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:188)
>  at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:323)
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:310)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:310)
>  ... 4 common frames omitted{code}
> I've attached a (real!) sample data-set to match the query above. That 
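
The parallelization options quoted above are regular Drill session/system options; a minimal sketch of how they would typically be applied and verified (values taken from the report, purely illustrative for other clusters):

{code:sql}
-- Lower the slice target so small inputs still parallelize
ALTER SESSION SET `planner.slice_target` = 25;

-- Allow up to one minor fragment per core on the node
ALTER SESSION SET `planner.width.max_per_node` = 16;

-- Confirm the effective values
SELECT * FROM sys.options WHERE name LIKE 'planner.%';
{code}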

[jira] [Assigned] (DRILL-7528) Update Avro format plugin documentation

2020-04-06 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7528:
---

Assignee: Vova Vysotskyi

> Update Avro format plugin documentation
> ---
>
> Key: DRILL-7528
> URL: https://issues.apache.org/jira/browse/DRILL-7528
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Vova Vysotskyi
>Priority: Major
>
> Currently the documentation states that the Avro plugin is experimental.
> As of Drill 1.17 / 1.18 its code is pretty stable (since Drill 1.18 it uses 
> EVF).
> Documentation should be updated accordingly.
> https://drill.apache.org/docs/querying-avro-files/
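Since the linked page documents querying Avro files, a minimal usage sketch for context (the path and workspace are illustrative and assume a dfs workspace with the avro format enabled):

{code:sql}
-- Query an Avro file directly through the dfs storage plugin
SELECT * FROM dfs.`/data/events/sample.avro` LIMIT 10;
{code}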



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (DRILL-7528) Update Avro format plugin documentation

2020-04-06 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-7528.
-
Resolution: Fixed

Added in 
https://github.com/apache/drill/commit/0e17eea19aca27b88c98778fcfb7057a45501ab9.

> Update Avro format plugin documentation
> ---
>
> Key: DRILL-7528
> URL: https://issues.apache.org/jira/browse/DRILL-7528
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.18.0
>
>
> Currently the documentation states that the Avro plugin is experimental.
> As of Drill 1.17 / 1.18 its code is pretty stable (since Drill 1.18 it uses 
> EVF).
> Documentation should be updated accordingly.
> https://drill.apache.org/docs/querying-avro-files/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7528) Update Avro format plugin documentation

2020-04-06 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7528:

Reviewer: Arina Ielchiieva

> Update Avro format plugin documentation
> ---
>
> Key: DRILL-7528
> URL: https://issues.apache.org/jira/browse/DRILL-7528
> Project: Apache Drill
>  Issue Type: Task
>  Components: Documentation
>Reporter: Arina Ielchiieva
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.18.0
>
>
> Currently the documentation states that the Avro plugin is experimental.
> As of Drill 1.17 / 1.18 its code is pretty stable (since Drill 1.18 it uses 
> EVF).
> Documentation should be updated accordingly.
> https://drill.apache.org/docs/querying-avro-files/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7528) Update Avro format plugin documentation

2020-04-06 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7528:

Component/s: Documentation

> Update Avro format plugin documentation
> ---
>
> Key: DRILL-7528
> URL: https://issues.apache.org/jira/browse/DRILL-7528
> Project: Apache Drill
>  Issue Type: Task
>  Components: Documentation
>Reporter: Arina Ielchiieva
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.18.0
>
>
> Currently the documentation states that the Avro plugin is experimental.
> As of Drill 1.17 / 1.18 its code is pretty stable (since Drill 1.18 it uses 
> EVF).
> Documentation should be updated accordingly.
> https://drill.apache.org/docs/querying-avro-files/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7528) Update Avro format plugin documentation

2020-04-06 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7528:

Fix Version/s: 1.18.0

> Update Avro format plugin documentation
> ---
>
> Key: DRILL-7528
> URL: https://issues.apache.org/jira/browse/DRILL-7528
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.18.0
>
>
> Currently the documentation states that the Avro plugin is experimental.
> As of Drill 1.17 / 1.18 its code is pretty stable (since Drill 1.18 it uses 
> EVF).
> Documentation should be updated accordingly.
> https://drill.apache.org/docs/querying-avro-files/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-7678) Update Yauaa dependency

2020-04-05 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7678:
---

Assignee: Niels Basjes

> Update Yauaa dependency
> ---
>
> Key: DRILL-7678
> URL: https://issues.apache.org/jira/browse/DRILL-7678
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Niels Basjes
>Assignee: Niels Basjes
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> The Yauaa useragent parsing library has a new release.
> Also a few changes and small optimizations are needed to make it work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7678) Update Yauaa dependency

2020-04-05 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7678:

Labels: ready-to-commit  (was: )

> Update Yauaa dependency
> ---
>
> Key: DRILL-7678
> URL: https://issues.apache.org/jira/browse/DRILL-7678
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Niels Basjes
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> The Yauaa useragent parsing library has a new release.
> Also a few changes and small optimizations are needed to make it work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7678) Update Yauaa dependency

2020-04-05 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7678:

Fix Version/s: 1.18.0

> Update Yauaa dependency
> ---
>
> Key: DRILL-7678
> URL: https://issues.apache.org/jira/browse/DRILL-7678
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Niels Basjes
>Priority: Major
> Fix For: 1.18.0
>
>
> The Yauaa useragent parsing library has a new release.
> Also a few changes and small optimizations are needed to make it work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7678) Update Yauaa dependency

2020-04-05 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7678:

Reviewer: Charles Givre

> Update Yauaa dependency
> ---
>
> Key: DRILL-7678
> URL: https://issues.apache.org/jira/browse/DRILL-7678
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Niels Basjes
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> The Yauaa useragent parsing library has a new release.
> Also a few changes and small optimizations are needed to make it work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7678) Update Yauaa dependency

2020-04-05 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7678:

Affects Version/s: 1.17.0

> Update Yauaa dependency
> ---
>
> Key: DRILL-7678
> URL: https://issues.apache.org/jira/browse/DRILL-7678
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Niels Basjes
>Priority: Major
>
> The Yauaa useragent parsing library has a new release.
> Also a few changes and small optimizations are needed to make it work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7675) Very slow performance and Memory exhaustion while querying on very small dataset of parquet files

2020-04-05 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7675:

Fix Version/s: 1.18.0
 Reviewer: Arina Ielchiieva

> Very slow performance and Memory exhaustion while querying on very small 
> dataset of parquet files
> -
>
> Key: DRILL-7675
> URL: https://issues.apache.org/jira/browse/DRILL-7675
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.18.0
> Environment: [^sample-dataset.zip]
>Reporter: Idan Sheinberg
>Assignee: Paul Rogers
>Priority: Critical
> Fix For: 1.18.0
>
> Attachments: sample-dataset.zip
>
>
> Per our discussion in Slack/the dev list, here are all the details and a sample data-set 
> to recreate the problematic query behavior:
>  * We are using Drill 1.18.0-SNAPSHOT built on March 6
>  * We are joining on two small Parquet datasets residing on S3 using the 
> following query:
> {code:java}
> SELECT 
>  CASE
>  WHEN tbl1.`timestamp` IS NULL THEN tbl2.`timestamp`
>  ELSE tbl1.`timestamp`
>  END AS ts, *
>  FROM `s3-store.state`.`/164` AS tbl1
>  FULL OUTER JOIN `s3-store.result`.`/164` AS tbl2
>  ON tbl1.`timestamp`*10 = tbl2.`timestamp`
>  ORDER BY ts ASC
>  LIMIT 500 OFFSET 0 ROWS
> {code}
>  * We are running Drill in a single-node setup on a 16-core, 64 GB RAM 
> machine. The Drill heap size is set to 16 GB, while max direct memory is set to 
> 32 GB.
>  * As the dataset consists of really small files, Drill has been tuned to 
> parallelize on small item counts by adjusting the following options:
> {code:java}
> planner.slice_target = 25
> planner.width.max_per_node = 16 (to match the core count){code}
>  * Without the above parallelization, query speeds on Parquet files are very 
> slow (tens of seconds).
>  * While queries do work, we are seeing disproportionate direct memory/heap 
> utilization (up to 20 GB of direct memory used, a minimum of 12 GB of heap required).
>  * We're still encountering the occasional out-of-memory error (we're also 
> seeing heap exhaustion, but I guess that's another indication of the same 
> problem). Reducing the node parallelization width to, say, 8 reduces memory 
> contention, though direct memory usage still reaches 8 GB:
> {code:java}
> User Error Occurred: One or more nodes ran out of memory while executing the 
> query. (null)
>  org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or 
> more nodes ran out of memory while executing the query.null[Error Id: 
> 67b61fc9-320f-47a1-8718-813843a10ecc ]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:338)
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: org.apache.drill.exec.exception.OutOfMemoryException: null
>  at 
> org.apache.drill.exec.vector.complex.AbstractContainerVector.allocateNew(AbstractContainerVector.java:59)
>  at 
> org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.allocateOutgoingRecordBatch(PartitionerTemplate.java:380)
>  at 
> org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.initializeBatch(PartitionerTemplate.java:400)
>  at 
> org.apache.drill.exec.test.generated.PartitionerGen5.setup(PartitionerTemplate.java:126)
>  at 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createClassInstances(PartitionSenderRootExec.java:263)
>  at 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createPartitioner(PartitionSenderRootExec.java:218)
>  at 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:188)
>  at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:323)
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:310)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:310)
>  ... 4 common frames omitted{code}
> I've attached a (real!) sample data-set to match the query above. That same 
> datase

[jira] [Updated] (DRILL-7429) Wrong column order when selecting complex data using Hive storage plugin.

2020-04-05 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7429:

Reviewer: Vova Vysotskyi

> Wrong column order when selecting complex data using Hive storage plugin.
> -
>
> Key: DRILL-7429
> URL: https://issues.apache.org/jira/browse/DRILL-7429
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.16.0
>Reporter: Anton Gozhiy
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.18.0
>
> Attachments: customer_complex.zip
>
>
> *Data:*
> customer_complex.zip attached
> *Query:*
> {code:sql}
> select t3.a, t3.b from (select t2.a, t2.a.o_lineitems[1].l_part.p_name b from 
> (select t1.c_orders[0] a from hive.customer_complex t1) t2) t3 limit 1
> {code}
> *Expected result:*
> Column order: a, b
> *Actual result:*
> Column order: b, a
> *Physical plan:*
> {noformat}
> 00-00Screen
> 00-01  Project(a=[ROW($0, $1, $2, $3, $4, $5, $6, $7)], b=[$8])
> 00-02Project(a=[ITEM($0, 0).o_orderstatus], a1=[ITEM($0, 
> 0).o_totalprice], a2=[ITEM($0, 0).o_orderdate], a3=[ITEM($0, 
> 0).o_orderpriority], a4=[ITEM($0, 0).o_clerk], a5=[ITEM($0, 
> 0).o_shippriority], a6=[ITEM($0, 0).o_comment], a7=[ITEM($0, 0).o_lineitems], 
> b=[ITEM(ITEM(ITEM(ITEM($0, 0).o_lineitems, 1), 'l_part'), 'p_name')])
> 00-03  Project(c_orders=[$0])
> 00-04SelectionVectorRemover
> 00-05  Limit(fetch=[10])
> 00-06Scan(table=[[hive, customer_complex]], 
> groupscan=[HiveDrillNativeParquetScan [entries=[ReadEntryWithPath 
> [path=/drill/customer_complex/00_0]], numFiles=1, numRowGroups=1, 
> columns=[`c_orders`[0].`o_orderstatus`, `c_orders`[0].`o_totalprice`, 
> `c_orders`[0].`o_orderdate`, `c_orders`[0].`o_orderpriority`, 
> `c_orders`[0].`o_clerk`, `c_orders`[0].`o_shippriority`, 
> `c_orders`[0].`o_comment`, `c_orders`[0].`o_lineitems`, 
> `c_orders`[0].`o_lineitems`[1].`l_part`.`p_name`]]])
> {noformat}
> *Note:* Reproduced with both Hive and Native readers. Non-reproducible with 
> Parquet reader.
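
For reference, the physical plan quoted above is the kind of output Drill produces for an EXPLAIN request; a minimal sketch using the same query as in the description:

{code:sql}
EXPLAIN PLAN FOR
SELECT t3.a, t3.b FROM (SELECT t2.a, t2.a.o_lineitems[1].l_part.p_name b FROM 
(SELECT t1.c_orders[0] a FROM hive.customer_complex t1) t2) t3 LIMIT 1;
{code}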



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7680) Move UDF projects before plugins in contrib

2020-04-01 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7680:

Reviewer: Arina Ielchiieva

> Move UDF projects before plugins in contrib
> ---
>
> Key: DRILL-7680
> URL: https://issues.apache.org/jira/browse/DRILL-7680
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Several {{contrib}} plugins depend on UDFs for testing. However, the UDFs 
> occur after the plugins in build order. This PR reverses the dependencies so 
> that UDFs are built before the plugins that want to use them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7683) Add "message parsing" to new JSON loader

2020-04-01 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7683:

Reviewer: Arina Ielchiieva

> Add "message parsing" to new JSON loader
> 
>
> Key: DRILL-7683
> URL: https://issues.apache.org/jira/browse/DRILL-7683
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Worked on a project that uses the new JSON loader to parse a REST response 
> that includes a set of "wrapper" fields around the JSON payload. Example:
> {code:json}
> { "status": "ok", "results": [ data here ] }
> {code}
> To solve this cleanly, added the ability to specify a "message parser" to 
> consume JSON tokens up to the start of the data. This parser can be written 
> as needed for each different data source.
> Since this change adds one more parameter to the JSON structure parser, added 
> builders to gather the needed parameters rather than making the constructor 
> even larger.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7683) Add "message parsing" to new JSON loader

2020-04-01 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7683:

Labels: ready-to-commit  (was: )

> Add "message parsing" to new JSON loader
> 
>
> Key: DRILL-7683
> URL: https://issues.apache.org/jira/browse/DRILL-7683
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Worked on a project that uses the new JSON loader to parse a REST response 
> that includes a set of "wrapper" fields around the JSON payload. Example:
> {code:json}
> { "status": "ok", "results": [ data here ] }
> {code}
> To solve this cleanly, added the ability to specify a "message parser" to 
> consume JSON tokens up to the start of the data. This parser can be written 
> as needed for each different data source.
> Since this change adds one more parameter to the JSON structure parser, added 
> builders to gather the needed parameters rather than making the constructor 
> even larger.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7680) Move UDF projects before plugins in contrib

2020-04-01 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7680:

Labels: ready-to-commit  (was: )

> Move UDF projects before plugins in contrib
> ---
>
> Key: DRILL-7680
> URL: https://issues.apache.org/jira/browse/DRILL-7680
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Several {{contrib}} plugins depend on UDFs for testing. However, the UDFs 
> occur after the plugins in build order. This PR reverses the dependencies so 
> that UDFs are built before the plugins that want to use them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7673) View set query fails with NPE for non-existing option

2020-03-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7673:

Reviewer: Paul Rogers

> View set query fails with NPE for non-existing option
> -
>
> Key: DRILL-7673
> URL: https://issues.apache.org/jira/browse/DRILL-7673
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> The following query fails with NPE:
> {code:sql}
> set `non-existing option`
> {code}
> Stack trace:
> {noformat}
> Caused by: java.lang.NullPointerException: null
>   at 
> org.apache.drill.exec.planner.sql.handlers.SetOptionHandler.getPlan(SetOptionHandler.java:66)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:140)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93)
>   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:592)
>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:277)
>   ... 3 common frames omitted
> {noformat}
> Also, the command returns the option name in the same case as it 
> was specified in the query:
> {noformat}
> set `metastore.ENABLED`;
> +---+---+
> |   name| value |
> +---+---+
> | metastore.ENABLED | false |
> +---+---+
> {noformat}
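
For context, a minimal sketch of the two SET forms involved, based on the examples above (the option name is just an illustration):

{code:sql}
-- Assign a session option
SET `metastore.enabled` = false;

-- Display a single option; an unknown name in this form is what hit the NPE
SET `metastore.enabled`;
{code}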



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7673) View set query fails with NPE for non-existing option

2020-03-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7673:

Labels: ready-to-commit  (was: )

> View set query fails with NPE for non-existing option
> -
>
> Key: DRILL-7673
> URL: https://issues.apache.org/jira/browse/DRILL-7673
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> The following query fails with NPE:
> {code:sql}
> set `non-existing option`
> {code}
> Stack trace:
> {noformat}
> Caused by: java.lang.NullPointerException: null
>   at 
> org.apache.drill.exec.planner.sql.handlers.SetOptionHandler.getPlan(SetOptionHandler.java:66)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:140)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93)
>   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:592)
>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:277)
>   ... 3 common frames omitted
> {noformat}
> Also, the command returns the option name in the same case as it 
> was specified in the query:
> {noformat}
> set `metastore.ENABLED`;
> +---+---+
> |   name| value |
> +---+---+
> | metastore.ENABLED | false |
> +---+---+
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps

2020-03-27 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7668:

Reviewer: Paul Rogers

> Allow Time Bucket Function to Accept Floats and Timestamps
> --
>
> Key: DRILL-7668
> URL: https://issues.apache.org/jira/browse/DRILL-7668
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.18.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Drill has a function `time_bucket()` which facilitates time series analysis. 
> This PR extends this function to accept `FLOAT8` and `TIMESTAMPS` as input.  
> Floats are typically not used for timestamps; however, when the 
> data comes from imperfect files, the numbers may be read as floats and 
> hence require casting in queries. This PR makes this easier.
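
A minimal usage sketch of the function in question; the table and column names are illustrative, and the interval argument is assumed to be expressed in milliseconds:

{code:sql}
-- Bucket events into 5-minute windows (300000 ms) and count rows per bucket
SELECT time_bucket(event_time, 300000) AS bucket_start,
       COUNT(*) AS events
FROM dfs.`/data/logs`
GROUP BY time_bucket(event_time, 300000)
ORDER BY bucket_start;
{code}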



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps

2020-03-27 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7668:

Labels: ready-to-commit  (was: )

> Allow Time Bucket Function to Accept Floats and Timestamps
> --
>
> Key: DRILL-7668
> URL: https://issues.apache.org/jira/browse/DRILL-7668
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.18.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Drill has a function `time_bucket()` which facilitates time series analysis. 
> This PR extends this function to accept `FLOAT8` and `TIMESTAMPS` as input.  
> Floats are typically not used for timestamps; however, when the 
> data comes from imperfect files, the numbers may be read as floats and 
> hence require casting in queries. This PR makes this easier.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7641) Convert Excel Reader to Use Streaming Reader

2020-03-27 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7641:

Labels: ready-to-commit  (was: )

> Convert Excel Reader to Use Streaming Reader
> 
>
> Key: DRILL-7641
> URL: https://issues.apache.org/jira/browse/DRILL-7641
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> The current implementation of the Excel reader uses the Apache POI reader, 
> which uses excessive amounts of memory. As a result, attempting to read large 
> Excel files will cause out of memory errors. 
> This PR converts the format plugin to use a streaming reader, still based on 
> the POI library. The documentation for the streaming reader can be found 
> at [1].
> All unit tests pass and I tested the plugin with some large Excel files on my 
> computer.
> [1]: [https://github.com/pjfanning/excel-streaming-reader]
>  
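A minimal sketch of how such a file would be queried once the plugin is in place (the path and workspace are illustrative and assume the excel format is enabled on the dfs workspace):

{code:sql}
-- Read a worksheet from a large .xlsx file through the dfs plugin
SELECT * FROM dfs.`/data/reports/sales_2020.xlsx` LIMIT 20;
{code}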



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7672) Make metadata type required when reading from / writing into Drill Metastore

2020-03-27 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7672:

Reviewer: Vova Vysotskyi

> Make metadata type required when reading from / writing into Drill Metastore
> 
>
> Key: DRILL-7672
> URL: https://issues.apache.org/jira/browse/DRILL-7672
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>
> The Metastore consists of components: TABLES, VIEWS, etc. (so far only TABLES are 
> implemented). Each component's metadata can have types. For example, TABLES 
> metadata can be of the following types: TABLE, SEGMENT, FILE, ROW_GROUP, 
> PARTITION.
> In the initial Metastore implementation, when reading from / writing into the 
> Metastore, the metadata type was indicated in filter expressions. 
> For the Iceberg Metastore, where all data is stored in files, this was not that 
> critical: when information about a table is retrieved, the table 
> folder is queried.
> For other Metastore implementations, knowing the metadata type can be more 
> critical. For example, an RDBMS Metastore would store TABLES metadata in 
> different tables, so knowing which table to query would improve performance 
> compared to querying all tables.
> Of course, we could traverse the query filter and look for hints about which 
> metadata type is needed, but it is much better to know the required metadata type 
> beforehand without any extra logic.
> Taking into account that Metastore metadata is queried only in Drill code, the 
> developer knows beforehand what needs to be fetched / updated / deleted.
> This Jira aims to make the metadata type required when reading from / writing 
> into the Drill Metastore. This change does not have any effect on users, just 
> internal code refactoring.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7672) Make metadata type required when reading from / writing into Drill Metastore

2020-03-27 Thread Arina Ielchiieva (Jira)
Arina Ielchiieva created DRILL-7672:
---

 Summary: Make metadata type required when reading from / writing 
into Drill Metastore
 Key: DRILL-7672
 URL: https://issues.apache.org/jira/browse/DRILL-7672
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.17.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.18.0


The Metastore consists of components: TABLES, VIEWS, etc. (so far only TABLES are 
implemented). Each component's metadata can have types. For example, TABLES 
metadata can be of the following types: TABLE, SEGMENT, FILE, ROW_GROUP, 
PARTITION.
In the initial Metastore implementation, when reading from / writing into the 
Metastore, the metadata type was indicated in filter expressions. 
For the Iceberg Metastore, where all data is stored in files, this was not that 
critical: when information about a table is retrieved, the table folder 
is queried.
For other Metastore implementations, knowing the metadata type can be more critical. 
For example, an RDBMS Metastore would store TABLES metadata in different tables, 
so knowing which table to query would improve performance compared to querying 
all tables.
Of course, we could traverse the query filter and look for hints about which metadata 
type is needed, but it is much better to know the required metadata type beforehand 
without any extra logic.
Taking into account that Metastore metadata is queried only in Drill code, the 
developer knows beforehand what needs to be fetched / updated / deleted.

This Jira aims to make the metadata type required when reading from / writing into 
the Drill Metastore. This change does not have any effect on users, just 
internal code refactoring.
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7641) Convert Excel Reader to Use Streaming Reader

2020-03-27 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7641:

Reviewer: Arina Ielchiieva

> Convert Excel Reader to Use Streaming Reader
> 
>
> Key: DRILL-7641
> URL: https://issues.apache.org/jira/browse/DRILL-7641
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>
> The current implementation of the Excel reader uses the Apache POI reader, 
> which uses excessive amounts of memory. As a result, attempting to read large 
> Excel files will cause out of memory errors. 
> This PR converts the format plugin to use a streaming reader, still based on 
> the POI library. The documentation for the streaming reader can be found 
> at [1].
> All unit tests pass and I tested the plugin with some large Excel files on my 
> computer.
> [1]: [https://github.com/pjfanning/excel-streaming-reader]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7665) Add UNION to schema parser

2020-03-26 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7665:

Reviewer: Vova Vysotskyi

> Add UNION to schema parser
> --
>
> Key: DRILL-7665
> URL: https://issues.apache.org/jira/browse/DRILL-7665
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>
> After DRILL-7633 has defined proper type string for UNION it should be added 
> to schema parser to allow proper ser / de.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7667) Invalid picture

2020-03-26 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067627#comment-17067627
 ] 

Arina Ielchiieva commented on DRILL-7667:
-

[~扎啤] contributions are always welcome; it would be nice if you opened a PR 
with the fix.

> Invalid picture
> ---
>
> Key: DRILL-7667
> URL: https://issues.apache.org/jira/browse/DRILL-7667
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.17.0
>Reporter: Aaron-Mhs
>Priority: Major
> Attachments: image-2020-03-26-10-57-51-421.png, 
> image-2020-03-26-10-58-26-273.png, image-2020-03-26-10-59-49-842.png, 
> image-2020-03-26-11-00-31-515.png
>
>
> There is an invalid flowchart for the [Configuring Kerberos Security] block 
> of Drill.
> !image-2020-03-26-10-57-51-421.png!
> When I cloned the [drill-gh-pages] branch, I found that opening this picture 
> reports an error, while the rest are fine.
> !image-2020-03-26-10-58-26-273.png!
> !image-2020-03-26-10-59-49-842.png!
>   !image-2020-03-26-11-00-31-515.png!
> Please repair it as soon as possible. Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7665) Add UNION to schema parser

2020-03-25 Thread Arina Ielchiieva (Jira)
Arina Ielchiieva created DRILL-7665:
---

 Summary: Add UNION to schema parser
 Key: DRILL-7665
 URL: https://issues.apache.org/jira/browse/DRILL-7665
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Arina Ielchiieva
 Fix For: 1.18.0


After DRILL-7633 has defined proper type string for UNION it should be added to 
schema parser to allow proper ser / de.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-7665) Add UNION to schema parser

2020-03-25 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7665:
---

Assignee: Arina Ielchiieva

> Add UNION to schema parser
> --
>
> Key: DRILL-7665
> URL: https://issues.apache.org/jira/browse/DRILL-7665
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>
> After DRILL-7633 has defined proper type string for UNION it should be added 
> to schema parser to allow proper ser / de.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7663) Code refactor to DefaultFunctionResolver

2020-03-25 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7663:

Labels: ready-to-commit  (was: )

> Code refactor to DefaultFunctionResolver
> 
>
> Key: DRILL-7663
> URL: https://issues.apache.org/jira/browse/DRILL-7663
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Weijie Tong
>Assignee: Weijie Tong
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> A small code refactor to DefaultFunctionResolver



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7633) Fixes for union and repeated list accessors

2020-03-25 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7633:

Labels: ready-to-commit  (was: )

> Fixes for union and repeated list accessors
> ---
>
> Key: DRILL-7633
> URL: https://issues.apache.org/jira/browse/DRILL-7633
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Minor fixes for repeated list and Union type support in column accessors



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7663) Code refactor to DefaultFunctionResolver

2020-03-25 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7663:

Affects Version/s: 1.17.0

> Code refactor to DefaultFunctionResolver
> 
>
> Key: DRILL-7663
> URL: https://issues.apache.org/jira/browse/DRILL-7663
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Weijie Tong
>Assignee: Weijie Tong
>Priority: Minor
> Fix For: 1.18.0
>
>
> A small code refactor to DefaultFunctionResolver



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7663) Code refactor to DefaultFunctionResolver

2020-03-25 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7663:

Reviewer: Arina Ielchiieva

> Code refactor to DefaultFunctionResolver
> 
>
> Key: DRILL-7663
> URL: https://issues.apache.org/jira/browse/DRILL-7663
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Weijie Tong
>Assignee: Weijie Tong
>Priority: Minor
> Fix For: 1.18.0
>
>
> A small code refactor to DefaultFunctionResolver



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7663) Code refactor to DefaultFunctionResolver

2020-03-25 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7663:

Issue Type: Improvement  (was: New Feature)

> Code refactor to DefaultFunctionResolver
> 
>
> Key: DRILL-7663
> URL: https://issues.apache.org/jira/browse/DRILL-7663
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Weijie Tong
>Assignee: Weijie Tong
>Priority: Minor
> Fix For: 1.18.0
>
>
> A small code refactor to DefaultFunctionResolver



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7641) Convert Excel Reader to Use Streaming Reader

2020-03-24 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066212#comment-17066212
 ] 

Arina Ielchiieva commented on DRILL-7641:
-

[~cgivre] sorry made this by mistake.

> Convert Excel Reader to Use Streaming Reader
> 
>
> Key: DRILL-7641
> URL: https://issues.apache.org/jira/browse/DRILL-7641
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>
> The current implementation of the Excel reader uses the Apache POI reader, 
> which uses excessive amounts of memory. As a result, attempting to read large 
> Excel files will cause out of memory errors. 
> This PR converts the format plugin to use a streaming reader, still based on 
> the POI library. The documentation for the streaming reader can be found 
> at [1].
> All unit tests pass and I tested the plugin with some large Excel files on my 
> computer.
> [1]: [https://github.com/pjfanning/excel-streaming-reader]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7641) Convert Excel Reader to Use Streaming Reader

2020-03-24 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7641:

Fix Version/s: 1.18.0

> Convert Excel Reader to Use Streaming Reader
> 
>
> Key: DRILL-7641
> URL: https://issues.apache.org/jira/browse/DRILL-7641
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>
> The current implementation of the Excel reader uses the Apache POI reader, 
> which uses excessive amounts of memory. As a result, attempting to read large 
> Excel files will cause out of memory errors. 
> This PR converts the format plugin to use a streaming reader, still based on 
> the POI library. The documentation for the streaming reader can be found 
> at [1].
> All unit tests pass and I tested the plugin with some large Excel files on my 
> computer.
> [1]: [https://github.com/pjfanning/excel-streaming-reader]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-6604) Upgrade Drill Hive client to Hive3.1 version

2020-03-24 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6604:

Reviewer: Igor Guzenko

> Upgrade Drill Hive client to Hive3.1 version
> 
>
> Key: DRILL-6604
> URL: https://issues.apache.org/jira/browse/DRILL-6604
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.18.0
>
>
> Recently the Hive 3.1 version was released.
>  3.1 versions of {{hive-exec}} and {{hive-metastore}} (the main Drill Hive client 
> libraries) are the latest available artifacts in the Maven repo for now:
>  [https://mvnrepository.com/artifact/org.apache.hive/hive-exec/3.1.0]
>  [https://mvnrepository.com/artifact/org.apache.hive/hive-metastore/3.1.0]
> It is necessary to investigate the impact of updating to the newer major 
> Hive client version.
> The initial conflicts after changing the {{hive.version}} in Drill root pom 
> file from 2.3.2 to 3.1.0 are:
> {code}
> [ERROR] COMPILATION ERROR : 
> [INFO] -
> [ERROR] 
> /home/vitalii/IdeaProjects/drill-fork/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java:[58,39]
>  error: cannot find symbol
> [ERROR]   symbol:   class MetaStoreUtils
>   location: package org.apache.hadoop.hive.metastore
> /home/vitalii/IdeaProjects/drill-fork/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveMetadataProvider.java:[34,39]
>  error: cannot find symbol
> [INFO] 2 errors
> {code}
> {code}
> [ERROR] COMPILATION ERROR : 
> [INFO] -
> [ERROR] 
> /home/vitalii/IdeaProjects/drill-fork/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java:[575,15]
>  error: cannot find symbol
> [ERROR]   symbol:   method setTransactionalTableScan(JobConf,boolean)
>   location: class AcidUtils
> /home/vitalii/IdeaProjects/drill-fork/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveFieldConverter.java:[216,92]
>  error: incompatible types: org.apache.hadoop.hive.common.type.Timestamp 
> cannot be converted to java.sql.Timestamp
> [ERROR] 
> /home/vitalii/IdeaProjects/drill-fork/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveFieldConverter.java:[225,82]
>  error: incompatible types: org.apache.hadoop.hive.common.type.Date cannot be 
> converted to java.sql.Date
> [ERROR] 
> /home/vitalii/IdeaProjects/drill-fork/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java:[100,17]
>  error: cannot find symbol
> [ERROR]   symbol:   method setTokenStr(UserGroupInformation,String,String)
>   location: class Utils
> /home/vitalii/IdeaProjects/drill-fork/contrib/storage-hive/core/target/generated-sources/org/apache/drill/exec/expr/fn/impl/hive/DrillTimeStampTimestampObjectInspector.java:[29,16]
>  error: Required is not abstract and does not override abstract method 
> getPrimitiveJavaObject(Object) in TimestampObjectInspector
> [ERROR] 
> /home/vitalii/IdeaProjects/drill-fork/contrib/storage-hive/core/target/generated-sources/org/apache/drill/exec/expr/fn/impl/hive/DrillTimeStampTimestampObjectInspector.java:[35,30]
>  error: getPrimitiveJavaObject(Object) in Required cannot implement 
> getPrimitiveJavaObject(Object) in TimestampObjectInspector
> [ERROR]   return type java.sql.Timestamp is not compatible with 
> org.apache.hadoop.hive.common.type.Timestamp
> /home/vitalii/IdeaProjects/drill-fork/contrib/storage-hive/core/target/generated-sources/org/apache/drill/exec/expr/fn/impl/hive/DrillTimeStampTimestampObjectInspector.java:[44,29]
>  error: getPrimitiveWritableObject(Object) in Required cannot implement 
> getPrimitiveWritableObject(Object) in TimestampObjectInspector
> [ERROR]   return type TimestampWritable is not compatible with 
> TimestampWritableV2
> /home/vitalii/IdeaProjects/drill-fork/contrib/storage-hive/core/target/generated-sources/org/apache/drill/exec/expr/fn/impl/hive/DrillTimeStampTimestampObjectInspector.java:[54,16]
>  error: Optional is not abstract and does not override abstract method 
> getPrimitiveJavaObject(Object) in TimestampObjectInspector
> [ERROR] 
> /home/vitalii/IdeaProjects/drill-fork/contrib/storage-hive/core/target/generated-sources/org/apache/drill/exec/expr/fn/impl/hive/DrillTimeStampTimestampObjectInspector.java:[60,30]
>  error: getPrimitiveJavaObject(Object) in Optional cannot implement 
> getPrimitiveJavaObject(Object) in TimestampObjectInspector
> [ERROR]   return type java.sql.Timestamp is not compatible with 
> org.apache.hadoop.hive.common.type.Timestamp
> /home/vitalii/IdeaProjects/d

[jira] [Updated] (DRILL-7648) Scrypt j_security_check works without security headers

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7648:

Reviewer: Vova Vysotskyi

> Scrypt j_security_check works without security headers 
> ---
>
> Key: DRILL-7648
> URL: https://issues.apache.org/jira/browse/DRILL-7648
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Dmytro Kondriukov
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.18.0
>
>
> *Preconditions:*
> drill-override.conf
> {noformat}
> drill.exec: {
>   cluster-id: "drillbits1",
>   zk.connect: "localhost:5181"
>   impersonation: {
> enabled: true,
> max_chained_user_hops: 3
> },
> security: {
> auth.mechanisms : ["PLAIN"],
> },
> security.user.auth: {
> enabled: true,
> packages += "org.apache.drill.exec.rpc.user.security",
> impl: "pam4j",
> pam_profiles: [ "sudo", "login" ]
> }
>   http: {
> ssl_enabled: true,
> jetty.server.response.headers: {
>   "X-XSS-Protection": "1; mode=block",
>   "X-Content-Type-Options": "nosniff",
>   "Strict-Transport-Security": "max-age=31536000;includeSubDomains",
>   "Content-Security-Policy": "default-src https:; script-src 
> 'unsafe-inline' https:; style-src 'unsafe-inline' https:; font-src data: 
> https:; img-src data: https:"
> }
>   }
> }
> {noformat}
> *Steps:*
> 1. Perform login to drillbit webUI
> 2. Check in browser console in tab "network" headers of resource 
> https://node1.cluster.com:8047/j_security_check
> 3. Check section "response headers"
> *Expected result:* security headers are present
> *Actual result:* security headers are absent
> 4. Check section "Form Data"
> *Expected result:* parameter "j_password" content is hidden
> *Actual result:* parameter "j_password" content is visible



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7641) Convert Excel Reader to Use Streaming Reader

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7641:

Fix Version/s: (was: 1.18.0)

> Convert Excel Reader to Use Streaming Reader
> 
>
> Key: DRILL-7641
> URL: https://issues.apache.org/jira/browse/DRILL-7641
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>
> The current implementation of the Excel reader uses the Apache POI reader, 
> which uses excessive amounts of memory. As a result, attempting to read large 
> Excel files will cause out of memory errors. 
> This PR converts the format plugin to use a streaming reader, still based on 
> the POI library. The documentation for the streaming reader can be found 
> at [1].
> All unit tests pass and I tested the plugin with some large Excel files on my 
> computer.
> [1]: [https://github.com/pjfanning/excel-streaming-reader]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-6758) Hash Join should not return the join columns when they are not needed downstream

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6758:

Fix Version/s: (was: 1.18.0)

> Hash Join should not return the join columns when they are not needed 
> downstream
> 
>
> Key: DRILL-6758
> URL: https://issues.apache.org/jira/browse/DRILL-6758
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.14.0
>Reporter: Boaz Ben-Zvi
>Assignee: Hanumath Rao Maduri
>Priority: Minor
>
> Currently the Hash-Join operator returns all its (both sides) incoming 
> columns. In cases where the join columns are not used further downstream, 
> this is a waste (allocating vectors, copying each value, etc).
>   Suggestion: Have the planner pass this information to the Hash-Join 
> operator, to enable skipping the return of these columns.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-4667) Improve memory footprint of broadcast joins

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4667:

Fix Version/s: (was: 1.18.0)

> Improve memory footprint of broadcast joins
> ---
>
> Key: DRILL-4667
> URL: https://issues.apache.org/jira/browse/DRILL-4667
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
>Assignee: Boaz Ben-Zvi
>Priority: Major
>
> For broadcast joins, currently Drill optimizes the data transfer across the 
> network for broadcast table by sending a single copy to the receiving node 
> which then distributes it to all minor fragments running on that particular 
> node.  However, each minor fragment builds its own hash table (for a hash 
> join) using this broadcast table.  We can substantially improve the memory 
> footprint by having a shared copy of the hash table among multiple minor 
> fragments on a node.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-5102) Use a different root path for Dynamic UDF directories in local vs. DFS modes

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5102:
---

Assignee: (was: Arina Ielchiieva)

> Use a different root path for Dynamic UDF directories in local vs. DFS modes
> 
>
> Key: DRILL-5102
> URL: https://issues.apache.org/jira/browse/DRILL-5102
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Paul Rogers
>Priority: Minor
>
> A user on the Drill mail list tried to start an embedded Drillbit on Windows 
> and got the error shown below.
> The problem is that the Dynamic UDF feature has a "root" option which 
> defaults to "User/". On Windows, "User" is a special folder 
> with limited permissions. On Windows, it is uncommon to create directories 
> directly under the User's home folder. Instead, files are usually put under 
> Documents, AppData, etc.
> On the Mac, "/Users/" is the user's home directory, another 
> awkward place to put app-specific files.
> On Linux, the "/Users" folder probably won't exist and the user probably 
> won't have permission to create a new top-level folder.
> However, on HDFS, "/user/" is the typical pattern.
> The Dynamic UDF feature needs a way to calculate a good default root suitable 
> for the type of file system. For "file:///", it should be in a Drill temp 
> directory (perhaps along side the storage plugins in /tmp/drill." On HDFS, it 
> should default to "/user/drill" or "/user/". On Windows... Not 
> sure where Drill puts its embedded metadata files, but the default Dynamic 
> UDF location should be the same.
> {code}
> Error: Failure in starting embedded Drillbit: 
> org.apache.drill.common.exceptions
> .DrillRuntimeException: Error during udf area creation 
> [/C:/Users/ivy.chan/drill
> /udf/registry] on file system [file:///] (state=,code=0)
> java.sql.SQLException: Failure in starting embedded Drillbit: 
> org.apache.drill.c
> ommon.exceptions.DrillRuntimeException: Error during udf area creation 
> [/C:/User
> s/ivy.chan/drill/udf/registry] on file system [file:///]
>at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.(DrillConnection
> Impl.java:128)
>at 
> org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(Dril
> lJdbc41Factory.java:70)
>at 
> org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.ja
> va:69)
> ...
> {code}
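> As a purely illustrative sketch (the class and method names are hypothetical, 
> not Drill's actual configuration code), the default root could be derived 
> from the file system scheme roughly like this:
> {code:java}
> import java.net.URI;
> 
> // Hypothetical sketch: pick a default Dynamic UDF root based on the file
> // system type instead of always using the user's home directory.
> public class UdfRootDefaults {
>   public static String defaultUdfRoot(URI fsUri, String userName) {
>     String scheme = fsUri.getScheme() == null ? "file" : fsUri.getScheme();
>     switch (scheme) {
>       case "file":
>         // Local FS: keep UDF files under Drill's temp area, not the home folder.
>         return System.getProperty("java.io.tmpdir") + "/drill/udf";
>       case "hdfs":
>       case "maprfs":
>         // Distributed FS: follow the usual /user/<name> convention.
>         return "/user/" + userName + "/drill/udf";
>       default:
>         return "/tmp/drill/udf";
>     }
>   }
> }
> {code}
> For example, defaultUdfRoot(URI.create("file:///"), "ivy.chan") would return 
> a path under the JVM temp directory rather than the user's home folder.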



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-6219) Filter pushdown doesn't work with star operator if there is a subquery with its own filter

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6219:
---

Assignee: (was: Arina Ielchiieva)

> Filter pushdown doesn't work with star operator if there is a subquery with 
> its own filter
> ---
>
> Key: DRILL-6219
> URL: https://issues.apache.org/jira/browse/DRILL-6219
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Priority: Major
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated using the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> select * from (select * from 
> dfs.drillTestDir.`DRILL_6118_parquet_partitioned_by_folders` where c1>2) 
> where c1>3{code}
> *Expected result:*
> Filters "c1>3" and "c1>2" should both be pushed down so that only the data 
> from the folder "d3" is scanned.
> *Actual result:* 
> The data from the folders "d1" and "d3" is scanned because only the filter 
> "c1>2" is pushed down.
> *Physical plan:*
> {noformat}
> 00-00Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 10.0, 
> cumulative cost = {201.0 rows, 581.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, 
> id = 105545
> 00-01  Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount 
> = 10.0, cumulative cost = {200.0 rows, 580.0 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 105544
> 00-02SelectionVectorRemover : rowType = RecordType(DYNAMIC_STAR 
> T25¦¦**): rowcount = 10.0, cumulative cost = {190.0 rows, 570.0 cpu, 0.0 io, 
> 0.0 network, 0.0 memory}, id = 105543
> 00-03  Filter(condition=[>(ITEM($0, 'c1'), 3)]) : rowType = 
> RecordType(DYNAMIC_STAR T25¦¦**): rowcount = 10.0, cumulative cost = {180.0 
> rows, 560.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 105542
> 00-04Project(T25¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR 
> T25¦¦**): rowcount = 20.0, cumulative cost = {160.0 rows, 440.0 cpu, 0.0 io, 
> 0.0 network, 0.0 memory}, id = 105541
> 00-05  SelectionVectorRemover : rowType = RecordType(DYNAMIC_STAR 
> T25¦¦**, ANY c1): rowcount = 20.0, cumulative cost = {140.0 rows, 420.0 cpu, 
> 0.0 io, 0.0 network, 0.0 memory}, id = 105540
> 00-06Filter(condition=[>($1, 2)]) : rowType = 
> RecordType(DYNAMIC_STAR T25¦¦**, ANY c1): rowcount = 20.0, cumulative cost = 
> {120.0 rows, 400.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 105539
> 00-07  Project(T25¦¦**=[$0], c1=[$1]) : rowType = 
> RecordType(DYNAMIC_STAR T25¦¦**, ANY c1): rowcount = 40.0, cumulative cost = 
> {80.0 rows, 160.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 105538
> 00-08Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=/drill/testdata/DRILL_6118_parquet_partitioned_by_folders/d1/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=/drill/testdata/DRILL_6118_parquet_partitioned_by_folders/d3/0_0_0.parquet]],
>  
> selectionRoot=maprfs:/drill/testdata/DRILL_6118_parquet_partitioned_by_folders,
>  numFiles=2, numRowGroups=2, usedMetadataFile=false, columns=[`**`]]]) : 
> rowType = RecordType(DYNAMIC_STAR **, ANY c1): rowcount = 40.0, cumulative 
> cost = {40.0 rows, 80.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 105537
> {noformat}
> *Note:* Works fine if column names are selected instead of "*".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-5101) Provide boot-time option to disable the Dynamic UDF feature

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5101:
---

Assignee: (was: Arina Ielchiieva)

> Provide boot-time option to disable the Dynamic UDF feature
> ---
>
> Key: DRILL-5101
> URL: https://issues.apache.org/jira/browse/DRILL-5101
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Paul Rogers
>Priority: Minor
>
> A Windows user on the mailing list could not start an embedded Drillbit 
> because the Dynamic UDF feature tried to create a directory under the user's 
> protected Users folder:
> {code}
> Error: Failure in starting embedded Drillbit: 
> org.apache.drill.common.exceptions
> .DrillRuntimeException: Error during udf area creation 
> [/C:/Users/ivy.chan/drill
> /udf/registry] on file system [file:///] (state=,code=0)
> java.sql.SQLException: Failure in starting embedded Drillbit: 
> org.apache.drill.c
> ommon.exceptions.DrillRuntimeException: Error during udf area creation 
> [/C:/User
> s/ivy.chan/drill/udf/registry] on file system [file:///]
>at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.(DrillConnection
> Impl.java:128)
>at 
> org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(Dril
> lJdbc41Factory.java:70)
>at 
> org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.ja
> va:69)
> {code}
> The fastest workaround (since this was an embedded Drillbit) would be to 
> disable the Dynamic UDF feature. Unfortunately, the only option to do so is a 
> runtime option that requires that the Drillbit be started. That creates a 
> vicious circle: we can't start the Drillbit unless we disable Dynamic UDFs, 
> but we can't disable them unless we start the Drillbit.
> The workaround might be to change the root directory, which is why this bug 
> is marked minor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-6448) query on nested data returns bogus results

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6448:
---

Assignee: (was: Arina Ielchiieva)

> query on nested data returns bogus results
> --
>
> Key: DRILL-6448
> URL: https://issues.apache.org/jira/browse/DRILL-6448
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.13.0
>Reporter: Dechang Gu
>Priority: Major
>
> I created a nested parquet table nested_c_o_l with the following schema:
> {code}
> message root {
>   optional binary c_custkey (UTF8);
>   optional binary c_name (UTF8);
>   optional binary c_address (UTF8);
>   optional int64 c_nationkey;
>   optional binary c_phone (UTF8);
>   optional double c_acctbal;
>   optional binary c_mktsegment (UTF8);
>   optional binary c_comment (UTF8);
>   repeated group c_orders {
> optional int64 o_orderkey;
> optional binary o_orderstatus (UTF8);
> optional double o_totalprice;
> optional binary o_orderdate (UTF8);
> optional binary o_orderpriority (UTF8);
> optional binary o_clerk (UTF8);
> optional int64 o_shippriority;
> optional binary o_comment (UTF8);
> repeated group o_lineitems {
>   optional int64 l_partkey;
>   optional int64 l_suppkey;
>   optional int64 l_linenumber;
>   optional int64 l_quantity;
>   optional double l_extendedprice;
>   optional double l_discount;
>   optional double l_tax;
>   optional binary l_returnflag (UTF8);
>   optional binary l_linestatus (UTF8);
>   optional binary l_shipdate (UTF8);
>   optional binary l_commitdate (UTF8);
>   optional binary l_receiptdate (UTF8);
>   optional binary l_shipinstruct (UTF8);
>   optional binary l_shipmode (UTF8);
>   optional binary l_comment (UTF8);
> }
>   }
> }
> {code}
> and run the following query:
> {code}
> select * from nested_c_o_l c where c.c_orders.o_orderdate='1997-06-23';
> {code}
> It returns two rows, and the 2nd row does not satisfy the filter condition:
> {code}0: jdbc:drill:schema=dfs.tpcdsView> select * from nested_c_o_l c where 
> c.c_orders.o_orderdate='1997-06-23';
> +---++---+-+-+---+--+---+--+
> | c_custkey | c_name | c_address | c_nationkey | c_phone | c_acctbal | 
> c_mktsegment | c_comment | c_orders |
> +---++---+-+-+---+--+---+--+
> | 1 | Customer#1 | IVhzIApeRb ot,c,E | 15 | 25-989-741-2988 | 711.56 
> | BUILDING | to the even, regular platelets. regular, ironic epitaphs nag e | 
> [{"o_orderkey":9154,"o_orderstatus":"O","o_totalprice":357345.46,"o_orderdate":"1997-06-23","o_orderpriority":"4-NOT
>  SPECIFIED","o_clerk":"Clerk#00328","o_shippriority":0,"o_comment":"y 
> ironic packages cajole. blithely final 
> depende","o_lineitems":[{"l_partkey":866,"l_suppkey":100,"l_linenumber":1,"l_quantity":45,"l_extendedprice":79508.7,"l_discount":0.06,"l_tax":0.06,"l_returnflag":"N","l_linestatus":"O","l_shipdate":"1997-09-24","l_commitdate":"1997-08-11","l_receiptdate":"1997-10-14","l_shipinstruct":"NONE","l_shipmode":"FOB","l_comment":"nal,
>  careful instructions wake carefully. 
> b"},{"l_partkey":1735,"l_suppkey":36,"l_linenumber":7,"l_quantity":47,"l_extendedprice":76926.31,"l_discount":0.04,"l_tax":0.05,"l_returnflag":"N","l_linestatus":"O","l_shipdate":"1997-07-07","l_commitdate":"1997-09-07","l_receiptdate":"1997-07-25","l_shipinstruct":"DELIVER
>  IN PERSON","l_shipmode":"SHIP","l_comment":"wake boldly above the 
> furiousl"},{"l_partkey":1403,"l_suppkey":43,"l_linenumber":6,"l_quantity":40,"l_extendedprice":52176.0,"l_discount":0.03,"l_tax":0.02,"l_returnflag":"N","l_linestatus":"O","l_shipdate":"1997-06-24","l_commitdate":"1997-09-03","l_receiptdate":"1997-07-24","l_shipinstruct":"COLLECT
>  COD","l_shipmode":"TRUCK","l_comment":"t haggle 
> bli"},{"l_partkey":1770,"l_suppkey":13,"l_linenumber":5,"l_quantity":12,"l_extendedprice":20061.24,"l_discount":0.0,"l_tax":0.0,"l_returnflag":"N","l_linestatus":"O","l_shipdate":"1997-08-20","l_commitdate":"1997-07-26","l_receiptdate":"1997-09-17","l_shipinstruct":"NONE","l_shipmode":"RAIL","l_comment":"es.
>  requests print furiously instead of 
> th"},{"l_partkey":1967,"l_suppkey":100,"l_linenumber":4,"l_quantity":31,"l_extendedprice":57937.76,"l_discount":0.0,"l_tax":0.0,"l_returnflag":"N","l_linestatus":"O","l_shipdate":"1997-08-28","l_commitdate":"1997-07-29","l_receiptdate":"1997-09-07","l_shipinstruct":"NONE","l_shipmode":"AIR","l_comment":"final
>  warthogs. slyly pending 
> request"},{"l_partkey":534,"l_suppkey":65,"l_linenumber":3,"l_quantity":46,"l_extendedprice":65988.38,"l_discount":0.04,"l_tax":0.01,"l_returnf

[jira] [Assigned] (DRILL-4424) Metadata cache file emptied when the drillbit process does not have write permissions

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-4424:
---

Assignee: (was: Arina Ielchiieva)

> Metadata cache file emptied when the drillbit process does not have write 
> permissions
> -
>
> Key: DRILL-4424
> URL: https://issues.apache.org/jira/browse/DRILL-4424
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.6.0
>Reporter: Rahul Kumar Challapalli
>Priority: Major
>
> Commit # : bb3fc15216d9cab804fc9a6f0e5bd34597dd4394
> Commit Date : Dec 7, 2015
> I have a directory "lineitem" on maprfs which contains a metadata cache file. 
> Both the folder and cache file are owned by root.
> Then I started Drill as user "mapr" and ran the "refresh table metadata." 
> command on "lineitem"; the result is below:
> {code}
> 0: jdbc:drill:zk=10.10.100.183:5181> refresh table metadata 
> dfs.`/drill/testdata/metadata_caching/lineitem`;
> ++-+
> | ok | summary |
> ++-+
> | false | Error: 2049.1804.436520 
> /drill/testdata/metadata_caching/lineitem/.drill.parquet_metadata (Permission 
> denied) |
> ++-+
> 1 row selected (0.403 seconds)
> {code}
> Two issues here :
> {code}
> 1. Running the above command actually empties the metadata cache file.
> 2. Without the cache file, the user "mapr" has permissions to read the data 
> in directory "lineitem". However when there is cache file in the directory 
> (owned by "root") I get back the above error. So the presence of the cache 
> file is changing the outcome of the query.
> {code}
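> Not from the original report, just a sketch of a write-to-temp-then-rename 
> pattern (Hadoop FileSystem API, error handling simplified) that would leave 
> the existing cache file intact when the drillbit lacks write permission:
> {code:java}
> import java.io.IOException;
> import java.nio.charset.StandardCharsets;
> 
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> 
> // Sketch: write the new metadata cache to a temporary file first, so that a
> // permission failure happens before the existing cache file is touched.
> public class SafeCacheWriter {
>   public static void writeCache(FileSystem fs, Path cacheFile, String json) throws IOException {
>     Path tmp = new Path(cacheFile.getParent(), cacheFile.getName() + ".tmp");
>     try (FSDataOutputStream out = fs.create(tmp, true)) {
>       out.write(json.getBytes(StandardCharsets.UTF_8));
>     }
>     // rename() does not overwrite, so remove the old file only after the new
>     // content has been written successfully.
>     if (fs.exists(cacheFile) && !fs.delete(cacheFile, false)) {
>       throw new IOException("Could not remove old metadata cache file " + cacheFile);
>     }
>     if (!fs.rename(tmp, cacheFile)) {
>       throw new IOException("Could not move " + tmp + " to " + cacheFile);
>     }
>   }
> }
> {code}
> With this pattern the "Permission denied" failure would occur while writing 
> the temporary file, before the existing cache is emptied.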



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-2029) All readers should show the filename where it encountered error. If possible, also position

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-2029:
---

Assignee: (was: Bohdan Kazydub)

> All readers should show the filename where it encountered error.  If 
> possible, also position
> 
>
> Key: DRILL-2029
> URL: https://issues.apache.org/jira/browse/DRILL-2029
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 0.7.0
>Reporter: Aman Sinha
>Priority: Major
> Fix For: Future
>
>
> The Parquet reader (and possibly other file system readers) may encounter an 
> error (e.g. an IndexOutOfBoundsException) while reading one out of hundreds 
> or thousands of files in a directory. The stack trace does not show the exact 
> file where this error occurred, which makes diagnosing the problem much 
> harder. We should show the filename in the error message.
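> As a sketch of what that could look like (assuming Drill's {{UserException}} 
> builder and its {{addContext}} methods behave as in recent versions), a 
> reader could wrap low-level failures so the failing file is always reported:
> {code:java}
> import org.apache.drill.common.exceptions.UserException;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> 
> // Sketch: attach the file being read (and, when known, the record position)
> // to any error thrown by a reader so it shows up in the error message.
> public class ReaderErrorContext {
>   private static final Logger logger = LoggerFactory.getLogger(ReaderErrorContext.class);
> 
>   public static UserException readError(Throwable cause, String filePath, long recordIndex) {
>     return UserException.dataReadError(cause)
>         .message("Failure while reading file")
>         .addContext("File", filePath)
>         .addContext("Record", String.valueOf(recordIndex))
>         .build(logger);
>   }
> }
> {code}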



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-6679) Error should be displayed when trying to connect Drill to unsupported version of Hive

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6679:
---

Assignee: (was: Bohdan Kazydub)

> Error should be displayed when trying to connect Drill to unsupported version 
> of Hive
> -
>
> Key: DRILL-6679
> URL: https://issues.apache.org/jira/browse/DRILL-6679
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Anton Gozhiy
>Priority: Major
> Fix For: Future
>
>
> For example, there is no backward compatibility between Hive 2.3 and Hive 
> 2.1, but it is still possible to connect Drill with its Hive 2.3 client to 
> Hive 2.1; it just won't work correctly. So I suggest that enabling the Hive 
> storage plugin should not be allowed if the Hive version is unsupported.
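> Purely as an illustration (the class and the supported-version value below 
> are hypothetical, not existing Drill code), such a check at plugin-enable 
> time could look like this:
> {code:java}
> // Hypothetical sketch: refuse to enable the Hive storage plugin when the
> // metastore reports a version other than the one the Drill client supports.
> public class HiveVersionCheck {
>   // Illustrative value for the Hive client version Drill is built with.
>   private static final String SUPPORTED_HIVE_VERSION = "2.3";
> 
>   public static void validate(String detectedVersion) {
>     if (detectedVersion == null || !detectedVersion.startsWith(SUPPORTED_HIVE_VERSION)) {
>       throw new IllegalStateException(
>           "Hive version " + detectedVersion + " is not supported; this Drill build"
>               + " requires Hive " + SUPPORTED_HIVE_VERSION + ".x."
>               + " The Hive storage plugin will not be enabled.");
>     }
>   }
> }
> {code}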



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-7091) Query with EXISTS and correlated subquery fails with NPE in HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7091:
---

Assignee: (was: Bohdan Kazydub)

> Query with EXISTS and correlated subquery fails with NPE in 
> HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl
> --
>
> Key: DRILL-7091
> URL: https://issues.apache.org/jira/browse/DRILL-7091
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Vova Vysotskyi
>Priority: Major
> Fix For: 1.18.0
>
>
> Steps to reproduce:
> 1. Create view:
> {code:sql}
> create view dfs.tmp.nation_view as select * from cp.`tpch/nation.parquet`;
> {code}
> 2. Run the following query:
> {code:sql}
> SELECT n_nationkey, n_name
> FROM  dfs.tmp.nation_view a
> WHERE EXISTS (SELECT 1
> FROM cp.`tpch/region.parquet` b
> WHERE b.r_regionkey =  a.n_regionkey)
> {code}
> This query fails with NPE:
> {noformat}
> [Error Id: 9a592635-f792-4403-965c-bd2eece7e8fc on cv1:31010]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:364)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:219)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:330)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.NullPointerException: null
>   at 
> org.apache.drill.exec.physical.impl.join.HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl.initialize(HashJoinMemoryCalculatorImpl.java:267)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:959)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:525)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.test.generated.HashAggregatorGen2.doWork(HashAggTemplate.java:642)
>  ~[na:na]
>   at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:295)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.im

[jira] [Assigned] (DRILL-5360) Timestamp type documented as UTC, implemented as local time

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5360:
---

Assignee: (was: Bohdan Kazydub)

> Timestamp type documented as UTC, implemented as local time
> ---
>
> Key: DRILL-5360
> URL: https://issues.apache.org/jira/browse/DRILL-5360
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Priority: Critical
> Fix For: 2.0.0
>
>
> The Drill documentation implies that the {{Timestamp}} type is in UTC:
> bq. JDBC timestamp in year, month, date hour, minute, second, and optional 
> milliseconds format: -MM-dd HH:mm:ss.SSS. ... TIMESTAMP literals: Drill 
> stores values in Coordinated Universal Time (UTC). Drill supports time 
> functions in the range 1971 to 2037. ... Drill does not support TIMESTAMP 
> with time zone.
> The above is ambiguous. The first part talks about JDBC timestamps. From the 
> JDK Javadoc:
> bq. Timestamp: A thin wrapper around java.util.Date. ... Date class is 
> intended to reflect coordinated universal time (UTC)...
> So, a JDBC timestamp is intended to represent time in UTC. (The "intended to 
> reflect" statement leaves open the possibility of misusing {{Date}} to 
> represent times in other time zones. This was common practice in early Java 
> development and was the reason for the eventual development of the Joda, then 
> Java 8 date/time classes.)
> The Drill documentation implies that timestamp *literals* are in UTC, but a 
> careful read of the documentation does allow an interpretation that the 
> internal representation can be other than UTC. If this is true, then we would 
> also rely on a liberal reading of the Java `Timestamp` class to also not be 
> UTC. (Or, we rely on the Drill JDBC driver to convert from the (unknown) 
> server time zone to a UTC value returned by the Drill JDBC client.)
> Still, a superficial reading (and common practice) would suggest that a Drill 
> Timestamp should be in UTC.
> However, a test on a Mac, with an embedded Drillbit (run in the Pacific time 
> zone, with Daylight Savings Time in effect) shows that the Timestamp binary 
> value is actual local time:
> {code}
>   long before = System.currentTimeMillis();
>   long value = getDateValue(client, "SELECT NOW() FROM (VALUES(1))" );
>   double hrsDiff = (value - before) / (1000.00 * 60 * 60);
>   System.out.println("Hours: " + hrsDiff);
> {code}
> The above gets the actual UTC time from Java. Then, it runs a query that gets 
> Drill's idea of the current time using the {{NOW()}} function. (The 
> {{getDateValue}} function uses the new test framework to access the actual 
> {{long}} value from the returned value vector.) Finally, we compute the 
> difference between the two times, converted to hours. Output:
> {code}
> Hours: -6.975
> {code}
> As it turns out, this is the difference between UTC and PDT. So, the time is 
> in local time, not UTC.
> Since the documentation and implementation are both ambiguous, it is hard to 
> know the intent of the Drill Timestamp. Clearly, common practice is to use 
> UTC. But, there is wiggle-room.
> If the Timestamp value is supposed to be local time, then Drill should 
> provide a function to return the server's time zone offset (in ms) from UTC 
> so that the client can do the needed local-to-UTC conversion to get a true 
> timestamp.
> On the other hand, if the Timestamp is supposed to be UTC (per common 
> practice), then {{NOW()}} should not report local time, it should return UTC.
> Further, if {{NOW()}} returns local time, but Timestamp literals are UTC, 
> then it is hard to see how any query can be rationally written if one 
> timestamp value is local, but a literal is UTC.
> So, job #1 is to define the Timestamp semantics. Then, use that to figure out 
> where the bug lies and make the implementation consistent with the 
> documentation (or vice versa).
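> For reference (plain Java, not Drill code), the server-side offset mentioned 
> above can be obtained directly from the JVM, which is essentially what such 
> a helper function would return:
> {code:java}
> import java.util.TimeZone;
> 
> // Plain-Java sketch: the server's current offset from UTC in milliseconds,
> // which is the value a client-side local-to-UTC conversion would need.
> public class ServerUtcOffset {
>   public static long offsetMillis() {
>     return TimeZone.getDefault().getOffset(System.currentTimeMillis());
>   }
> 
>   public static void main(String[] args) {
>     // In the Pacific time zone with DST in effect this prints roughly -7,
>     // consistent with the difference observed in the test above.
>     System.out.println("Offset hours: " + offsetMillis() / (1000.0 * 60 * 60));
>   }
> }
> {code}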



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-6714) Fix Handling of Missing Columns in DRILL-4264

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6714:
---

Assignee: (was: Bohdan Kazydub)

> Fix Handling of Missing Columns in DRILL-4264
> -
>
> Key: DRILL-6714
> URL: https://issues.apache.org/jira/browse/DRILL-6714
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Timothy Farkas
>Priority: Critical
>
> Implement the improvements Salim discussed on this PR 
> https://github.com/apache/drill/pull/1445 to ensure column names are created 
> without backticks. After the improvements are implemented the temporary 
> work-around introduced in DRILL-6706 should be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-6800) Simplify packaging of Jdbc-all jar

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6800:
---

Assignee: (was: Bohdan Kazydub)

> Simplify packaging of Jdbc-all jar
> --
>
> Key: DRILL-6800
> URL: https://issues.apache.org/jira/browse/DRILL-6800
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Priority: Major
> Fix For: Future, 2.0.0
>
>
> Today the jdbc-all package is created using drill-java-exec as a dependency 
> and then excluding unnecessary dependencies. There is also a size check for 
> the jdbc-all jar to avoid including any unwanted dependency, but the 
> configured size has increased over time and doesn't really provide a good 
> mechanism to enforce a small footprint for the jdbc-all jar. The following 
> recommendations would improve it:
>  1) Divide the java-exec module into separate client/server and common 
> modules.
>  2) Have the size check for the client artifact only.
>  3) Update the jdbc-all pom to include the newly created client artifact and 
> the jdbc driver artifact.
>  * Have multiple profiles to include and exclude any profile-specific 
> dependency. For example, the MapR profile will exclude the Hadoop dependency 
> whereas the Apache profile will include it.
>  * We can create 2 artifacts for jdbc-all: one with and the other without 
> Hadoop dependencies (for a smaller jar size).
>  4) Update the client-side protobuf to not include server-side definitions 
> like QueryProfile / CoreOperatorType, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-6801) Divide java-exec into separate client, server and common module with separate pom.xml files for each module

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6801:
---

Assignee: (was: Bohdan Kazydub)

> Divide java-exec into separate client, server and common module with separate 
> pom.xml files for each module
> ---
>
> Key: DRILL-6801
> URL: https://issues.apache.org/jira/browse/DRILL-6801
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Sorabh Hamirwasia
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-6993) VARBINARY length is ignored on cast

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6993:
---

Assignee: (was: Bohdan Kazydub)

> VARBINARY length is ignored on cast
> ---
>
> Key: DRILL-6993
> URL: https://issues.apache.org/jira/browse/DRILL-6993
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Bohdan Kazydub
>Priority: Major
>
> {{VARBINARY}} precision is not set when casting to {{VARBINARY}} with 
> a specified length.
> For example, test case 
> {code}
>   String query = "select cast(r_name as varbinary(31)) as vb from 
> cp.`tpch/region.parquet`";
>   MaterializedField field = new ColumnBuilder("vb", 
> TypeProtos.MinorType.VARBINARY)
>   .setMode(TypeProtos.DataMode.OPTIONAL)
>   .setWidth(31)
>   .build();
>   BatchSchema expectedSchema = new SchemaBuilder()
>   .add(field)
>   .build();
>   // Validate schema
>   testBuilder()
>   .sqlQuery(query)
>   .schemaBaseLine(expectedSchema)
>   .go();
> {code}
> will fail with
> {code}
> java.lang.Exception: Schema path or type mismatch for column #0:
> Expected schema path: vb
> Actual   schema path: vb
> Expected type: MajorType[minor_type: VARBINARY mode: OPTIONAL precision: 31 
> scale: 0]
> Actual   type: MajorType[minor_type: VARBINARY mode: OPTIONAL]
> {code}
> while for other types, like {{VARCHAR}}, it seems to work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7091) Query with EXISTS and correlated subquery fails with NPE in HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7091:

Fix Version/s: (was: 1.18.0)

> Query with EXISTS and correlated subquery fails with NPE in 
> HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl
> --
>
> Key: DRILL-7091
> URL: https://issues.apache.org/jira/browse/DRILL-7091
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Vova Vysotskyi
>Priority: Major
>
> Steps to reproduce:
> 1. Create view:
> {code:sql}
> create view dfs.tmp.nation_view as select * from cp.`tpch/nation.parquet`;
> {code}
> 2. Run the following query:
> {code:sql}
> SELECT n_nationkey, n_name
> FROM  dfs.tmp.nation_view a
> WHERE EXISTS (SELECT 1
> FROM cp.`tpch/region.parquet` b
> WHERE b.r_regionkey =  a.n_regionkey)
> {code}
> This query fails with NPE:
> {noformat}
> [Error Id: 9a592635-f792-4403-965c-bd2eece7e8fc on cv1:31010]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:364)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:219)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:330)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.NullPointerException: null
>   at 
> org.apache.drill.exec.physical.impl.join.HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl.initialize(HashJoinMemoryCalculatorImpl.java:267)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:959)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:525)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.test.generated.HashAggregatorGen2.doWork(HashAggTemplate.java:642)
>  ~[na:na]
>   at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:295)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(Pr

[jira] [Assigned] (DRILL-6730) Possible connection leak while accessing S3

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6730:
---

Assignee: (was: Bohdan Kazydub)

> Possible connection leak while accessing S3
> ---
>
> Key: DRILL-6730
> URL: https://issues.apache.org/jira/browse/DRILL-6730
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.14.0
> Environment: h4. Cluster:
> 3 × {{m5.large}} AWS EC2 instances with Ubuntu 16.04.
> Java:
> {code}
> openjdk version "1.8.0_181" 
> OpenJDK Runtime Environment (build 1.8.0_181-8u181-b13-0ubuntu0.16.04.1-b13) 
> OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
> {code}
> Drill:
> {code}
> #Generated by Git-Commit-Id-Plugin 
> #Tue Jul 31 17:18:08 PDT 2018 
> git.commit.id.abbrev=0508a12 
> git.commit.user.email=bben-...@mapr.com 
> git.commit.message.full=[maven-release-plugin] prepare release drill-1.14.0\n 
> git.commit.id=0508a128853ce796ca7e99e13008e49442f83147 
> git.commit.message.short=[maven-release-plugin] prepare release drill-1.14.0 
> git.commit.user.name=Ben-Zvi 
> git.build.user.name=Ben-Zvi 
> git.commit.id.describe=drill-1.14.0 
> git.build.user.email=bben-...@mapr.com 
> git.branch=0508a128853ce796ca7e99e13008e49442f83147 
> git.commit.time=31.07.2018 @ 16\:50\:38 PDT 
> git.build.time=31.07.2018 @ 17\:18\:08 PDT 
> git.remote.origin.url=https\://github.com/apache/drill.git
> {code}
> h4. Development:
>  Java:
> {code}
> openjdk version "1.8.0_181" 
> OpenJDK Runtime Environment (build 1.8.0_181-b13) 
> OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode))
> {code}
> Drill:
> {code}
> #Generated by Git-Commit-Id-Plugin 
> #Tue Jul 31 17:18:08 PDT 2018 
> git.commit.id.abbrev=0508a12 
> git.commit.user.email=bben-...@mapr.com 
> git.commit.message.full=[maven-release-plugin] prepare release drill-1.14.0\n 
> git.commit.id=0508a128853ce796ca7e99e13008e49442f83147 
> git.commit.message.short=[maven-release-plugin] prepare release drill-1.14.0 
> git.commit.user.name=Ben-Zvi 
> git.build.user.name=Ben-Zvi 
> git.commit.id.describe=drill-1.14.0 
> git.build.user.email=bben-...@mapr.com 
> git.branch=0508a128853ce796ca7e99e13008e49442f83147 
> git.commit.time=31.07.2018 @ 16\:50\:38 PDT 
> git.build.time=31.07.2018 @ 17\:18\:08 PDT 
> git.remote.origin.url=https\://github.com/apache/drill.git
> {code}
>Reporter: Siarhei Krukau
>Priority: Minor
>
> I have an AWS S3 bucket in {{us-east-1}} with hundreds of thousands of 
> Parquet files named like {{${UUID}.parquet}}. Originally they hold ~50K 
> records and are ~25MiB in size each, but the issue is reproducible even with 
> a single 96KiB file with two records. The records are thin wrappers around 
> OpenRTB's 
> [BidRequest|https://docs.openx.com/Content/demandpartners/openrtb_bidrequest.html].
> I have configured a Drill cluster (no Hadoop) with three {{m5.large}} (CPU × 
> 2, RAM × 8GB) instances in the same region, all defaults. But the issue is 
> reproducible with {{drill-embedded}} as well.
> My S3 data source config (both {{drill-embedded}} and cluster):
> {code:json}
> {
>   "type": "file",
>   "connection": "s3a://bukket/",
>   "config": {
> "fs.s3a.access.key": "XX9XX9XX",
> "fs.s3a.secret.key": "Xx/xxXxxxX9xxxXxxXXxxXxXXx9xxxXXxxx9",
> "fs.s3a.endpoint": "s3.us-east-1.amazonaws.com",
> "fs.s3a.connection.maximum": "100",
> "fs.s3a.connection.timeout": "1"
>   },
>   "workspaces": {
> "root": {
>   "location": "/",
>   "writable": false,
>   "defaultInputFormat": null,
>   "allowAccessOutsideWorkspace": false
> },
> "tmp": {
>   "location": "/tmp",
>   "writable": true,
>   "defaultInputFormat": null,
>   "allowAccessOutsideWorkspace": false
> }
>   },
>   "formats": {
> "parquet": {
>   "type": "parquet"
> }
>   },
>   "enabled": true
> }
> {code}
> I omitted non-parquet-related configs for CSV, JSON, and others.
> When enabling debug logs and executing a query like {{SELECT * FROM 
> s3.`slice/9x99x99x-99x9-99x9-x99x-999x999x.parquet` LIMIT 10}} ({{slice}} 
> is a "directory" name inside the S3 bucket), I first see the data from the 
> Parquet file flowing through the wire, but then Drill tries to get byte 
> ranges with the HTTP {{Range}} header:
> {code}
> DEBUG o.a.h.i.conn.DefaultClientConnection - Sending request: GET 
> /slice/9x99x99x-99x9-99x9-x99x-999x999x.parquet HTTP/1.1
> DEBUG org.apache.http.wire - >> "GET 
> /slice/9x99x99x-99x9-99x9-x99x-999x999x.parquet HTTP/1.1[\r][\n]"
> DEBUG org.apache.http.wire - >> "Host: 
> bukket.s3.us-east-1.amazonaws.com[\r][\n]"
> DEBUG org.apache.http.wire - >> "Authorization: AWS 
> XX9XX9XX:xxXxXxXxXx9xxxXXxX9xxXx=[\r][\n]"
> DEBUG org.apache.http.wire - >> "U

[jira] [Assigned] (DRILL-7255) Support nulls for all levels of nesting in complex types

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7255:
---

Assignee: (was: Igor Guzenko)

> Support nulls for all levels of nesting in complex types
> 
>
> Key: DRILL-7255
> URL: https://issues.apache.org/jira/browse/DRILL-7255
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-7151) Show only accessible tables when Hive authorization enabled

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7151:
---

Assignee: (was: Igor Guzenko)

> Show only accessible tables when Hive authorization enabled
> ---
>
> Key: DRILL-7151
> URL: https://issues.apache.org/jira/browse/DRILL-7151
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Igor Guzenko
>Priority: Minor
>
> SHOW TABLES for Hive has worked inconsistently for a very long time.
> Before the changes introduced by DRILL-7115, only accessible tables were 
> shown when Hive Storage Based Authorization was enabled, while for SQL 
> Standard Based Authorization all tables were shown to the user ([related 
> discussion|https://github.com/apache/drill/pull/461#discussion_r58753354]). 
> In the scope of DRILL-7115 the accessible-only restriction for Storage Based 
> Authorization was weakened in order to improve query performance.
> There is still a need to improve the security of the Hive SHOW TABLES query 
> while not violating the performance requirements. 
> For SQL Standard Based Authorization this can be done by asking 
> {{HiveAuthorizationHelper.authorizerV2}} for the table's 'SELECT' permission.
> For Storage Based Authorization a performance-acceptable approach is not 
> known yet; one idea is to try using the appropriate Hive storage based 
> authorizer class for the purpose. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-7449) memory leak parse_url function

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7449:
---

Assignee: (was: Igor Guzenko)

> memory leak parse_url function
> --
>
> Key: DRILL-7449
> URL: https://issues.apache.org/jira/browse/DRILL-7449
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.16.0
>Reporter: benj
>Priority: Major
> Attachments: embedded_FullJsonProfile.txt, embedded_sqlline.log.txt, 
> embedded_sqlline_with_enable_debug_logging.log.txt
>
>
> Requests with *parse_url* work well when the number of processed rows is low 
> but produce a memory leak when the number of rows grows (roughly between 
> 500 000 and 1 million); for certain numbers of rows the request sometimes 
> works and sometimes fails with a memory leak.
> Extract from the tested dataset:
> {noformat}
> {"Attributable":true,"Description":"Website has been identified as malicious 
> by 
> Bing","FirstReportedDateTime":"2018-03-12T18:49:38Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:49:38Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"172.217.8.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/beginilah-cara-orang-jepang-berpacaran.html","Version":1.5}
> {"Attributable":true,"Description":"Website has been identified as malicious 
> by 
> Bing","FirstReportedDateTime":"2018-03-12T18:14:51Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:14:51Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"216.58.192.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/cara-membuat-widget-slideshow-postingan.html","Version":1.5}
> {noformat}
> Request tested:
> {code:sql}
> ALTER SESSION SET `store.format`='parquet';
> ALTER SESSION SET `store.parquet.use_new_reader` = true;
> ALTER SESSION SET `store.parquet.compression` = 'snappy';
> ALTER SESSION SET `drill.exec.functions.cast_empty_string_to_null`= true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> ALTER SESSION SET `exec.enable_union_type` = true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> CREATE TABLE dfs.test.`output_pqt` AS
> (
> SELECT R.parsed.host AS Domain
> FROM ( 
>   SELECT parse_url(T.Url) AS parsed
>   FROM dfs.test.`file.json` AS T
> ) AS R 
> ORDER BY Domain
> );
> {code}
>  
>  Result when the memory leak occurs:
> {noformat}
> Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. 
> Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory 
> leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> org.apache.drill.exec.memory.BaseAllocator.close():520
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
> org.apache.drill.exec.ops.FragmentContextImpl.close():546
> 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():329
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748 (state=,code=0)
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Memory was leaked 
> by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory 
> leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> org.apache.drill.exec.memory.BaseAllocator.close():520
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
> org.apache.drill.exec.ops.FragmentContextImpl.close():546
> 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanu

[jira] [Assigned] (DRILL-4092) Support for INTERSECT

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-4092:
---

Assignee: (was: Igor Guzenko)

> Support for INTERSECT 
> --
>
> Key: DRILL-4092
> URL: https://issues.apache.org/jira/browse/DRILL-4092
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Victoria Markman
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-4092) Support for INTERSECT

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4092:

Fix Version/s: (was: 1.18.0)

> Support for INTERSECT 
> --
>
> Key: DRILL-4092
> URL: https://issues.apache.org/jira/browse/DRILL-4092
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Victoria Markman
>Assignee: Igor Guzenko
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7141) Hash-Join (and Agg) should always spill to disk the least used partition

2020-03-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7141:

Fix Version/s: (was: 1.18.0)

> Hash-Join (and Agg) should always spill to disk the least used partition
> 
>
> Key: DRILL-7141
> URL: https://issues.apache.org/jira/browse/DRILL-7141
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Kunal Khatua
>Assignee: Boaz Ben-Zvi
>Priority: Major
>
> When the probe-side data for a hash join is skewed, it is preferable to keep 
> the corresponding partition on the build side in memory. 
> Currently, with the spill-to-disk feature, the partition to spill to disk is 
> selected at random. This means that highly skewed probe-side data may also 
> spill for lack of a corresponding hash table partition in memory. 
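> As a conceptual sketch only (the statistics interface and its accessors are 
> hypothetical, and the exact meaning of "least used", whether build-side rows, 
> recent activity, or expected probe-side hits, is itself a design choice), 
> victim selection could look like this:
> {code:java}
> import java.util.List;
> 
> // Hypothetical sketch: pick the in-memory partition with the lowest usage
> // metric as the spill victim, instead of choosing one at random.
> public class SpillVictimSelector {
> 
>   public interface PartitionStats {
>     boolean isInMemory();
>     long usageCount();   // operator-specific "how much is this partition needed" metric
>   }
> 
>   public static int leastUsedPartition(List<? extends PartitionStats> partitions) {
>     int victim = -1;
>     long lowestUsage = Long.MAX_VALUE;
>     for (int i = 0; i < partitions.size(); i++) {
>       PartitionStats p = partitions.get(i);
>       if (p.isInMemory() && p.usageCount() < lowestUsage) {
>         lowestUsage = p.usageCount();
>         victim = i;
>       }
>     }
>     return victim; // -1 when no in-memory partition is left to spill
>   }
> }
> {code}
> Keeping the most heavily used partitions in memory is what makes skewed 
> probe-side data cheap to process.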



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

