[jira] [Created] (DRILL-5924) native-client: Support user-specified CXX_FLAGS

2017-11-02 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created DRILL-5924:
--

 Summary: native-client: Support user-specified CXX_FLAGS
 Key: DRILL-5924
 URL: https://issues.apache.org/jira/browse/DRILL-5924
 Project: Apache Drill
  Issue Type: Improvement
  Components: Client - C++
Reporter: Uwe L. Korn


Currently the build process for the native client overrides the CXX_FLAGS 
supplied by the user. In some cases we need to pass additional flags, e.g. 
{{-fpermissive}}, for the build to succeed. Instead of overriding the 
user-supplied flags, the build should only extend them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (DRILL-5800) Explicitly set locale to en_US on locale-dependent tests

2017-09-18 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved DRILL-5800.

Resolution: Duplicate

> Explicitly set locale to en_US on locale-dependent tests
> 
>
> Key: DRILL-5800
> URL: https://issues.apache.org/jira/browse/DRILL-5800
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Reporter: Uwe L. Korn
>
> Some tests depend on the locale, i.e. they run with {{en_US}} successfully 
> but fail with {{de_DE}} due to a different decimal separator.





[jira] [Created] (DRILL-5800) Explicitly set locale to en_US on locale-dependent tests

2017-09-18 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created DRILL-5800:
--

 Summary: Explicitly set locale to en_US on locale-dependent tests
 Key: DRILL-5800
 URL: https://issues.apache.org/jira/browse/DRILL-5800
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Reporter: Uwe L. Korn


Some tests depend on the locale, i.e. they run with {{en_US}} successfully but 
fail with {{de_DE}} due to a different decimal separator.
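
The decimal-separator difference can be reproduced outside of Drill. The following is a minimal sketch, not taken from the Drill test suite; the class and method names are made up for illustration. It formats the same value with an explicit locale, which is the kind of pinning the summary asks for.

```java
import java.util.Locale;

class LocaleDecimalSeparatorDemo {
    // Formats a value with two decimals using an explicit locale
    // instead of whatever the JVM default locale happens to be.
    static String format(Locale locale, double value) {
        return String.format(locale, "%.2f", value);
    }

    public static void main(String[] args) {
        System.out.println(format(Locale.US, 1234.5));      // 1234.50
        System.out.println(format(Locale.GERMANY, 1234.5)); // 1234,50
    }
}
```

A test that compares against a string such as "1234.50" only passes when the locale is fixed, either per call as above or globally via {{Locale.setDefault(Locale.US)}}.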





[jira] [Created] (DRILL-5799) native-client: Support alternative build directories

2017-09-18 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created DRILL-5799:
--

 Summary: native-client: Support alternative build directories
 Key: DRILL-5799
 URL: https://issues.apache.org/jira/browse/DRILL-5799
 Project: Apache Drill
  Issue Type: Improvement
  Components: Client - C++
Reporter: Uwe L. Korn


At the moment the native client only supports {{build}} as its build directory. 
The user should be free to choose a different one.





[jira] [Created] (DRILL-5261) Expose REST endpoint in zookeeper

2017-02-13 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created DRILL-5261:
--

 Summary: Expose REST endpoint in zookeeper
 Key: DRILL-5261
 URL: https://issues.apache.org/jira/browse/DRILL-5261
 Project: Apache Drill
  Issue Type: New Feature
Reporter: Uwe L. Korn


It would be nice to also publish the REST API endpoint of each Drillbit in 
ZooKeeper. This would require an additional entry in {{DrillbitEndpoint}}. 
While I know how to add the attribute to the ProtoBuf definition and fill it 
with the correct information, I'm unsure whether migration code is needed to 
support older {{DrillbitEndpoint}} implementations that don't have this 
attribute.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4935) Allow drillbits to advertise a configurable host address to Zookeeper

2016-12-08 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731567#comment-15731567
 ] 

Uwe L. Korn commented on DRILL-4935:


To add another use case where this is very helpful:

If you run Drill on nodes with multiple hostnames, you can use this to tell 
them the correct hostname to use. For example, hosts often have a management 
and a service hostname. Depending on your setup, {{/etc/hostname}} may be the 
management hostname. In this case the drillbits would by default try to connect 
to each other using their management interfaces (which will be firewalled). 
With the new configuration option, you can force them to advertise the correct 
hostnames and connect over the corresponding interfaces.
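
The fallback behaviour proposed in the issue (use the configured host when it is populated, otherwise keep the present behaviour) could look roughly like the sketch below. This is not Drill's implementation: the class and method names are hypothetical, and reading the value from a JVM system property is an assumption made only to keep the example self-contained.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class AdvertisedHostSketch {
    // Returns the configured host when it is populated,
    // otherwise the default host that was detected locally.
    static String resolve(String configuredHost, String defaultHost) {
        if (configuredHost != null && !configuredHost.trim().isEmpty()) {
            return configuredHost.trim();
        }
        return defaultHost;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Assumption: the option is read from a system property here,
        // not from Drill's own configuration mechanism.
        String configured = System.getProperty("drill.exec.rpc.bit.advertised.host");
        String local = InetAddress.getLocalHost().getCanonicalHostName();
        System.out.println(resolve(configured, local));
    }
}
```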

> Allow drillbits to advertise a configurable host address to Zookeeper
> -
>
> Key: DRILL-4935
> URL: https://issues.apache.org/jira/browse/DRILL-4935
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - RPC
>Affects Versions: 1.8.0
>Reporter: Harrison Mebane
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> There are certain situations, such as running Drill in distributed Docker 
> containers, in which it is desirable to advertise a different hostname to 
> Zookeeper than would be output by InetAddress.getLocalHost().  I propose 
> adding a configuration variable 'drill.exec.rpc.bit.advertised.host' and 
> passing this address to Zookeeper when the configuration variable is 
> populated, otherwise falling back to the present behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4935) Allow drillbits to advertise a configurable host address to Zookeeper

2016-11-06 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15641593#comment-15641593
 ] 

Uwe L. Korn commented on DRILL-4935:


[~harrisonmebane] The earlier you start a PR, the earlier we can discuss the 
code ;)

> Allow drillbits to advertise a configurable host address to Zookeeper
> -
>
> Key: DRILL-4935
> URL: https://issues.apache.org/jira/browse/DRILL-4935
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - RPC
>Affects Versions: 1.8.0
>Reporter: Harrison Mebane
>Priority: Minor
> Fix For: Future
>
>
> There are certain situations, such as running Drill in distributed Docker 
> containers, in which it is desirable to advertise a different hostname to 
> Zookeeper than would be output by InetAddress.getLocalHost().  I propose 
> adding a configuration variable 'drill.exec.rpc.bit.advertised.host' and 
> passing this address to Zookeeper when the configuration variable is 
> populated, otherwise falling back to the present behavior.





[jira] [Commented] (DRILL-4979) Make dataport configurable

2016-10-28 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614793#comment-15614793
 ] 

Uwe L. Korn commented on DRILL-4979:


Implementation-wise it seems like the only point where the {{+1}} is actually 
hardcoded is in 
{{exec/java-exec/src/main/java/org/apache/drill/exec/rpc/data/DataConnectionCreator.java:62}},
 i.e. {{int port = server.bind(partialEndpoint.getControlPort() + 1, 
allowPortHunting)}}. Therefore a fix for this could be to introduce a new 
configuration option {{drill.exec.rpc.bit.server.dataport}} and use this if 
set, otherwise fall back to {{controlport + 1}}.
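
A minimal sketch of that selection logic, assuming the configured value arrives as an optional integer. The class and method names are hypothetical, the port numbers are example values only, and the actual binding via {{server.bind(...)}} is left out.

```java
import java.util.OptionalInt;

class DataPortSketch {
    // Uses the explicitly configured data port when present,
    // otherwise falls back to the current behaviour: control port + 1.
    static int resolveDataPort(OptionalInt configuredDataPort, int controlPort) {
        return configuredDataPort.orElse(controlPort + 1);
    }

    public static void main(String[] args) {
        // Example values, not taken from any real deployment.
        System.out.println(resolveDataPort(OptionalInt.empty(), 31011));   // 31012
        System.out.println(resolveDataPort(OptionalInt.of(40123), 31011)); // 40123
    }
}
```

With a scheduler such as Mesos+Aurora, the second call corresponds to passing in whatever port the scheduler allocated.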

> Make dataport configurable
> --
>
> Key: DRILL-4979
> URL: https://issues.apache.org/jira/browse/DRILL-4979
> Project: Apache Drill
>  Issue Type: New Feature
>  Components:  Server
>Affects Versions: 1.8.0
> Environment: Scheduling drillbits with Apache Mesos+Aurora
>Reporter: Uwe L. Korn
>
> Currently the dataport of a Drillbit is fixed to the control port + 1. In a 
> dynamic execution environment like Apache Mesos+Aurora, each port is 
> allocated by the scheduler and then passed on to the application process. 
> There is no way to guarantee that two consecutive ports are allocated. 
> Therefore, to run Drill in this environment, the dataport of the drillbit 
> also needs to be configurable by the scheduler. 





[jira] [Created] (DRILL-4979) Make dataport configurable

2016-10-28 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created DRILL-4979:
--

 Summary: Make dataport configurable
 Key: DRILL-4979
 URL: https://issues.apache.org/jira/browse/DRILL-4979
 Project: Apache Drill
  Issue Type: New Feature
  Components:  Server
Affects Versions: 1.8.0
 Environment: Scheduling drillbits with Apache Mesos+Aurora
Reporter: Uwe L. Korn


Currently the dataport of a Drillbit is fixed to the control port + 1. In a 
dynamic execution environment like Apache Mesos+Aurora, each port is allocated 
by the scheduler and then passed on to the application process. There is no 
way to guarantee that two consecutive ports are allocated. Therefore, to run 
Drill in this environment, the dataport of the drillbit also needs to be 
configurable by the scheduler. 





[jira] [Created] (DRILL-4978) Parquet metadata cache on S3 is always renewed

2016-10-28 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created DRILL-4978:
--

 Summary: Parquet metadata cache on S3 is always renewed
 Key: DRILL-4978
 URL: https://issues.apache.org/jira/browse/DRILL-4978
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.8.0
 Environment: Hadoop s3a storage
Reporter: Uwe L. Korn


As directory modification times are not tracked by S3 (see 
https://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-aws/tools/hadoop-aws/index.html#Warning_2:_Because_Object_stores_dont_track_modification_times_of_directories
 ) the Parquet metadata is always renewed on query planning.

This could be addressed by either:
 * for the case of s3a, checking the modification times of all Parquet files in 
this directory
 * deactivating the metadata cache for s3a
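
The first option could be sketched as follows. This is only an illustration of the idea, not Drill's cache logic: the class and method names are hypothetical, and the modification times are passed in as a plain map rather than read from a Hadoop file system.

```java
import java.util.HashMap;
import java.util.Map;

class MetadataCacheFreshnessSketch {
    // Returns true when any Parquet file was modified after the
    // metadata cache file was written, i.e. the cache must be renewed.
    static boolean isStale(long cacheModifiedMillis, Map<String, Long> parquetFileModifiedMillis) {
        for (long modified : parquetFileModifiedMillis.values()) {
            if (modified > cacheModifiedMillis) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical file names and timestamps.
        Map<String, Long> files = new HashMap<>();
        files.put("part-0.parquet", 1_000L);
        files.put("part-1.parquet", 3_000L);
        System.out.println(isStale(2_000L, files)); // true
        System.out.println(isStale(4_000L, files)); // false
    }
}
```

Note that comparing file timestamps alone would not notice a deleted file, so a real implementation would probably also need to compare the set of file names.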





[jira] [Created] (DRILL-4977) Reading parquet metadata cache from S3 with fadvise=random and Hadoop 3 generates a large number of requests

2016-10-28 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created DRILL-4977:
--

 Summary: Reading parquet metadata cache from S3 with 
fadvise=random and Hadoop 3 generates a large number of requests
 Key: DRILL-4977
 URL: https://issues.apache.org/jira/browse/DRILL-4977
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Parquet
Affects Versions: 1.8.0
 Environment: Hadoop 3.0
Reporter: Uwe L. Korn


When using the new {{fs.s3a.experimental.input.fadvise=random}} mode for 
accessing Parquet files stored in S3, we see a significant improvement in 
query performance but a slowdown in query planning. This is due to the way the 
metadata file is read (each chunk of 8000 bytes generates a new GET request to 
S3). Indicating with {{FSDataInputStream.setReadahead(metadata-filesize)}} that 
we will read the whole file avoids this behaviour. 
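
To give a feeling for the request count, here is a small back-of-the-envelope sketch. It assumes exactly one GET per 8000-byte chunk, as described above; the 10 MB metadata file size is a made-up example, and the class and method names are hypothetical.

```java
class ChunkedReadRequestCount {
    // Number of GET requests when every chunk triggers its own request
    // (ceiling division of file size by chunk size).
    static long requests(long fileSizeBytes, long chunkSizeBytes) {
        if (chunkSizeBytes <= 0) {
            throw new IllegalArgumentException("chunk size must be positive");
        }
        if (fileSizeBytes <= 0) {
            return 0;
        }
        return (fileSizeBytes + chunkSizeBytes - 1) / chunkSizeBytes;
    }

    public static void main(String[] args) {
        // Hypothetical 10 MB metadata file read in 8000-byte chunks.
        System.out.println(requests(10_000_000L, 8_000L));      // 1250
        // Same file when the readahead covers the whole file.
        System.out.println(requests(10_000_000L, 10_000_000L)); // 1
    }
}
```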





[jira] [Updated] (DRILL-4976) Querying Parquet files on S3 pulls too much data

2016-10-28 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated DRILL-4976:
---
Summary: Querying Parquet files on S3 pulls too much data   (was: Querying 
Parquet files on S3 pulls )

> Querying Parquet files on S3 pulls too much data 
> -
>
> Key: DRILL-4976
> URL: https://issues.apache.org/jira/browse/DRILL-4976
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Uwe L. Korn
>
> Currently (Drill 1.8, Hadoop 2.7.2) when queries are executed on files stored 
> in S3, the underlying s3a implementation requests orders of magnitude too 
> much data. Given sufficient seek sizes, the following HTTP pattern is observed:
> * GET bytes=8k-100M
> * GET bytes=2M-100M
> * GET bytes=4M-100M
> Although the HTTP requests were normally aborted before all the data was
> sent by the server, about 10-15x the size of the input files still
> went over the network, i.e. for a file of 100M, sometimes 1G 
> of data is transferred over the network.
> A fix for this is the new 
> {{fs.s3a.experimental.input.fadvise=random}} mode, which will be introduced 
> with Hadoop 3.





[jira] [Commented] (DRILL-4959) Drill 8.1 not able to connect to S3

2016-10-24 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602668#comment-15602668
 ] 

Uwe L. Korn commented on DRILL-4959:


Note that the linked blog article is outdated. For connecting to S3, jets3t 
should no longer be required if you use the s3a backend.

Ensure that you have used the configuration as described on 
https://drill.apache.org/docs/s3-storage-plugin/ and also used {{s3a://}} in 
your URL. {{s3://}} and {{s3n://}} are still available but outdated. 

> Drill 8.1 not able to connect to S3
> ---
>
> Key: DRILL-4959
> URL: https://issues.apache.org/jira/browse/DRILL-4959
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Gopal Nagar
>
> Hi Team,
> I have followed the details below to integrate Drill with AWS S3. The query 
> keeps running for hours and doesn't display any output (I am querying a file 
> with only 2 rows from S3).
> Reference link
> ---
> https://abhishek-tiwari.com/post/reflections-on-apache-drill
> https://drill.apache.org/docs/s3-storage-plugin/ 
> Query Format (Tried from UI & CLI)
> 
> select * from `s3`.`hive.csv` LIMIT 10;
> select * from `s3`.`bucket_name/hive.csv` LIMIT 10;
> After seeing the log below, I tried including jets3t-0.9.3.jar in the jars 
> directory, but it doesn't fix my problem.
> Log Details
> --
> 2016-10-24 17:00:02,461 [27f1c1ec-d82e-ba2a-2840-e7104320418f:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 27f1c1ec-d82e-ba2a-2840-e7104320418f: select * from `s3`.`hive.csv` LIMIT 10
> 2016-10-24 17:00:02,479 [drill-executor-39] ERROR 
> o.a.d.exec.server.BootStrapContext - 
> org.apache.drill.exec.work.foreman.Foreman.run() leaked an exception.
> java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException


