Re: [ANNOUNCE] New PMC Chair of Apache Drill

2019-08-23 Thread Khurram Faraaz
Congratulations Charles, and thank you Arina.

Regards,
Khurram

On Fri, Aug 23, 2019 at 2:54 PM Niels Basjes  wrote:

> Congratulations Charles.
>
> Niels Basjes
>
> On Thu, Aug 22, 2019, 09:28 Arina Ielchiieva  wrote:
>
> > Hi all,
> >
> > It has been an honor to serve as Drill Chair during the past year, but it's
> > high time for a new one...
> >
> > I am very pleased to announce that the Drill PMC has voted to elect
> Charles
> > Givre as the new PMC chair of Apache Drill. He has also been approved
> > unanimously by the Apache Board in last board meeting.
> >
> > Congratulations, Charles!
> >
> > Kind regards,
> > Arina
> >
>


Re: Heap memory - leak

2019-06-12 Thread Khurram Faraaz
Hi Dweep,

I see that you are using the REST API to execute the queries. For complex,
resource-intensive queries the recommended way is to always submit them
through the JDBC or ODBC driver instead. Here is the difference between the
two, for your reference. Please use JDBC/ODBC to execute your queries and
let us know if that resolves the issue for you.

*REST API* is a stateless protocol, so when a query request is submitted by
a REST client, the request thread on the server side blocks until the query
is fully executed. Once all the results are available on the server side,
they are converted into strings to be sent back to the REST client. If the
query results are very large, this ends up consuming lots of memory on the
server side, and at the same time it is difficult for certain REST clients
(like browsers) to render all those results at once. Because of this we see
an unresponsive browser and high heap usage in the Drillbit. Since the
protocol is stateless there is no pagination mechanism, which is why we
always suggest using the REST API only for queries with a small result
footprint, or for LIMIT queries that keep the memory footprint small.
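
For reference, a REST submission looks roughly like the sketch below (this
assumes a Drillbit web server on localhost:8047; adjust the host, port and
query to your setup). The entire result set comes back in a single JSON
response:

curl -X POST -H "Content-Type: application/json" \
  -d '{"queryType": "SQL", "query": "SELECT * FROM cp.`employee.json` LIMIT 20"}' \
  http://localhost:8047/query.json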

*JDBC/ODBC API*, on the other hand, is a stateful protocol, which means the
client side keeps context about when a query has actually completed and can
fetch results incrementally. Using the JDBC API, the Drill client does not
have to wait to receive all the result rows from the server side before
handing them to the application. It can start returning results as soon as
the first batch is available, and JDBC provides an API to receive
incremental results (the next() call). So the problem of a large memory
footprint from holding all the result rows at once does not arise. In
addition, if the JDBC application pulls data slowly, there is a built-in
feedback mechanism that affects query execution: it tells the server side
to pause the query pipeline and stop sending more results until the
application catches up. This feedback mechanism exists because of the
stateful nature of the JDBC/ODBC protocol.
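
A minimal sketch of the same query over JDBC (the host drillbit-host and
the lack of credentials are placeholders; adjust to your environment). Note
that rs.next() pulls results batch by batch instead of materializing
everything on the server:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcIncrementalFetch {
    public static void main(String[] args) throws Exception {
        // Connect directly to a Drillbit over the JDBC (stateful) protocol
        try (Connection conn = DriverManager.getConnection(
                "jdbc:drill:drillbit=drillbit-host:31010");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM cp.`employee.json`")) {
            // Each next() call fetches rows incrementally from the server
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}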

Thanks,
Khurram

On Fri, Jun 7, 2019 at 4:50 PM Paul Rogers 
wrote:

> Hi Dweep,
>
> This mailing list does not support attachments. Consider filing a JIRA
> ticket and attaching your images there: [1]
>
> You mention you've assigned Drill 14 GB of heap. You also mention that
> your task ran out of heap. As it turns out, Drill also uses direct memory
> to store intermediate data. I wonder if the error condition is actually
> about direct memory. How much direct memory have you given to Drill?
>
> 14 GB of heap and the default direct memory (8GB) should be plenty for a
> query that produces a 48 MB Parquet file, assuming that the input size is
> similar: ~200 MB (uncompressed JSON).
>
>
> You mention that you run the JSON-to-Parquet conversion once per hour. Do
> you use this Drill instance for any other tasks? Are there other tasks
> running at the same time? How many nodes of Drill are in use?
>
>
> Finally, you mention you use the REST API. Perhaps something odd is
> happening there. A stack trace of your error would help. The stack trace
> may be in the error message, or in the Drill log file.
>
>
> Thanks,
> - Paul
>
> [1] https://issues.apache.org/jira
>
>
> On Friday, June 7, 2019, 2:36:36 AM PDT, Dweep Sharma <
> dweep.sha...@redbus.com> wrote:
>
>  Hi Divya,
>
> The size is 48 MB (after converting to Parquet)
>
>
>
>
>
> On Fri, Jun 7, 2019 at 1:45 PM Divya Gehlot 
> wrote:
>
> Can you share the more details .
> Query profile and other aspects like data size and all to have better view
> what’s happening
>
>
> Thanks ,
> Divya
>
> On Fri, 7 Jun 2019 at 4:13 PM, Dweep Sharma 
> wrote:
>
> > Data is in JSON format.
> >
> > On Fri, Jun 7, 2019 at 1:39 PM Dweep Sharma 
> > wrote:
> >
> > > Hi,
> > >
> > > I have a memory leak issue. 14GB memory is assigned to heap but it gets
> > > full within a day with just one cron running.
> > >
> > > Task is a CTAS query from Kafka to S3 once every hour. CTAS is issued
> via
> > > the Drill Rest API.
> > >
> > > Please assist on a resolution.
> > >
> > > -Dweep
> > >
> >
> >
>
>


Re: “Failed to construct kafka consumer” using Apache Drill

2019-06-12 Thread Khurram Faraaz
To increase the open file limit on your system, please refer to the
content at the link below. Let us know if that resolves it for you.
https://easyengine.io/tutorials/linux/increase-open-files-limit/
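
For example, on most Linux systems you can check and raise the limit roughly
like this (a sketch; the user name, pid and values are placeholders, and the
exact steps vary by distro and by how the Drillbit is started):

# check the current limit for your shell and for the running Drillbit process
ulimit -n
cat /proc/<drillbit-pid>/limits | grep "open files"

# raise the limit for the user that runs Drill, e.g. in /etc/security/limits.conf
drilluser  soft  nofile  65536
drilluser  hard  nofile  65536

Restart the Drillbit afterwards so the new limit takes effect.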

Thanks,
Khurram

On Tue, Jun 11, 2019 at 10:45 PM Khurram Faraaz  wrote:

> Hi,
>
> By looking at the stack trace, the issue does not seem to originate from
> Drill; instead it is a Kafka issue. Did you try to increase the limit on
> the OS to avoid the "Too many open files" error?
> What versions of Apache Drill, Kafka and the OS are you trying this on ?
>
> Caused by: org.apache.kafka.common.KafkaException:
> java.io.IOException: Too many open files
> at org.apache.kafka.common.network.Selector.(Selector.java:129)
> ~[kafka-clients-0.11.0.1.jar:na]
> at org.apache.kafka.common.network.Selector.(Selector.java:156)
> ~[kafka-clients-0.11.0.1.jar:na]
> at org.apache.kafka.common.network.Selector.(Selector.java:160)
> ~[kafka-clients-0.11.0.1.jar:na]
> at
> org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:701)
> ~[kafka-clients-0.11.0.1.jar:na]
> ... 25 common frames omitted
> Caused by: java.io.IOException: Too many open files
> at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
> ~[na:1.8.0_181]
> at sun.nio.ch.EPollArrayWrapper.(EPollArrayWrapper.java:130)
> ~[na:1.8.0_181]
> at sun.nio.ch.EPollSelectorImpl.(EPollSelectorImpl.java:69)
> ~[na:1.8.0_181]
> at sun.nio.ch
> .EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
> ~[na:1.8.0_181]
> at java.nio.channels.Selector.open(Selector.java:227) ~[na:1.8.0_181]
> at org.apache.kafka.common.network.Selector.(Selector.java:127)
> ~[kafka-clients-0.11.0.1.jar:na]
>
> Thanks,
> Khurram
>
> On Tue, Jun 11, 2019 at 2:38 PM Aravind Voruganti <
> aravind.vorugant...@gmail.com> wrote:
>
>> Hello Apache Drill support,
>>
>> I am Aravind Voruganti a Java developer and a big fan of Apache Drill.
>> Below is the issue I was facing with my new application when using Drill
>> over Kafka for querying. I have also detailed the issue in Stackoverflow
>> which can found at:
>>
>>
>> https://stackoverflow.com/questions/56549783/failed-to-construct-kafka-consumer-using-apache-drill
>>
>> I am using the Apache Drill (1.14) JDBC driver in my application which
>> consumes the data from Kafka. The application works just fine for some
>> time and after a few iterations it fails to execute due to the following Too
>> many open files issue. I made sure there are no file handle leaks in my
>> code but am still not sure why this issue is happening.
>>
>> It looks like the issue is happening from within the Apache Drill
>> libraries when constructing the Kafka consumer. Can anyone please guide me
>> in getting this problem fixed?
>>
>> The problem goes away when I restart my Apache drillbit but very soon it
>> happens again. I did check the file descriptor count on my unix machine
>> using ulimit -a | wc -l & lsof -a -p  | wc -l before and after the
>> drill process restart, and it seems the drill process is taking up a
>> considerable number of file descriptors. I tried increasing the file
>> descriptor count on the system but still no luck.
>>
>> I have followed the Apache Drill storage plugin documentation in
>> configuring the Kafka plugin into Apache Drill at
>> https://drill.apache.org/docs/kafka-storage-plugin/
>>
>> Any help on this issue is highly appreciated. Thanks.
>>
>> JDBC URL: jdbc:drill:drillbit=localhost:31010;schema=kafka
>>
>> NOTE: I am pushing down the filters in my query SELECT * FROM myKafkaTopic
>> WHERE kafkaMsgTimestamp > 1560210931626
>>
>> org.apache.drill.common.exceptions.UserException: DATA_READ ERROR:
>> Failed to fetch start/end offsets of the topic  myKafkaTopic
>>
>> Failed to construct kafka consumer
>>
>> [Error Id: 73f896a7-09d4-425b-8cd5-f269c3a6e69a ]
>> at
>> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>> ~[drill-common-1.14.0.jar:1.14.0]
>> at
>> org.apache.drill.exec.store.kafka.KafkaGroupScan.init(KafkaGroupScan.java:198)
>> [drill-storage-kafka-1.14.0.jar:1.14.0]
>> at
>> org.apache.drill.exec.store.kafka.KafkaGroupScan.(KafkaGroupScan.java:98)
>> [drill-storage-kafka-1.14.0.jar:1.14.0]
>> at
>> org.apache.drill.exec.store.kafka.KafkaStoragePlugin.getPhysicalScan(KafkaStoragePlugin.java:83)
>> [drill-storage-kafka-1.14.0.jar:1.14.0]
>> at
>> org.apache.drill.exec.store.Abs

Re: “Failed to construct kafka consumer” using Apache Drill

2019-06-11 Thread Khurram Faraaz
Hi,

By looking at the stack trace, the issue does not seem to originate from
Drill; instead it is a Kafka issue. Did you try to increase the limit on the
OS to avoid the "Too many open files" error?
What versions of Apache Drill, Kafka and the OS are you trying this on ?

Caused by: org.apache.kafka.common.KafkaException:
java.io.IOException: Too many open files
at org.apache.kafka.common.network.Selector.(Selector.java:129)
~[kafka-clients-0.11.0.1.jar:na]
at org.apache.kafka.common.network.Selector.(Selector.java:156)
~[kafka-clients-0.11.0.1.jar:na]
at org.apache.kafka.common.network.Selector.(Selector.java:160)
~[kafka-clients-0.11.0.1.jar:na]
at
org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:701)
~[kafka-clients-0.11.0.1.jar:na]
... 25 common frames omitted
Caused by: java.io.IOException: Too many open files
at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
~[na:1.8.0_181]
at sun.nio.ch.EPollArrayWrapper.(EPollArrayWrapper.java:130)
~[na:1.8.0_181]
at sun.nio.ch.EPollSelectorImpl.(EPollSelectorImpl.java:69)
~[na:1.8.0_181]
at sun.nio.ch
.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
~[na:1.8.0_181]
at java.nio.channels.Selector.open(Selector.java:227) ~[na:1.8.0_181]
at org.apache.kafka.common.network.Selector.(Selector.java:127)
~[kafka-clients-0.11.0.1.jar:na]

Thanks,
Khurram

On Tue, Jun 11, 2019 at 2:38 PM Aravind Voruganti <
aravind.vorugant...@gmail.com> wrote:

> Hello Apache Drill support,
>
> I am Aravind Voruganti a Java developer and a big fan of Apache Drill.
> Below is the issue I was facing with my new application when using Drill
> over Kafka for querying. I have also detailed the issue in Stackoverflow
> which can found at:
>
>
> https://stackoverflow.com/questions/56549783/failed-to-construct-kafka-consumer-using-apache-drill
>
> I am using the Apache Drill (1.14) JDBC driver in my application which
> consumes the data from Kafka. The application works just fine for some
> time and after a few iterations it fails to execute due to the following Too
> many open files issue. I made sure there are no file handle leaks in my
> code but am still not sure why this issue is happening.
>
> It looks like the issue is happening from within the Apache Drill
> libraries when constructing the Kafka consumer. Can anyone please guide me
> in getting this problem fixed?
>
> The problem goes away when I restart my Apache drillbit but very soon it
> happens again. I did check the file descriptor count on my unix machine
> using ulimit -a | wc -l & lsof -a -p  | wc -l before and after the
> drill process restart, and it seems the drill process is taking up a
> considerable number of file descriptors. I tried increasing the file
> descriptor count on the system but still no luck.
>
> I have followed the Apache Drill storage plugin documentation in
> configuring the Kafka plugin into Apache Drill at
> https://drill.apache.org/docs/kafka-storage-plugin/
>
> Any help on this issue is highly appreciated. Thanks.
>
> JDBC URL: jdbc:drill:drillbit=localhost:31010;schema=kafka
>
> NOTE: I am pushing down the filters in my query SELECT * FROM myKafkaTopic
> WHERE kafkaMsgTimestamp > 1560210931626
>
> org.apache.drill.common.exceptions.UserException: DATA_READ ERROR:
> Failed to fetch start/end offsets of the topic  myKafkaTopic
>
> Failed to construct kafka consumer
>
> [Error Id: 73f896a7-09d4-425b-8cd5-f269c3a6e69a ]
> at
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
> ~[drill-common-1.14.0.jar:1.14.0]
> at
> org.apache.drill.exec.store.kafka.KafkaGroupScan.init(KafkaGroupScan.java:198)
> [drill-storage-kafka-1.14.0.jar:1.14.0]
> at
> org.apache.drill.exec.store.kafka.KafkaGroupScan.(KafkaGroupScan.java:98)
> [drill-storage-kafka-1.14.0.jar:1.14.0]
> at
> org.apache.drill.exec.store.kafka.KafkaStoragePlugin.getPhysicalScan(KafkaStoragePlugin.java:83)
> [drill-storage-kafka-1.14.0.jar:1.14.0]
> at
> org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(AbstractStoragePlugin.java:111)
> [drill-java-exec-1.14.0.jar:1.14.0]
> at
> org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:99)
> [drill-java-exec-1.14.0.jar:1.14.0]
> at
> org.apache.drill.exec.planner.logical.DrillScanRel.(DrillScanRel.java:89)
> [drill-java-exec-1.14.0.jar:1.14.0]
> at
> org.apache.drill.exec.planner.logical.DrillScanRel.(DrillScanRel.java:69)
> [drill-java-exec-1.14.0.jar:1.14.0]
> at
> org.apache.drill.exec.planner.logical.DrillScanRel.(DrillScanRel.java:62)
> [drill-java-exec-1.14.0.jar:1.14.0]
> at
> org.apache.drill.exec.planner.logical.DrillScanRule.onMatch(DrillScanRule.java:38)
> [drill-java-exec-1.14.0.jar:1.14.0]
> at
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:212)
> [calcite-core-1.16.0-drill-r6.jar:1.16.0-drill-r6]
> at
> 

Re: DFS No extension files

2019-06-11 Thread Khurram Faraaz
Hi,

Can you please share the exact error message that you see?
And the details from drillbit.log for that error too.

What type of file are you trying to query from Drill?
Is it Parquet, JSON, CSV or a directory?

If you can answer those questions, we can better assist you.
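
Once we know the format, one common approach for files that have no
extension is to set a defaultInputFormat on the dfs workspace that contains
them, so Drill knows how to read them. A minimal sketch of such a workspace
entry, assuming the files are JSON (the location and format are placeholders
for your setup):

"workspaces": {
  "noext": {
    "location": "/data/files_without_extension",
    "writable": false,
    "defaultInputFormat": "json"
  }
}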

Thanks,
Khurram

On Tue, Jun 11, 2019 at 10:43 AM kokona  wrote:

> Hi,
> I am sending you an email because there is a problem using Apache Drill.
>
> - How can I find files that do not have an extension?
>
> Thank you.
>


Re: Drill: Cannot decode handshake

2019-05-23 Thread Khurram Faraaz
Hi,

Can you please confirm that you have followed the steps that are documented
here,

Installing the Driver on Mac OS X
https://drill.apache.org/docs/installing-the-driver-on-mac-os-x/

Configuring ODBC on Mac OS X
https://drill.apache.org/docs/configuring-odbc-on-mac-os-x/

Thanks,
Khurram

On Thu, May 23, 2019 at 10:46 AM Nicolas Seyvet 
wrote:

> Hi,
>
> I am new to drill and running into a problem with a basic setup.
>
> From https://github.com/HariSekhon/Dockerfiles/tree/master/apache-drill,
> I run a docker-compose ($ docker-compose up) of drill, and open up port
> 8047 for incoming drill requests.  The UI at http://0.0.0.0:8047 works.
>
> On OSX, I have installed the ODBC driver
> http://package.mapr.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc_v1.3.22.1055/
>
> Then updated my odbc.ini file under ~/Library/ODBC with my port and host
> such as:
> 
> [ODBC]
> Trace=yes
>
> [ODBC Data Sources]
> MapR Drill=MapR Drill ODBC Driver
>
> [MapR Drill]
>
> # Description: DSN Description.
> # This key is not necessary and is only to give a description of the data
> source.
> Description=MapR Drill ODBC Driver DSN
>
> Driver=/Library/mapr/drill/lib/libdrillodbc_sbu.dylib
> ConnectionType=Direct
> HOST=0.0.0.0
> PORT=18047
> AuthenticationType=No Authentication
> DelegationUID=
> #KrbServiceHost=
> #KrbServiceName=
>
> AdvancedProperties=CastAnyToVarchar=true;HandshakeTimeout=10;QueryTimeout=180;TimestampTZDisplayTimezone=utc;
> ExcludedSchemas=sys,INFORMATION_SCHEMA;NumberOfPrefetchBuffers=5;
> Catalog=DRILL
> Schema=
> -
>
> Then using the DrillExplorer when I try to connect to my DSN, I keep
> getting the following error:
>
> Connection failed.
> [MapR][Drill] (20) Failure occurred while trying to connect to local=
> 0.0.0.0:8047. (Details: [30004]Connection failed with error: Cannot
> decode handshake.)
>
> Any ideas?
>
> Thanks
> --
> Regards,
> Nicolas Seyvet
>


Re: WELCOME to user@drill.apache.org

2019-05-23 Thread Khurram Faraaz
The suggestion was to upgrade to Hive 2.x (preferably Hive 2.3), not Hive 3.0.
You should try Drill 1.16.0 with Hive 2.3.

Thanks,
Khurram

On Thu, May 23, 2019 at 2:26 PM Ahmad, Sami 
wrote:

> I upgraded my hive to hive3.0 and using apache drill 1.16 but now hive
> storage plugin is not liking the config file
>
>
>
> It says “please rety: Error while creating/update storage: null
>
>
>
> Please help
>
>
>
> *From:* Khurram Faraaz [mailto:kfar...@mapr.com]
> *Sent:* Tuesday, May 14, 2019 6:35 PM
> *To:* user@drill.apache.org; Ahmad, Sami 
> *Subject:* Re: WELCOME to user@drill.apache.org
>
>
>
> If you are using MySQL for the Hive metastore, your entries in Drill's hive
> storage plugin should look like below
>
>
>
> {
>   "type": "hive",
>   "configProps": {
> "hive.metastore.uris": "thrift://1.2.10.10:9083",
> "javax.jdo.option.ConnectionURL": "jdbc:mysql://1.2.10.10:3306/hive?createDatabaseIfNotExist=true",
> "javax.jdo.option.ConnectionUserName": "root",
> "javax.jdo.option.ConnectionPassword": "test",
> "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
> "fs.default.name": "file:///",
> "hive.metastore.sasl.enabled": "true",
> "datanucleus.schema.autoCreateAll": "true"
>   },
>   "enabled": true
> }
>
>
>
> On Tue, May 14, 2019 at 1:53 PM Ahmad, Sami 
> wrote:
>
> I see bunch of hive jar files in the jars\3rdparty directory but they are
> different version than my hive jars .
> I tried copying the hive jars from my hive environment to  this 3rdparty
> folder but it didn’t help also
>
> [root@hadoop1 3rdparty]# pwd
> /root/apache-drill-1.16.0/jars/3rdparty
>
> [root@hadoop1 3rdparty]# ls -ltr hive*
> -rw-r--r-- 1 root root  112869 Apr 24 15:05 hive-hbase-handler-2.3.2.jar
> -rw-r--r-- 1 root root  125727 Apr 24 15:05 hive-contrib-2.3.2.jar
> -rw-r--r-- 1 root root 8119261 Apr 24 15:05 hive-metastore-2.3.2.jar
>
>
> I copied the files below to the jars\3rdparty folder .
>
> -rw-r--r-- 1 root root 6345039 May 14 15:51
> hive-metastore-1.2.1000.2.6.5.0-292.jar
> -rw-r--r-- 1 root root  122581 May 14 15:52
> hive-contrib-1.2.1000.2.6.5.0-292.jar
> -rw-r--r-- 1 root root  120473 May 14 15:52
> hive-hbase-handler-1.2.1000.2.6.5.0-292.jar
> [root@hadoop1 3rdparty]#
> [root@hadoop1 3rdparty]# pwd
> /root/apache-drill-1.16.0/jars/3rdparty
> You have mail in /var/spool/mail/root
> [root@hadoop1 3rdparty]#
>
> -Original Message-
> From: Khurram Faraaz [mailto:kfar...@mapr.com]
> Sent: Tuesday, May 14, 2019 3:10 PM
> To: user@drill.apache.org
> Cc: user-h...@drill.apache.org
> Subject: Re: WELCOME to user@drill.apache.org
>
> Looking at you Hive storage plugin, there are two entries (see below) that
> are missing, can you please add them with appropriate values to Hive
> storage plugin and give it a try ?
>
> "javax.jdo.option.ConnectionURL":
> "jdbc:derby:;databaseName=../sample-data/drill_hive_db;create=true",
> "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
>
> Thanks,
> Khurram
>
> On Tue, May 14, 2019 at 11:31 AM Ahmad, Sami 
> wrote:
>
> > Same problem I have with hbase ,  it wont show the tables.
> > Hbase version is 1.1.2
> >
> > Here is the storage plugin for hbase I am using
> >
> > {
> >   "type": "hbase",
> >   "config": {
> > "hbase.zookeeper.quorum": "hadoop1",
> > "hbase.zookeeper.property.clientPort": "2181"
> >   },

Re: WELCOME to user@drill.apache.org

2019-05-14 Thread Khurram Faraaz
If you are using MySQL for the Hive metastore, your entries in Drill's hive
storage plugin should look like below

{
  "type": "hive",
  "configProps": {
"hive.metastore.uris": "thrift://1.2.10.10:9083",
"javax.jdo.option.ConnectionURL": "jdbc:mysql://
1.2.10.10:3306/hive?createDatabaseIfNotExist=true",
"javax.jdo.option.ConnectionUserName": "root",
"javax.jdo.option.ConnectionPassword": "test",
"hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
"fs.default.name": "file:///",
"hive.metastore.sasl.enabled": "true",
"datanucleus.schema.autoCreateAll": "true"
  },
  "enabled": true
}

On Tue, May 14, 2019 at 1:53 PM Ahmad, Sami 
wrote:

> I see bunch of hive jar files in the jars\3rdparty directory but they are
> different version than my hive jars .
> I tried copying the hive jars from my hive environment to  this 3rdparty
> folder but it didn’t help also
>
> [root@hadoop1 3rdparty]# pwd
> /root/apache-drill-1.16.0/jars/3rdparty
>
> [root@hadoop1 3rdparty]# ls -ltr hive*
> -rw-r--r-- 1 root root  112869 Apr 24 15:05 hive-hbase-handler-2.3.2.jar
> -rw-r--r-- 1 root root  125727 Apr 24 15:05 hive-contrib-2.3.2.jar
> -rw-r--r-- 1 root root 8119261 Apr 24 15:05 hive-metastore-2.3.2.jar
>
>
> I copied the files below to the jars\3rdparty folder .
>
> -rw-r--r-- 1 root root 6345039 May 14 15:51
> hive-metastore-1.2.1000.2.6.5.0-292.jar
> -rw-r--r-- 1 root root  122581 May 14 15:52
> hive-contrib-1.2.1000.2.6.5.0-292.jar
> -rw-r--r-- 1 root root  120473 May 14 15:52
> hive-hbase-handler-1.2.1000.2.6.5.0-292.jar
> [root@hadoop1 3rdparty]#
> [root@hadoop1 3rdparty]# pwd
> /root/apache-drill-1.16.0/jars/3rdparty
> You have mail in /var/spool/mail/root
> [root@hadoop1 3rdparty]#
>
> -Original Message-
> From: Khurram Faraaz [mailto:kfar...@mapr.com]
> Sent: Tuesday, May 14, 2019 3:10 PM
> To: user@drill.apache.org
> Cc: user-h...@drill.apache.org
> Subject: Re: WELCOME to user@drill.apache.org
>
> Looking at you Hive storage plugin, there are two entries (see below) that
> are missing, can you please add them with appropriate values to Hive
> storage plugin and give it a try ?
>
> "javax.jdo.option.ConnectionURL":
> "jdbc:derby:;databaseName=../sample-data/drill_hive_db;create=true",
> "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
>
> Thanks,
> Khurram
>
> On Tue, May 14, 2019 at 11:31 AM Ahmad, Sami 
> wrote:
>
> > Same problem I have with hbase ,  it wont show the tables.
> > Hbase version is 1.1.2
> >
> > Here is the storage plugin for hbase I am using
> >
> > {
> >   "type": "hbase",
> >   "config": {
> > "hbase.zookeeper.quorum": "hadoop1",
> > "hbase.zookeeper.property.clientPort": "2181"
> >   },
> >   "size.calculator.enabled": false,
> >   "enabled": true
> > }
> >
> > -Original Message-
> > From: Ahmad, Sami [mailto:sami.ah...@dot.state.fl.us]
> > Sent: Tuesday, May 14, 2019 2:25 PM
> > To: user@drill.apache.org
> > Cc: user-h...@drill.apache.org
> > Subject: RE: WELCOME to user@drill.apache.org
> >
> > Hi Khurram
> >
> > 1-  yes the show tables command work from hive shell
> >
> > hive> show tables
> > > ;
> > OK
> > test
> > test_orc_richa1116
> > test_orc_txn
> > tweets
> > users2
> > xyz
> > Time taken: 1.385 s
> >
> >
> >
> > 2- hive version is  1.2.1
> >
> > 3- storage plugin in shown below
> >
> > {
> >   "type": "hive",
> >   "configProps": {
> > "hive.metastore.uris": "thrift://hadoop2:9083",
> > "hive.metastore.sasl.enabled": "false",
> > "fs.default.name": "hdfs:///"
> >   },
> >   "enabled": true
> > }
> >
> >
> > Regards
> > Sami
> >
> > -Original Message-
> > From: Khurram Faraaz [mailto:kfar...@mapr.com]
> > Sent: Tuesday, May 14, 2019 2:18 PM
> > To: user@drill.apache.org
> > Cc: user-h...@drill.apache.org
> > Subject: Re: WELCOME to user@drill.apache.org
> >
> > Hi Ahmad,
> >
> > Does the "show tables" command work for you from Hive shell ?
> > Can you please share you Hive version that is being used ?

Re: WELCOME to user@drill.apache.org

2019-05-14 Thread Khurram Faraaz
Looking at your Hive storage plugin, there are two entries (see below) that
are missing. Can you please add them with appropriate values to the Hive
storage plugin and give it a try?

"javax.jdo.option.ConnectionURL":
"jdbc:derby:;databaseName=../sample-data/drill_hive_db;create=true",
"hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",

Thanks,
Khurram

On Tue, May 14, 2019 at 11:31 AM Ahmad, Sami 
wrote:

> Same problem I have with hbase ,  it wont show the tables.
> Hbase version is 1.1.2
>
> Here is the storage plugin for hbase I am using
>
> {
>   "type": "hbase",
>   "config": {
> "hbase.zookeeper.quorum": "hadoop1",
> "hbase.zookeeper.property.clientPort": "2181"
>   },
>   "size.calculator.enabled": false,
>   "enabled": true
> }
>
> -Original Message-
> From: Ahmad, Sami [mailto:sami.ah...@dot.state.fl.us]
> Sent: Tuesday, May 14, 2019 2:25 PM
> To: user@drill.apache.org
> Cc: user-h...@drill.apache.org
> Subject: RE: WELCOME to user@drill.apache.org
>
> Hi Khurram
>
> 1-  yes the show tables command work from hive shell
>
> hive> show tables
> > ;
> OK
> test
> test_orc_richa1116
> test_orc_txn
> tweets
> users2
> xyz
> Time taken: 1.385 s
>
>
>
> 2- hive version is  1.2.1
>
> 3- storage plugin in shown below
>
> {
>   "type": "hive",
>   "configProps": {
> "hive.metastore.uris": "thrift://hadoop2:9083",
> "hive.metastore.sasl.enabled": "false",
> "fs.default.name": "hdfs:///"
>   },
>   "enabled": true
> }
>
>
> Regards
> Sami
>
> -Original Message-
> From: Khurram Faraaz [mailto:kfar...@mapr.com]
> Sent: Tuesday, May 14, 2019 2:18 PM
> To: user@drill.apache.org
> Cc: user-h...@drill.apache.org
> Subject: Re: WELCOME to user@drill.apache.org
>
> Hi Ahmad,
>
> Does the "show tables" command work for you from Hive shell ?
> Can you please share you Hive version that is being used ?
>
> And share the content from your Hive storage plugin, which is available on
> the storage tab on Drill web UI.
>
> Thanks,
> Khurram
>
> On Tue, May 14, 2019 at 10:07 AM Ahmad, Sami 
> wrote:
>
> > I am not able to see any hive or hbase tables in drill , I get the
> > schema fine , getting the error in the logfile shown below .
> >
> > Regards
> >
> >
> >
> > [root@hadoop1 bin]# ./drillbit.sh start Starting drillbit, logging to
> > /root/apache-drill-1.16.0/log/drillbit.out
> >
> >
> > [root@hadoop1 bin]#
> > [root@hadoop1 bin]# ./sqlline -u jdbc:drill:zk=hadoop1:2181 Apache
> > Drill 1.16.0 "Say hello to my little Drill."
> > apache drill> show schemas;
> > ++
> > |SCHEMA_NAME |
> > ++
> > | cp.default |
> > | dfs.default|
> > | dfs.root   |
> > | dfs.tmp|
> > | hbase  |
> > | hive.aaronm|
> > | hive.default   |
> > | hive.es_testdb |
> > | hive.guest |
> > | hive.ivann |
> > | hive.levic |
> > | hive.nasirq|
> > | hive.nipar |
> > | hive.rangertst |
> > | hive.richas|
> > | hive.roumig|
> > | hive.sami  |
> > | hive.sandeeps  |
> > | hive.sasmitap  |
> > | hive.sid   |
> > | hive.swaroopa  |
> > | hive.tito  |
> > | hive.usha  |
> > | information_schema |
> > | sys|
> > ++
> > 25 rows selected (1.869 seconds)
> > apache drill> use hive;
> > +--+--+
> > |  ok  | summary  |
> > +--+--+
> > | true | Default schema changed to [hive] |
> > +--+--+
> > 1 row selected (0.103 seconds)
> > apache drill (hive)> show tables;   <<<< NOT SHOWING
> > ANY HIVE TABLES.
> > +--++
> > | TABLE_SCHEMA | TABLE_NAME |
> > +--++
> > +--++
> > No rows selected (0.601 seconds)
> > apache drill (hive)>
> >
> >
> > at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> > java:617)
> > [na:1.8.0_112]

Re: WELCOME to user@drill.apache.org

2019-05-14 Thread Khurram Faraaz
Hi Ahmad,

Does the "show tables" command work for you from Hive shell ?
Can you please share you Hive version that is being used ?

And share the content from your Hive storage plugin, which is available on
the storage tab on Drill web UI.

Thanks,
Khurram

On Tue, May 14, 2019 at 10:07 AM Ahmad, Sami 
wrote:

> I am not able to see any hive or hbase tables in drill , I get the schema
> fine , getting the error in the logfile shown below .
>
> Regards
>
>
>
> [root@hadoop1 bin]# ./drillbit.sh start
> Starting drillbit, logging to /root/apache-drill-1.16.0/log/drillbit.out
>
>
> [root@hadoop1 bin]#
> [root@hadoop1 bin]# ./sqlline -u jdbc:drill:zk=hadoop1:2181
> Apache Drill 1.16.0
> "Say hello to my little Drill."
> apache drill> show schemas;
> ++
> |SCHEMA_NAME |
> ++
> | cp.default |
> | dfs.default|
> | dfs.root   |
> | dfs.tmp|
> | hbase  |
> | hive.aaronm|
> | hive.default   |
> | hive.es_testdb |
> | hive.guest |
> | hive.ivann |
> | hive.levic |
> | hive.nasirq|
> | hive.nipar |
> | hive.rangertst |
> | hive.richas|
> | hive.roumig|
> | hive.sami  |
> | hive.sandeeps  |
> | hive.sasmitap  |
> | hive.sid   |
> | hive.swaroopa  |
> | hive.tito  |
> | hive.usha  |
> | information_schema |
> | sys|
> ++
> 25 rows selected (1.869 seconds)
> apache drill> use hive;
> +--+--+
> |  ok  | summary  |
> +--+--+
> | true | Default schema changed to [hive] |
> +--+--+
> 1 row selected (0.103 seconds)
> apache drill (hive)> show tables;    NOT SHOWING
> ANY HIVE TABLES.
> +--++
> | TABLE_SCHEMA | TABLE_NAME |
> +--++
> +--++
> No rows selected (0.601 seconds)
> apache drill (hive)>
>
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_112]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]
> Caused by: java.util.concurrent.ExecutionException:
> MetaException(message:Got exception:
> org.apache.thrift.TApplicationException Invalid method name:
> 'get_tables_by_type')
> at
> org.apache.drill.shaded.guava.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:502)
> ~[drill-shaded-guava-23.0.jar:23.0]
> at
> org.apache.drill.shaded.guava.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:461)
> ~[drill-shaded-guava-23.0.jar:23.0]
> at
> org.apache.drill.shaded.guava.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:83)
> ~[drill-shaded-guava-23.0.jar:23.0]
> at
> org.apache.drill.shaded.guava.com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:142)
> ~[drill-shaded-guava-23.0.jar:23.0]
> at
> org.apache.drill.shaded.guava.com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2453)
> ~[drill-shaded-guava-23.0.jar:23.0]
> at
> org.apache.drill.shaded.guava.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2417)
> ~[drill-shaded-guava-23.0.jar:23.0]
> at o
>
>
> -Original Message-
> From: user-h...@drill.apache.org [mailto:user-h...@drill.apache.org]
> Sent: Tuesday, May 14, 2019 10:55 AM
> To: Ahmad, Sami 
> Subject: WELCOME to user@drill.apache.org
>
> EXTERNAL SENDER: Use caution with links and attachments.
>
>
> Hi! This is the ezmlm program. I'm managing the user@drill.apache.org
> mailing list.
>
> I'm working for my owner, who can be reached at
> user-ow...@drill.apache.org.
>
> Acknowledgment: I have added the address
>
>sami.ah...@dot.state.fl.us
>
> to the user mailing list.
>
> Welcome to user@drill.apache.org!
>
> Please save this message so that you know the address you are subscribed
> under, in case you later want to unsubscribe or change your subscription
> address.
>
>
> --- Administrative commands for the user list ---
>
> I can handle administrative requests automatically. Please do not send
> them to the list address! Instead, send your message to the correct command
> address:
>
> To subscribe to the list, send a message to:
>
>
> To remove your address from the list, send a message to:
>
>
> Send mail to the following for info and FAQ for this list:
>
>
>
> Similar addresses exist for the digest list:
>
>
>
> To get messages 123 through 145 (a maximum of 100 per request), mail:
>
>
> To get an index with subject and author for messages 123-456 , mail:
>
>
> They are always returned as sets of 100, max 2000 per request, so you'll
> actually get 100-499.
>
> To receive all messages with the 

Re: help needed

2019-05-09 Thread Khurram Faraaz
Hi Ahmad,

You will need a storage plugin for HBase and one more for Hive, please
refer to the doc links below for more info. Please share more details about
any errors that might see.

https://drill.apache.org/docs/hbase-storage-plugin/

https://drill.apache.org/docs/hive-storage-plugin/

Thanks,
Khurram

On Thu, May 9, 2019 at 2:28 PM Ahmad, Sami 
wrote:

> hi
>
> I need help in apache drill and Hbase and hive  , as I could not get drill
> to work with either , what are my options ?
>
> Regards
> Sami
>


Re: [VOTE] Apache Drill Release 1.16.0 - RC1

2019-04-24 Thread Khurram Faraaz
I see the correct version and commit; I deployed the binaries to test.

Apache Drill 1.16.0
"Start your SQL engine."
apache drill> select * from sys.version;
+---------+------------------------------------------+-----------------------------------------------------+---------------------------+-------------------+---------------------------+
| version | commit_id                                | commit_message                                      | commit_time               | build_email       | build_time                |
+---------+------------------------------------------+-----------------------------------------------------+---------------------------+-------------------+---------------------------+
| 1.16.0  | cf5b758e0a4c22b75bfb02ac2653ff09415ddf53 | [maven-release-plugin] prepare release drill-1.16.0 | 22.04.2019 @ 09:08:36 PDT | sor...@apache.org | 22.04.2019 @ 09:53:25 PDT |
+---------+------------------------------------------+-----------------------------------------------------+---------------------------+-------------------+---------------------------+
1 row selected (0.274 seconds)
apache drill>

Thanks,
Khurram

On Wed, Apr 24, 2019 at 9:52 AM SorabhApache  wrote:

> Hi Volodymyr/Anton,
> I can verify that I am seeing both the below issues as reported by Anton
> and Volodymyr. I will investigate further why we are seeing these issues.
> Thanks for catching this. Can you please open JIRA's for these issues ?
>
> 1) Wrong result for sys.version query when built from source tarball.
> 2) git.properties file in drill-format-mapr-1.16.0.jar has wrong commit id
> but as Volodymyr mentioned because of order in which jars are picked up
> it's not showing the issue when tried from prebuilt tarball.
>
> @Volodymyr Vysotskyi 
> Regarding the GPG key I am not sure if we mandate it to use apache.org,
> there other keys in the file which are using gmail address as well. As far
> as the signing person is authenticating the key and details associated with
> it, I think it should be fine. But since it's recommended I will use
> apache.org email address instead.
>
> Thanks,
> Sorabh
>
> On Wed, Apr 24, 2019 at 8:53 AM Volodymyr Vysotskyi 
> wrote:
>
> > Hi Aman,
> >
> > There are two different issues connected with *git.properties* file.
> > Regarding the problem I have mentioned, prebuilt tar
> > (apache-drill-1.16.0.tar.gz) contains *drill-format-mapr-1.16.0.jar* jar
> > which contains a *git.properties* file with the incorrect version.
> > When *select * from sys.version* query is submitted, class loader finds
> the
> > first file named as *git.properties* from the classpath (each drill jar
> > contains its own *git.properties* file) and for my case file from
> > *drill-format-mapr-1.16.0.jar *is picked up, so the incorrect result is
> > returned. But it may not be reproducible for other machines since it
> > depends on the order of files for the class loader.
> >
> > Regarding the problem Anton has mentioned, Drill should be built from the
> > sources (apache-drill-1.16.0-src.tar.gz), and for that version, *select *
> > from sys.version* returns the result without information about commit.
> >
> > Kind regards,
> > Volodymyr Vysotskyi
> >
> >
> > On Wed, Apr 24, 2019 at 6:33 PM Aman Sinha  wrote:
> >
> > > This works fine for me with the binary tarball that I installed on my
> > Mac.
> > > ..it shows the correct commit message.
> > >
> > > Apache Drill 1.16.0
> > >
> > > "This isn't your grandfather's SQL."
> > >
> > > apache drill> *select* * *from* sys.version;
> > >
> > >
> >
> +-+--+-+---+---+---+
> > > | version |commit_id |
> > >  commit_message|commit_time|
> > > build_email|build_time |
> > >
> > >
> >
> +-+--+-+---+---+---+
> > > | 1.16.0  | cf5b758e0a4c22b75bfb02ac2653ff09415ddf53 |
> > > [maven-release-plugin] prepare release drill-1.16.0 | 22.04.2019 @
> > 09:08:36
> > > PDT | sor...@apache.org | 22.04.2019 @ 09:54:09 PDT |
> > >
> > >
> >
> +-+--+-+---+---+---+
> > >
> > > I don't see any extraneous git.properties anywhere in the source
> > > distribution that I downloaded:
> > >
> > > [root@aman1 apache-drill-1.16.0-src]# find . -name "git.properties"
> > >
> > >
> > >
> >
> ./distribution/target/apache-drill-1.16.0/apache-drill-1.16.0/git.properties
> > >
> > > ./git.properties
> > >
> > >
> > >
> > > On Wed, Apr 24, 2019 at 4:51 AM Arina Ielchiieva 
> > wrote:
> > >
> > > > Taking into account previous emails, looks like we'll need to have
> 

Re: RESOURCE ERROR: External Sort encountered an error while spilling to disk

2019-03-12 Thread Khurram Faraaz
Can you also share the stack trace from drillbit.log for the below error?
RESOURCE ERROR: External Sort encountered an error while spilling to disk

And what version of Drill are you running?
Please share the table definition and the number of rows in the table that
is being queried.

Thanks,
Khurram

On Tue, Mar 12, 2019 at 12:17 PM Boaz Ben-Zvi  wrote:

>   Hi Giovanni,
>
>  The error given by the External-Sort indicates a problem while
> spilling the excess memory into disk.
>
> When you enlarged the memory (from the default 2GB) to 8GB the LAG query
> may succeeded without spilling, hence circumvented the issue.
>
> Yes, you can keep enlarging the memory and run w/o spilling, but better
> check and fix the root issue.
>
> How is your spilling configured - check the filesystem -
> "drill.exec.spill.fs" and the directories - "drill.exec.spill.directories"
>
> The default is the local filesystem, and into /tmp .  It is possible
> that very little disk space is available for /tmp .
>
>   Thanks,
>
>   -- Boaz
>
> On 3/12/19 6:03 AM, Giovanni Conte wrote:
> > Hello,
> > I am doing a LAG and a SUM query over a PCAP dataset of 300Mb.
> > I get this error:
> > RESOURCE ERROR: External Sort encountered an error while spilling to disk
> >
> > Then I changetd
> > planner.memory.max_query_memory_per_node ---> 8589934592
> > and with this I am able to perform the LAG but not the SUM.
> >
> > I have no problem of memory since I am working on a server with 72 cores
> > and 256 GB of RAM.
> > Which other parameter should I change to avoid the RESOURCE ERROR?
> > Can I go over 8589934592?
> > Thank you very much,
> >
> > Giovanni
> >
>


Re: Problem Creating Views In Apache Drill

2019-01-11 Thread Khurram Faraaz
Hi Peter,

What version of Drill are you on ?

Thanks,
Khurram

On Fri, Jan 11, 2019 at 12:20 PM Gautam Parai  wrote:

> Hi Peter,
>
> Do you see the same error when you run the query instead?
>
> SELECT id, settlement,  settlement_type,
> to_timestamp(to_char(settlement_date, '#')) as settlement_date
> ,transaction_id, transaction_type, settlement_institution,
> settlement_institution_bank_domain_id, domain_i,product_group_id,
> channel_id FROM `dfs`.`***`.`/tbl_settlement`
>
> Could you narrow down on the column which is causing this error? Also, can
> you please set alter session set `exec.errors.verbose` = true and provide
> the stack trace from the console.
>
> Thanks,
> Gautam
>
> On Fri, Jan 11, 2019 at 3:22 AM Peter Edike <
> peter.ed...@interswitchgroup.com> wrote:
>
> > Hello Everyone,
> >
> >
> > I have a set of parquet files which where created as a result of a batch
> > ETL job (using Sqoop 1).
> >
> >
> > I can query the fields normally without any kind of formatting and all is
> > well...
> >
> >
> > However, when i try to create a view with the following code
> >
> >
> > CREATE VIEW `dfs.asra`.`vw_tbl_settlement` AS
> > SELECT id, settlement,  settlement_type,
> > to_timestamp(to_char(settlement_date, '#')) as settlement_date
> > ,transaction_id, transaction_type, settlement_institution,
> > settlement_institution_bank_domain_id, domain_i,product_group_id,
> > channel_id FROM `dfs`.`***`.`/tbl_settlement`
> >
> > I end up with the following exception
> >
> > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> > SchemaChangeException: Failure while trying to materialize incoming
> schema.
> > Errors: Error in expression at index -1. Error: Missing function
> > implementation: [castTINYINT(VARCHAR-OPTIONAL)]. Full expression:
> --UNKNOWN
> > EXPRESSION--.. Fragment 0:0 [Error Id:
> 55c13096-085a-4d0c-9eca-5c7bbc2d1b8f
> > on bdnode3.interswitch.com:31010]?
> >
> > Please What am I doing wrong
> >
> > Kind Regards
> >
> >
> >
>


Re: Drill connection using jdbc

2019-01-08 Thread Khurram Faraaz
Hi Tushar,

This example works just fine for me; you may want to use it as a template.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TestQuery {

    public static void main(String s[]) throws Exception {

        Connection conn = null;
        PreparedStatement prdstmnt = null;
        try {
            // Please replace test1302-145 with your hostname, the name of the
            // host where the Drillbit is up and running
            final String URL_STRING =
                "jdbc:drill:schema=dfs.tmp;drillbit=test1302-145";
            Class.forName("org.apache.drill.jdbc.Driver").newInstance();

            // Please replace username and password with your username and password
            conn = DriverManager.getConnection(URL_STRING, "mapr", "mapr");
            // Prepare and execute a simple query to verify the connection
            prdstmnt = conn.prepareStatement("SELECT * FROM cp.`employee.json`");
            prdstmnt.executeQuery();
        } catch (Exception e) {
            System.out.println(e.getMessage());
            e.printStackTrace();
        } finally {
            // Close the statement first, then the connection
            if (prdstmnt != null)
                try { prdstmnt.close(); } catch (SQLException sqlexcpn) { sqlexcpn.printStackTrace(); }
            if (conn != null)
                try { conn.close(); } catch (SQLException sqlexcpn) { sqlexcpn.printStackTrace(); }
        }
    }
}
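
To compile and run it, put the Drill JDBC driver jar on the classpath, for
example (a sketch; adjust the jar path and version to your install):

javac TestQuery.java
java -cp /path/to/apache-drill-1.12.0/jars/jdbc-driver/drill-jdbc-all-1.12.0.jar:. TestQuery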

Regards,
Khurram

On Tue, Jan 8, 2019 at 11:10 PM Tushar Pathare  wrote:

> Hello Team,
>
> We have installed drill 1.12.0 and trying to connect
> using a client to the drill cluster
>
> Our config for connection is
>
>
>
> Drill class is : org.apache.drill.jdbc.Driver
>
> Driver location is
> /Users/tpathare/Downloads/apache-drill-1.12.0/jars/jdbc-driver/drill-jdbc-all-1.12.0.jar
>
> Jdbc url is : jdbc:drill:zk=172.32.63.64:2181,172.32.63.65:2181,
> 172.32.63.67:2181
>
>
>
> When I hit connect I see the following error.Though the cluster is up and
> running and ZK is also good.
>
>
>
> ERROR: An error occurred while trying to make a connection to the database:
>
> JDBC URL:
> jdbc:drill:zk=172.32.63.64:2181,172.32.63.65:2181,172.32.63.67:2181/sidra-drillbit
>
> Failure in connecting to Drill:
> oadd.org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for client.
>
>
>
>
>
>
>
>
> *Tushar Bhalchandra Pathare*
>
> Bigdata,GPFS,CSXF – Research Department
>
> *t. *+974 4003 7443
>
> *m.* +974 74793547
>
> PO BOX 26999 Doha, Qatar
>
> sidra.org
> 
>
>
> Disclaimer: This email and its attachments may be confidential and are
> intended solely for the use of the individual to whom it is addressed. If
> you are not the intended recipient, any reading, printing, storage,
> disclosure, copying or any other action taken in respect of this e-mail is
> prohibited and may be unlawful. If you are not the intended recipient,
> please notify the sender immediately by using the reply function and then
> permanently delete what you have received. Any views or opinions expressed
> are solely those of the author and do not necessarily represent those of
> Sidra Medical and Research Center.
>


Re: Caused by: org.apache.drill.exec.exception.DrillbitStartupException: Drillbit is disallowed to bind to loopback address in distributed mode.

2018-12-08 Thread Khurram Faraaz
Try using port number 5181 instead of port 2181.

Regards,
Khurram

On Fri, Dec 7, 2018 at 4:19 PM Alaa Salmo  wrote:

> Hi All,
>
> I am trying to install Hadoop/Zookeeper with Apache Drill (distributed
> mode). I am facing the following error
> "org.apache.drill.exec.exception.DrillbitStartupException:
> Drillbit is disallowed to bind to loopback address in distributed mode.". I
> tried to google, the answer was change  drill-override.conf. I am still
> facing the same problem. Did anybody has same issue? Does Drill work in the
> local machine with Zookeeper?
>
> drill.exec: {
>   cluster-id: "drillbits1",
>   zk.connect: "mgelbana-machine:2181"
> }
>
> OR
>
> drill.exec: {
>   cluster-id: "drillbits1",
>   zk.connect: "127.0.0.1:2181"
> }
>
>
>
> On Sat, Dec 1, 2018 at 9:09 AM Alaa Salmo  wrote:
>
> > Hi,
> >
> > I am installing Drill on my virtual machine (Centos). I am facing one
> > issue. I tried to google but I could not solve it
> >
> >  bin/drillbit.sh start
> >  cat /usr/local/apache-drill-1.14.0/log/drillbit.out
> >
> >
> ***
> > Exception in thread "main"
> > org.apache.drill.exec.exception.DrillbitStartupException: Failure during
> > initial startup of Drillbit.
> > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:428)
> > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:394)
> > at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:390)
> > Caused by: org.apache.drill.exec.exception.DrillbitStartupException:
> > Drillbit is disallowed to bind to loopback address in distributed mode.
> > at
> > org.apache.drill.exec.service.ServiceEngine.start(ServiceEngine.java:97)
> > at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:191)
> > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:424)
> >
> >
> ***
> >
> > - I tried to chnage the hostname but I still facing same issue
> > - I tried to change Java 8 JDK from OpenJDK to Oracle but I still facing
> > same issue
> >
> > Please can you provide me the right answer?
> >
> > Regards,
> > Alaa
> >
> >
>


Re: [ANNOUNCE] New Committer: Karthikeyan Manivannan

2018-12-07 Thread Khurram Faraaz
Congratulations Karthik!!

On Fri, Dec 7, 2018 at 11:12 AM Abhishek Girish  wrote:

> Congratulations Karthik!
>
> On Fri, Dec 7, 2018 at 11:11 AM Arina Ielchiieva  wrote:
>
> > The Project Management Committee (PMC) for Apache Drill has invited
> > Karthikeyan
> > Manivannan to become a committer, and we are pleased to announce that he
> > has accepted.
> >
> > Karthik started contributing to the Drill project in 2016. He has
> > implemented changes in various Drill areas, including batch sizing,
> > security, code-gen, C++ part. One of his latest improvements is  ACL
> > support for Drill ZK nodes.
> >
> > Welcome Karthik, and thank you for your contributions!
> >
> > - Arina
> > (on behalf of Drill PMC)
> >
>


Re: Join tables from different databases returns empty result

2018-11-06 Thread Khurram Faraaz
Hi,

To query MySQL tables from Apache Drill, you will have to create a new
storage plugin from Drill's Web UI; you can find the details in the
documentation link below.
https://drill.apache.org/docs/rdbms-storage-plugin/
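
Roughly, the storage plugin configuration would look like the sketch below
(the URL, credentials and values are placeholders; you also need the MySQL
JDBC driver jar available to Drill, as described in the doc above):

{
  "type": "jdbc",
  "driver": "com.mysql.jdbc.Driver",
  "url": "jdbc:mysql://your-mysql-host:3306",
  "username": "drill_user",
  "password": "drill_password",
  "enabled": true
}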

Also, please share the query plan for that join query if you can generate
one from Drill using the EXPLAIN PLAN FOR command.

Thanks,
Khurram

On Tue, Nov 6, 2018 at 3:58 PM Prisdha Dharma 
wrote:

> Hi,
>
> I'm using Apache Drill to join 2 tables from different databases on MySql
> server
> When running this query:
> SELECT a.name, b.url
> FROM pg_authservice.partner a
> INNER JOIN pg-edge.graph_apis b
> N a.id = b.pid
> WHEN a.status = 1 && b.job_status = “COMPLETED”;
>
> I see that Drill returns empty result, even though the join using MySQL
> returns non-empty data.
> Could you please advise.
>
> Prisdha Dharma
>
>
>


Re: [ANNOUNCE] New Committer: Hanumath Rao Maduri

2018-11-01 Thread Khurram Faraaz
Congratulations Hanu!

On Thu, Nov 1, 2018 at 10:14 AM Gautam Parai  wrote:

> Congratulations Hanumath! Well deserved :)
>
> Gautam
>
> On Thu, Nov 1, 2018 at 9:44 AM AnilKumar B  wrote:
>
> > Congratulations Hanumath.
> >
> > Thanks & Regards,
> > B Anil Kumar.
> >
> >
> > On Thu, Nov 1, 2018 at 9:39 AM Vitalii Diravka 
> wrote:
> >
> > > Congratulations!
> > >
> > > Kind regards
> > > Vitalii
> > >
> > >
> > > On Thu, Nov 1, 2018 at 5:43 PM salim achouche 
> > > wrote:
> > >
> > > > Congrats Hanu!
> > > >
> > > > On Thu, Nov 1, 2018 at 6:05 AM Arina Ielchiieva 
> > > wrote:
> > > >
> > > > > The Project Management Committee (PMC) for Apache Drill has invited
> > > > > Hanumath
> > > > > Rao Maduri to become a committer, and we are pleased to announce
> that
> > > he
> > > > > has accepted.
> > > > >
> > > > > Hanumath became a contributor in 2017, making changes mostly in the
> > > Drill
> > > > > planning side, including lateral / unnest support. He is also one
> of
> > > the
> > > > > contributors of index based planning and execution support.
> > > > >
> > > > > Welcome Hanumath, and thank you for your contributions!
> > > > >
> > > > > - Arina
> > > > > (on behalf of Drill PMC)
> > > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Salim
> > > >
> > >
> >
>


Re: Hbase tables in apache drill not showing up

2018-10-29 Thread Khurram Faraaz
Hi,

What versions of Drill and HBase are you using? Do you see any errors in
the drillbit.log file?

You will have to enable the HBase storage plugin on the Drill web UI and
ensure it is enabled:

 {
  "type": "hbase",
  "config": {
"hbase.zookeeper.quorum": "1.2.3.12",
"hbase.zookeeper.property.clientPort": "2181"
  },
  "size.calculator.enabled": false,
  "enabled": true
}


Then, to query HBase tables, you will have to specify the schema in
sqlline, like:

sqlline> use hbase;
sqlline> SELECT * FROM ;

Thanks,
Khurram

On Mon, Oct 29, 2018 at 9:46 AM Schwendimann Thomas I.BSCWI.1701 <
thomas.schwendim...@stud.hslu.ch> wrote:

> Dear Sir or Madam,
>
> I am trying to use apache drill to query an hbase table. I uploaded a
> table "users" through pig. When checking through the hbase shell, I can
> confirm that the table "users" is there and filled with my data. However,
> on apache drill there is no hbase table when using the query "show
> databases". All my hive and mongoDB tables show up however.
>
> These are my configurations:
>
>
> https://i.imgur.com/147RcUz.png
>
> If you can help me out here it's greatly appreciated.
>
> Thanks in advance.
>
> Thomas
>


Re: Query hung in Cancelled_Requested state

2018-10-26 Thread Khurram Faraaz
Hi Divya,

Please see the link below for more info; the details of how to cancel a
query using the REST API are also included below.
https://drill.apache.org/docs/rest-api-introduction/

GET /profiles/cancel/{queryid}
Cancel the query that has the given queryid.

Parameter

queryid--The UUID of the query in standard UUID format that Drill assigns
to each query.

Example

curl
http://localhost:8047/profiles/cancel/29b2e988-35e7-4c85-3151-32c7d3347f15

Thanks,
Khurram

On Thu, Oct 25, 2018 at 11:50 PM Divya Gehlot 
wrote:

> Hi ,
> The query hanging in Cancellation_Request is one of the bug on Drill 1.10
> and this issue has been resolved  post Drill 1.10 .
> That's why I am asking is there any way apart from Web UI where I can kill
> the query using the query id ?
> The queries that I am trying to kill are running for days .
>
> Thanks,
> Divya
>
> On Fri, 26 Oct 2018 at 14:04, Sorabh Hamirwasia 
> wrote:
>
> > Hi Divya,
> > If possible can you also try running same query on master ? There have
> been
> > lots of improvement/bug fixes in cancellation code path to resolve hang
> > situations post 1.10.
> >
> > Thanks,
> > Sorabh
> >
> > On Thu, Oct 25, 2018 at 10:49 PM Kunal Khatua  wrote:
> >
> > > What is the Drillbit's JStack? You can try seeing the /threads webpage.
> > >
> > > On 10/25/2018 8:16:06 PM, Divya Gehlot 
> wrote:
> > > Hi,
> > > I am using Dril 1.10 and I have query running for days and when I click
> > on
> > > cacel the query through Drill Web UI . It hung in
> Cancellation_Requested
> > > state .
> > > Is there any way to cancel the query using query id though terminal ?
> > > Appreciate your help !
> > >
> > > Thanks,
> > > Divya
> > >
> >
>


Re: Failed to fetch parquet metadata after 15000ms

2018-10-08 Thread Khurram Faraaz
Hi Karthik,

You can try setting the session/system option exec.queue.timeout_millis to
a higher value; the default is 30.
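
For example, from sqlline or the Query page in the web UI (a sketch; the
value is a placeholder, pick one that suits your metadata size):

ALTER SESSION SET `exec.queue.timeout_millis` = 60000;
-- or, to apply it for all sessions:
ALTER SYSTEM SET `exec.queue.timeout_millis` = 60000;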

Thanks,
Khurram

On Mon, Oct 8, 2018 at 10:40 AM karthik.R  wrote:

> Hi,
> I am frequently getting below exception when running a query in drill 1.14.
> Could you please help what option to set to increase this timeout?
>
> Waited for 15000ms , but tasks for 'Fetch parquet metadata' are not
> complete. Total runnable size 4 , parallelism 4
>
>
> My parquet files present in s3 location with 5 parquet file partitions.
>
> Please help
>


Re: Kafka Plugin in Drill 1.14

2018-10-04 Thread Khurram Faraaz
Hi,

You can find details here -
https://drill.apache.org/docs/kafka-storage-plugin/

When you install Drill, a preconfigured Kafka storage plugin is available
on the Storage page in the Drill Web Console. Once you enable and configure
the storage plugin, you can query Kafka from Drill.
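
If you do not see it on the Storage page, you can also create it there
yourself; the configuration is roughly like the sketch below (the broker
address is a placeholder; point bootstrap.servers at your Kafka cluster):

{
  "type": "kafka",
  "kafkaConsumerProps": {
    "bootstrap.servers": "localhost:9092",
    "group.id": "drill-consumer"
  },
  "enabled": true
}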

Thanks,
Khurram

On Wed, Oct 3, 2018 at 7:25 PM Divya Gehlot  wrote:

> Hi,
> I installed Drill in embedded mode and I don't see Kafka plugin under
> Storage plugin in Web UI  . Do I need to create or its available by
> default?
>
> Thanks,
> Divya
>


Re: to_date() string to date conversion ERROR

2018-10-03 Thread Khurram Faraaz
Hello Herman,

That information about setting the timezone is available here in the Apache
Drill docs
https://drill.apache.org/docs/data-type-conversion/

Configure the default time zone format in /conf/drill-env.sh by adding
-Duser.timezone=UTC to DRILL_JAVA_OPTS.
For example:

export DRILL_JAVA_OPTS="-Xms1G -Xmx$DRILL_MAX_HEAP
-XX:MaxDirectMemorySize=$DRILL_MAX_DIRECT_MEMORY -XX:MaxPermSize=512M
-XX:ReservedCodeCacheSize=1G -ea -Duser.timezone=UTC"

Regards,
Khurram

On Wed, Oct 3, 2018 at 8:44 AM Herman Tan  wrote:

> By the way,
> I was looking for how to set user.timezone option.
>
> https://mapr.com/support/s/article/Understanding-Drill-s-timestamp-and-timezone?language=en_US
> It says sys.boot option user.timezone, set to UTC
> It does not exist anymore
>
>
> https://mapr.com/support/s/article/MapR-DB-JSON-application-Error-Illegal-instant-due-to-time-zone-offset-transition?language=en_US
> I followed the instructions here.
> Add -Duser.timezone=UTC in DRILL_JAVA_OPTS sqlline.bat
>
>
> Regards,
> Herman
>
>
>
> On Wed, Oct 3, 2018 at 11:34 PM Herman Tan  wrote:
>
> > Hi,
> >
> > I ran SQL
> > SELECT TIMEOFDAY() FROM (VALUES(1));
> > and I got:
> > 2018-10-03 15:27:16.586 Asia/Singapore
> >
> > I added -Duser.timezone=UTC in DRILL_JAVA_OPTS  in the sqlline.bat
> > Restarted drill
> >
> > I ran SQL again
> > SELECT TIMEOFDAY() FROM (VALUES(1));
> > and I got:
> > 2018-10-03 15:27:16.586 UTC
> >
> > I ran SQL that failed before:
> > select to_date('1982/01/01 00:00:00.0','/MM/dd
> > HH:mm:ss.S') -- fail
> > from (values(1))
> > and it passes!
> >
> > Thank you for the help.
> >
> > Herman
> >
> >
> > On Wed, Oct 3, 2018 at 5:35 PM Vova Vysotskyi  wrote:
> >
> >> Hello Herman,
> >>
> >> I tried to reproduce this error, but all queries passed on my machine.
> >> Could you please add more details about your env? Which version of Drill
> >> is
> >> used, which timezone is set?
> >> Is it reproduced with UTC timezone?
> >>
> >> Kind regards,
> >> Volodymyr Vysotskyi
> >>
> >>
> >> On Mon, Oct 1, 2018 at 10:58 AM Herman Tan 
> wrote:
> >>
> >> > Hi,
> >> >
> >> > I have a very puzzling error.
> >> > Try the following SQL statements.
> >> >
> >> > What is the problem with '1982/01/01 00:01:00.0'?
> >> > Error message: Illegal instant due to time zone offset transition
> >> >
> >> > select to_date('1981/12/31 00:00:00.0','yyyy/MM/dd
> >> > HH:mm:ss.S') -- pass
> >> > from (values(1))
> >> >
> >> > select to_date('1981/12/31 11:59:59.0','yyyy/MM/dd
> >> > HH:mm:ss.S') -- pass
> >> > from (values(1))
> >> >
> >> > select to_date('1982/01/01 00:00:00.0','yyyy/MM/dd
> >> > HH:mm:ss.S') -- fail
> >> > from (values(1))
> >> >
> >> > select to_date('1982/01/01 00:00:01.0','yyyy/MM/dd
> >> > HH:mm:ss.S') -- fail
> >> > from (values(1))
> >> >
> >> > select to_date('1982/01/01 00:01:00.0','yyyy/MM/dd
> >> > HH:mm:ss.S') -- fail
> >> > from (values(1))
> >> >
> >> > select to_date('1982/01/01 01:00:00.0','yyyy/MM/dd
> >> > HH:mm:ss.S') -- pass
> >> > from (values(1))
> >> >
> >> > select to_date('1982/01/02 00:00:00.0','yyyy/MM/dd
> >> > HH:mm:ss.S') -- pass
> >> > from (values(1))
> >> >
> >> > select to_date('1983/01/01 00:00:00.0','yyyy/MM/dd
> >> > HH:mm:ss.S') -- pass
> >> > from (values(1))
> >> >
> >> > Herman
> >> >
> >>
> >
>


Re: S3 Storage Plugin - IAM roles support

2018-09-17 Thread Khurram Faraaz
Hi Nitin,

You can read the details from here -
http://drill-user.incubator.apache.narkive.com/mP5GE7fq/s3-using-iam-roles

A drill user hit the same issue in the past and he has documented the steps
for others to benefit. Hope that helps.
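One common approach is to let the S3A connector pick up the instance-profile
credentials instead of static keys. A sketch of the relevant core-site.xml property
(whether this provider class is available depends on the AWS SDK bundled with your
Hadoop libraries):

<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
</property>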

Regards,
Khurram

On Mon, Sep 17, 2018 at 2:20 PM, Nitin S  wrote:

> Hello,
>
> I am trying to setup drill in distributed mode and access some parquet
> files in S3. The instructions here:
> https://drill.apache.org/docs/s3-storage-plugin/ do not provide any info
> on what to do if we are using IAM roles for S3 (instead of static keys)
>
> Kindly advise how to set that up.
>
> Thanks,
> Nitin
>


Re: Failure while reading messages from kafka

2018-09-04 Thread Khurram Faraaz
Can you please share the stack trace from drillbit.log and the version of
Kafka that you are on ?

Thanks,
Khurram

On Tue, Sep 4, 2018 at 11:39 AM, Matt  wrote:

> https://issues.apache.org/jira/browse/DRILL-6723
>
> On Mon, Aug 27, 2018 at 12:27 PM Matt  wrote:
>
> > I have a Kafka topic with some non-JSON test messages in it, resulting in
> > errors "Error: DATA_READ ERROR: Failure while reading messages from
> kafka.
> > Recordreader was at record: 1"
> >
> > I don't seem to be able to bypass these topic messages with
> > "store.json.reader.skip_invalid_records" or even an OFFSET in the query.
> >
> > Is there a mechanism or setting I can use to query a topic and not fail
> on
> > malformed messages?
> >
>


Re: Cannot connect over JDBC from LibreOffice Base to Drill (embedded)

2018-09-01 Thread Khurram Faraaz
Hello Randy,

Do you use a username/password to connect to Drill, in your JDBC connection
string ?
What do the Drill logs say, the drillbit.log and drillbit.out files under
logs directory ?
Can you connect to Drill from sqlline prompt ?
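For example, a quick sanity check from the command line (the username/password
below are placeholders; drop -n/-p if authentication is not enabled):

./bin/sqlline -u "jdbc:drill:drillbit=localhost:31010" -n <username> -p <password>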

Thanks,
Khurram

On Sat, Sep 1, 2018 at 1:19 PM, Randy Simons  wrote:

> Hello,
>
> I'm trying to get LibreOffice Base to connect over JDBC to Drill, which I
> have
> running as embedded on the local machine. Unfortunately, when trying to
> connect, it fails with "Failure in creating DrillConnectionImpl:
> java.lang.NullPointerException".
>
> I found this question on SO, which seems identical:
> https://stackoverflow.com/questions/33196668/interfacing-apache-drill-vs-openoffice-libreoffice-via-jdbc
> but unfortunately no real answer there. And just like in this question, I
> have
> no problem connecting to Drill over JDBC from SQuireL SQL Client. It lets
> me
> run queries just fine.
>
> Software:
>
> * Drill 1.14, unmodified standard configuration
> * LibreOffice 6.0.3.2 (current version of Ubuntu 18.04)
> * Kubuntu 18.04
> * OpenJDK 1.8.0_181 (Since Drill has issues with Java 9+, I switched from
> the
> default Java 10 to Java 8, using update-java-alternatives)
> * I've added apache-drill-1.14.0/jars/jdbc-driver/drill-jdbc-all-1.14.0.jar
> to
> LibreOffice's class path.
> * LibreOffice is configured to use JRE 1.8.0_181 (instead of JRE10)
>
> Steps:
>
> * Run drill embedded, via bin/drill-embedded.
> * Create a new LibreOffice Base database.
> * In the Database Wizard, connect to existing database (JDBC)
> * Datasource URL: jdbc:drill:drillbit=localhost
> * JDBC driver class: org.apache.drill.jdbc.Driver
>
> "Test class" now confirms the driver can be loaded, and the class is
> found.
> However, "Test Connection" yields: "Failure in creating
> DrillConnectionImpl:
> java.lang.NullPointerException". I couldn't obtain a stack trace from
> LibreOffice for further info.
>
> The exception is thrown from within the Drill JDBC driver, but it could
> very
> well be Base which doesn't behave correctly, leading to this problem.
>
> Anything else I can try?
>
> TIA,
>
> Randy
>
>
>


Re: Apache Drill Queries Take so Long after which Webconsole dies and I don't see results

2018-08-01 Thread Khurram Faraaz
Hi,

1.  What version of Drill are you running your query on ?
2.  How were your parquet files created ? (were they created on Drill or
another system, like Hive etc.)
3.  Can you please share the parquet schema of the parquet file being
queried ?
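For the schema in question 3, one convenient option (assuming you have the
parquet-tools utility available; the path below is a placeholder) is:

parquet-tools schema /path/to/your/file.parquet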

Thanks,
Khurram

On Wed, Aug 1, 2018 at 3:24 AM, Peter Edike <
peter.ed...@interswitchgroup.com> wrote:

> Hello everyone,
>
> I am currently using apache drill to query a directory of parquet files
> (total size 64gb with each file averaging 6.4Gb per size).  I am running
> the following query
>
> select * from dfs.rj.vw_reports_201805 where unique_key in
> (
>  select unique_key from dfs.rj.vw_reports_201805 group by
> unique_key having count(unique_key) > 1
> )
> limit 40
>
> Now this query takes about 42 minutes after which it fails because one of
> the drillbits in the drillbit cluster crashes. I check the logs and I see
> several errors summarized as follows
>
> 20:35:18.006 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState
> - Connection timed out for connection string (172.35.15.129:5181,
> 172.35.15.151:5181,172.35.15.152:5181) and timeout (5000) / elapsed
> (13347)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
> ConnectionLoss
> at 
> org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
> [curator-client-2.7.1.jar:na]
> at 
> org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
> [curator-client-2.7.1.jar:na]
> at org.apache.curator.CuratorZookeeperClient.getZooKeeper(
> CuratorZookeeperClient.java:115) [curator-client-2.7.1.jar:na]
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.
> performBackgroundOperation(CuratorFrameworkImpl.java:806)
> [curator-framework-2.7.1.jar:na]
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.
> backgroundOperationsLoop(CuratorFrameworkImpl.java:792)
> [curator-framework-2.7.1.jar:na]
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.
> access$300(CuratorFrameworkImpl.java:62) [curator-framework-2.7.1.jar:na]
> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.
> call(CuratorFrameworkImpl.java:257) [curator-framework-2.7.1.jar:na]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [na:1.8.0_171]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [na:1.8.0_171]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [na:1.8.0_171]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
> AND
>
>
> Aug 1, 2018 7:54:18 AM WARNING: org.apache.parquet.CorruptStatistics:
> Ignoring statistics because created_by could not be parsed (see
> PARQUET-251): parquet-mr (build 6aa21f8776625b5fa6b18059cfebe7549f2e00cb)
>
> org.apache.parquet.VersionParser$VersionParseException: Could not parse
> created_by: parquet-mr (build 6aa21f8776625b5fa6b18059cfebe7549f2e00cb)
> using format: (.+) version ((.*) )?\(build ?(.*)\)
>
> at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
>
> at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(
> CorruptStatistics.java:66)
>
> at org.apache.parquet.format.converter.ParquetMetadataConverter.
> fromParquetStatistics(ParquetMetadataConverter.java:264)
>
> at org.apache.parquet.format.converter.ParquetMetadataConverter.
> fromParquetMetadata(ParquetMetadataConverter.java:568)
>
> at org.apache.parquet.format.converter.ParquetMetadataConverter.
> readParquetMetadata(ParquetMetadataConverter.java:545)
>
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(
> ParquetFileReader.java:455)
>
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(
> ParquetFileReader.java:404)
>
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(
> ParquetFileReader.java:390)
>
> at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.
> getBatch(ParquetScanBatchCreator.java:117)
>
> at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.
> getBatch(ParquetScanBatchCreator.java:53)
>
> at org.apache.drill.exec.physical.impl.ImplCreator$2.
> run(ImplCreator.java:146)
>
> at org.apache.drill.exec.physical.impl.ImplCreator$2.
> run(ImplCreator.java:142)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:422)
>
> at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1633)
>
> at org.apache.drill.exec.physical.impl.ImplCreator.
> getRecordBatch(ImplCreator.java:142)
>
> at org.apache.drill.exec.physical.impl.ImplCreator.
> getChildren(ImplCreator.java:182)
>
> at org.apache.drill.exec.physical.impl.ImplCreator.
> getRootExec(ImplCreator.java:110)
>
> at org.apache.drill.exec.physical.impl.ImplCreator.
> 

Re: Best Practice to check Drillbit status(Cluster mode)

2018-07-10 Thread Khurram Faraaz
You can run the below query, and look for the *state *column in the result
of the query. Online drillbits will be marked as ONLINE.

select * from sys.drillbits;
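If you want to poll this from a script, a variant along these lines may be handier
(assuming your Drill version exposes the state column mentioned above):

select hostname, state from sys.drillbits where state <> 'ONLINE';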

- Khurram

On Tue, Jul 10, 2018 at 12:24 AM, Divya Gehlot 
wrote:

> Hi,
> I would like to know the best practice to check the Drillbits status in
> cluster mode.
> I have encountered a scenario where the Drillbit processes appear to be running fine,
> but when I check in the Drill Web UI, some of the Drillbits are down.
> When I did an RCA (root cause analysis), I found that for some reason the Drillbit
> processes had hung.
> For now the alert system which I have implemented now is checking the
>
>
> > drill/bin/drillbit.sh status
>
>
> Is there any other best way to catch the hung Drillbit process?
> Appreciate the advise from Drill community users.
>
> Thanks,
> Divya
>


Re: Drill error

2018-06-27 Thread Khurram Faraaz
Hi Nitin,

Can you please share the description of the table (i.e. column types) ?
Is this a parquet table or JSON ?
Also please share the version of Drill and the drillbit.log

Thanks,
Khurram

On Wed, Jun 27, 2018 at 9:45 AM, Nitin Pawar 
wrote:

> here are the details
>
> query:
>
> select Account Account,Name Name,CONCAT(DateString , string2)
> Merged_String from dfs.tmp.`/nitin/` t1
>
> There is no custom UDF in this query;
>
>
>
> On Wed, Jun 27, 2018 at 2:22 PM, Nitin Pawar 
> wrote:
>
> > Hi Vitalii,
> >
> > Thanks for the description.
> > I will try to get the log for this but as this is happening in production
> > and drill log gets overwritten in our case very fast.
> >
> > Also I will get the query for the same while getting the log
> >
> > I will get the details and update the thread
> >
> > Thanks,
> > Nitin
> >
> > On Tue, Jun 26, 2018 at 7:32 PM, Vitalii Diravka <
> > vitalii.dira...@gmail.com> wrote:
> >
> >> Hi Nitin,
> >>
> >> It happens in the process of reallocation of the size of buffers in the
> >> memory.
> >> It isn't a User Exception, so it looks like a bug, if you get it in some
> >> existed plugin.
> >> But to say you exactly, please describe your case. What kind of query
> did
> >> you perform, any UDF's, which data source?
> >> Also logs can help.
> >>
> >> Thanks.
> >>
> >> Kind regards
> >> Vitalii
> >>
> >>
> >> On Tue, Jun 26, 2018 at 1:27 PM Nitin Pawar 
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > Can someone help me understand below error? and how do I not let this
> >> > happen ??
> >> >
> >> > SYSTEM ERROR: IllegalStateException: Tried to remove unmanaged buffer.
> >> >
> >> > Fragment 0:0
> >> >
> >> > [Error Id: bcd510f6-75ee-49a7-b723-7b35d8575623 on
> >> > ip-10-0-103-63.ec2.internal:31010]
> >> > Caused By: SYSTEM ERROR: IllegalStateException: Tried to remove
> >> unmanaged
> >> > buffer.
> >> >
> >> > Fragment 0:0
> >> >
> >> > [Error Id: bcd510f6-75ee-49a7-b723-7b35d8575623 on
> >> > ip-10-0-103-63.ec2.internal:31010]
> >> >
> >> > --
> >> > Nitin Pawar
> >> >
> >>
> >
> >
> >
> > --
> > Nitin Pawar
> >
>
>
>
> --
> Nitin Pawar
>


Re: Apache Drill Automatically Converts Large Numbers to Exponents

2018-05-08 Thread Khurram Faraaz
Hi Peter,


Where do you see large numbers being represented/displayed as exponents, in 
Drill (and what version of Drill are you on) ?

 1. Do you see that behavior on sqlline prompt ?

 2. Do you see it from a JDBC application ?

 3. Do you see that on Drill's Web UI ?

 4. If you don't see it on the three options above, where is it that you see 
large numbers being represented as exponents ?


Your response to questions above will help us investigate.


Thanks,

Khurram


From: Vova Vysotskyi 
Sent: Tuesday, May 8, 2018 8:44:46 AM
To: user@drill.apache.org
Subject: Re: Apache Drill Automatically Converts Large Numbers to Exponents

Hi Peter,

If the problem is only with displaying the numbers, you may convert it to
the string with the specified format using TO_CHAR(expression, format) UDF.

For more details please see
https://drill.apache.org/docs/data-type-conversion/#other-data-type-conversions
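For instance, a quick sketch of the idea (the numeric literal and format pattern
below are only illustrative):

SELECT TO_CHAR(125677.4567, '#,###.##') FROM (VALUES(1));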

Kind regards,
Volodymyr Vysotskyi


вт, 8 трав. 2018 о 18:36 Peter Edike 
пише:

> Hello everyone,
>
> How can I prevent Apache Drill From Displaying large numers as exponents
> as this is not acceptable for my use case
>
>
> Kind regards
> Peter Edike
>


Re: Apache Drill 1.12.0

2018-02-23 Thread Khurram Faraaz
How many unique values does col4 have in bdl.schema.view_in_greenplum ?


Thanks,

Khurram


From: Robles, Edgardo 
Sent: Friday, February 23, 2018 8:27:59 AM
To: user@drill.apache.org
Subject: Apache Drill 1.12.0

Hi,

I am evaluating Apache Drill and have run into the following issue with CTAS 
and large table from RDBMS.

I am using apache drill 1.12.0 with ubuntu 16.04.3 vm 8 cores, 8GB of ram with 
all patches, oracle java 1.8.0.161-b12.  Postgres 9.4.1212 jdbc driver 
connecting to Greenplum 4.3.
I enabled the rdms plugin to connect to Greenplum.


I start drill using drill-embedded but this also fails with sqline.  I have 
tried a few older versions of the jdbc driver with the same error message.
The view contains 25M rows.  This works with smaller tables.  Can apache drill 
chunk data while processing?

The environment is set to Parquet storage format.

I ran the following SQL (example):

create table dfs.data.table1(col1, col2, col3, col4) partition by (col4) as
SELECT col1, col2, col3, col4
FROM bdl.schema.view_in_greenplum
order by col4, col3;

This uses 100% cpu until it fails with the following error:

2018-02-23 10:20:00,681 [256fd5fc-efb8-3504-d08a-0fdcb662f9d6:frag:0:0] ERROR 
o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, exiting. 
Information message: Unable to handle out of memory condition in 
FragmentExecutor.
java.lang.OutOfMemoryError: Java heap space
at java.lang.String.toCharArray(String.java:2899) ~[na:1.8.0_161]
at java.util.zip.ZipCoder.getBytes(ZipCoder.java:78) ~[na:1.8.0_161]
at java.util.zip.ZipFile.getEntry(ZipFile.java:316) ~[na:1.8.0_161]
at java.util.jar.JarFile.getEntry(JarFile.java:240) ~[na:1.8.0_161]
at java.util.jar.JarFile.getJarEntry(JarFile.java:223) ~[na:1.8.0_161]
at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:1042) 
~[na:1.8.0_161]
at sun.misc.URLClassPath.getResource(URLClassPath.java:239) 
~[na:1.8.0_161]
at java.net.URLClassLoader$1.run(URLClassLoader.java:365) 
~[na:1.8.0_161]
at java.net.URLClassLoader$1.run(URLClassLoader.java:362) 
~[na:1.8.0_161]
at java.security.AccessController.doPrivileged(Native Method) 
~[na:1.8.0_161]
at java.net.URLClassLoader.findClass(URLClassLoader.java:361) 
~[na:1.8.0_161]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[na:1.8.0_161]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338) 
~[na:1.8.0_161]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[na:1.8.0_161]
at 
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2122)
 ~[postgresql-9.4.1212.jar:9.4.1212]
at 
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:288) 
~[postgresql-9.4.1212.jar:9.4.1212]
at 
org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:430) 
~[postgresql-9.4.1212.jar:9.4.1212]
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:356) 
~[postgresql-9.4.1212.jar:9.4.1212]
at 
org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:303) 
~[postgresql-9.4.1212.jar:9.4.1212]
at 
org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:289) 
~[postgresql-9.4.1212.jar:9.4.1212]
at 
org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:266) 
~[postgresql-9.4.1212.jar:9.4.1212]
at org.postgresql.jdbc.PgStatement.executeQuery(PgStatement.java:233) 
~[postgresql-9.4.1212.jar:9.4.1212]
at 
org.apache.commons.dbcp.DelegatingStatement.executeQuery(DelegatingStatement.java:208)
 ~[commons-dbcp-1.4.jar:1.4]
at 
org.apache.commons.dbcp.DelegatingStatement.executeQuery(DelegatingStatement.java:208)
 ~[commons-dbcp-1.4.jar:1.4]
at 
org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup(JdbcRecordReader.java:177)
 ~[drill-jdbc-storage-1.12.0.jar:1.12.0]
at 
org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas(ScanBatch.java:242)
 ~[drill-java-exec-1.12.0.jar:1.12.0]
at 
org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:166) 
~[drill-java-exec-1.12.0.jar:1.12.0]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 ~[drill-java-exec-1.12.0.jar:1.12.0]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 ~[drill-java-exec-1.12.0.jar:1.12.0]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 ~[drill-java-exec-1.12.0.jar:1.12.0]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
 ~[drill-java-exec-1.12.0.jar:1.12.0]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
 ~[drill-java-exec-1.12.0.jar:1.12.0]


This also fails while running on Windows 2012 

Re: Code too large

2018-02-20 Thread Khurram Faraaz
vv3460 with id: TypedFieldId [fieldIds=[61], remainder=null].");}
vv3460 = ((NullableVarCharVector) tmp3462);}}
public void __DRILL_INIT__()throws SchemaChangeException{}
}
*/
3. Error Continue :-
Caused by: org.codehaus.commons.compiler.CompileException: File
'org.apache.drill.exec.compile.DrillJavaFileObject[ProjectorGen1348.java]', Line
1517, Column 17: ProjectorGen1348.java:1517: error: code too largepublic
void doEval(int inIndex, int outIndex)^
(compiler.err.limit.code) at
org.apache.drill.exec.compile.DrillDiagnosticListener.report(DrillDiagnosticListener.java:43)
~[drill-java-exec-1.11.0.jar:1.11.0] at
com.sun.tools.javac.api.ClientCodeWrapper$WrappedDiagnosticListener.report(ClientCodeWrapper.java:559)
~[jdk.tools-1.7.jar:na] at
com.sun.tools.javac.util.Log.writeDiagnostic(Log.java:428)
~[jdk.tools-1.7.jar:na] at com.sun.tools.javac.util.Log.report(Log.java:416)
~[jdk.tools-1.7.jar:na] at
com.sun.tools.javac.util.AbstractLog.error(AbstractLog.java:94)
~[jdk.tools-1.7.jar:na] at
com.sun.tools.javac.jvm.Code.checkLimits(Code.java:95) ~[jdk.tools-1.7.jar:na]
 at com.sun.tools.javac.jvm.Gen.genMethod(Gen.java:948) ~[jdk.tools-1.7.jar:na]
 at com.sun.tools.javac.jvm.Gen.visitMethodDef(Gen.java:885)
~[jdk.tools-1.7.jar:na] at
com.sun.tools.javac.tree.JCTree$JCMethodDecl.accept(JCTree.java:669)
~[jdk.tools-1.7.jar:na] at com.sun.tools.javac.jvm.Gen.genDef(Gen.java:684)
~[jdk.tools-1.7.jar:na] at com.sun.tools.javac.jvm.Gen.genClass(Gen.java:2235)
~[jdk.tools-1.7.jar:na] at
com.sun.tools.javac.main.JavaCompiler.genCode(JavaCompiler.java:712)
~[jdk.tools-1.7.jar:na] at
com.sun.tools.javac.main.JavaCompiler.generate(JavaCompiler.java:1451)
~[jdk.tools-1.7.jar:na] at
com.sun.tools.javac.main.JavaCompiler.generate(JavaCompiler.java:1419)
~[jdk.tools-1.7.jar:na] at
com.sun.tools.javac.main.JavaCompiler.compile2(JavaCompiler.java:870)
~[jdk.tools-1.7.jar:na] at
com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:829)
~[jdk.tools-1.7.jar:na] at com.sun.tools.javac.main.Main.compile(Main.java:439)
~[jdk.tools-1.7.jar:na] at
com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:132)
~[jdk.tools-1.7.jar:na] at
org.apache.drill.exec.compile.JDKClassCompiler.doCompile(JDKClassCompiler.java:82)
~[drill-java-exec-1.11.0.jar:1.11.0] at
org.apache.drill.exec.compile.JDKClassCompiler.getByteCode(JDKClassCompiler.java:64)
~[drill-java-exec-1.11.0.jar:1.11.0] at
org.apache.drill.exec.compile.AbstractClassCompiler.getClassByteCode(AbstractClassCompiler.java:55)
~[drill-java-exec-1.11.0.jar:1.11.0] at
org.apache.drill.exec.compile.ClassCompilerSelector.getClassByteCode(ClassCompilerSelector.java:123)
~[drill-java-exec-1.11.0.jar:1.11.0] at
org.apache.drill.exec.compile.QueryClassLoader.getClassByteCode(QueryClassLoader.java:90)
~[drill-java-exec-1.11.0.jar:1.11.0] at
org.apache.drill.exec.compile.ClassTransformer.getImplementationClass(ClassTransformer.java:243)
~[drill-java-exec-1.11.0.jar:1.11.0] ... 42 common frames omitted






On Thu, Feb 15, 2018 2:21 AM, Arjun kr arjun...@outlook.com  wrote:
Hi Anup,

You may try setting the configuration option 'exec.java.compiler.exp_in_method_size'
to a lower value than the default of 50 and running the query to see if it helps. If even
lowering it to a value of 1 doesn't help, the query details and stack trace would be
helpful for analysis, as Khurram mentioned.

alter session set `exec.java.compiler.exp_in_method_size` = <value>;

Refer to the doc for more details on this configuration property:
https://drill.apache.org/docs/configuration-options-introduction/

Thanks,

Arjun

From: Khurram Faraaz <kfar...@mapr.com>
Sent: Thursday, February 15, 2018 12:55 AM
To: user@drill.apache.org
Subject: Re: Code too large

Can you please share the exact query as is (without formatting) here. And the
error that you see on Drill 1.11.0, so we can try and repro it.

Thanks,

Khurram

From: Anup Tiwari <anup.tiw...@games24x7.com>
Sent: Wednesday, February 14, 2018 3:14:01 AM
To: user@drill.apache.org
Subject: Re: Code too large

Hi Team,
Do we have any updates on this? It is coming even in Drill 1.11.0 and i tried
same query on hive, it is working perfectly.

On Fri, Mar 25, 2016 2:30 AM, Jacques Nadeau jacq...@dremio.com  wrote:
Looking at the trace it looks like Drill is correctly using the JDK
compiler which is good. However, it looks like we're not breaking apart the
expressions into submethods. @Aditya, since you worked on DRILL-1155, do
you have any sense to why we're not subdividing the code in this situation?

If I recall, we said we would only include 100 exp

Re: Code too large

2018-02-14 Thread Khurram Faraaz
Can you please share the exact query as is (without formatting) here. And the 
error that you see on Drill 1.11.0, so we can try and repro it.


Thanks,

Khurram


From: Anup Tiwari 
Sent: Wednesday, February 14, 2018 3:14:01 AM
To: user@drill.apache.org
Subject: Re: Code too large

Hi Team,
Do we have any updates on this? It is coming even in Drill 1.11.0 and i tried
same query on hive, it is working perfectly.





On Fri, Mar 25, 2016 2:30 AM, Jacques Nadeau jacq...@dremio.com  wrote:
Looking at the trace it looks like Drill is correctly using the JDK

compiler which is good. However, it looks like we're not breaking apart the

expressions into submethods. @Aditya, since you worked on DRILL-1155, do

you have any sense to why we're not subdividing the code in this situation?




If I recall, we said we would only include 100 expressions per project

method. Or are we hitting a limit of fields or some other limit?




We test large projects here:




https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/compile/TestLargeFileCompilation.java




--

Jacques Nadeau

CTO and Co-Founder, Dremio




On Thu, Mar 24, 2016 at 11:54 AM, Edmon Begoli  wrote:




> Logs and queries attached.

>

> On Thu, Mar 24, 2016 at 2:22 PM, Edmon Begoli  wrote:

>

>> We are working on it.

>>

>> On Thu, Mar 24, 2016 at 2:07 PM, Ted Dunning 

>> wrote:

>>

>>> Providing the stack traces and logs is a key first step in fixing this.

>>>

>>>

>>>

>>> On Thu, Mar 24, 2016 at 11:02 AM, Nirav Shah 

>>> wrote:

>>>

>>> > I had the same issue but till date even in version 1.6 it's there.

>>> > I checked about JDK also.

>>> >

>>> > it would be great if we can solve this issue.

>>> >

>>> > Regards,

>>> > Nirav

>>> >

>>> > On Thu, Mar 24, 2016 at 11:06 PM, Edmon Begoli 

>>> wrote:

>>> >

>>> > > Here is the query:

>>> > >

>>> > > CREATE TABLE

>>> > > cms.joe.`ALL_IP_HS_FINAL`

>>> > > AS

>>> > > SELECT CASE WHEN columns[0] =3D '' THEN NULL ELSE CAST(columns[0] as

>>> > > DOUBLE=

>>> > > )

>>> > > END as `DSYSRTKY`, CASE WHEN columns[1] =3D '' THEN NULL ELSE

>>> > > CAST(columns[1] as DOUBLE) END as `CLAIMNO`, NULLIF(columns[2],'')

>>> as

>>> > > `PROVIDER`, CASE WHEN columns[3] =3D '' THEN NULL ELSE

>>> > TO_DATE(columns[3],

>>> > > 'MMdd') END as `THRU_DT`, NULLIF(columns[4],'') as `RIC_CD`,

>>> > > NULLIF(columns[5],'') as `CLM_TYPE`, NULLIF(columns[6],'') as

>>> `QUERY_CD`,

>>> > > NULLIF(columns[7],'') as `FAC_TYPE`, NULLIF(columns[8],'') as

>>> `TYPESRVC`,

>>> > > NULLIF(columns[9],'') as `FREQ_CD`, NULLIF(columns[10],'') as

>>> `FI_NUM`,

>>> > > NULLIF(columns[11],'') as `NOPAY_CD`, CASE WHEN columns[12] =3D ''

>>> THEN

>>> > > NUL=

>>> > > L

>>> > > ELSE CAST(columns[12] as DOUBLE) END as `PMT_AMT`, CASE WHEN

>>> > columns[13] =

>>> > > =3D

>>> > > '' THEN NULL ELSE CAST(columns[13] as DOUBLE) END as `PRPAYAMT`,

>>> > > NULLIF(columns[14],'') as `PRPAY_CD`, NULLIF(columns[15],'') as

>>> > `ACTIONCD`,

>>> > > NULLIF(columns[16],'') as `PRSTATE`, NULLIF(columns[17],'') as

>>> > `ORGNPINM`,

>>> > > NULLIF(columns[18],'') as `AT_UPIN`, NULLIF(columns[19],'') as

>>> `AT_NPI`,

>>> > > NULLIF(columns[20],'') as `OP_UPIN`, NULLIF(columns[21],'') as

>>> `OP_NPI`,

>>> > > NULLIF(columns[22],'') as `OT_UPIN`, NULLIF(columns[23],'') as

>>> `OT_NPI`,

>>> > > NULLIF(columns[24],'') as `MCOPDSW`, NULLIF(columns[25],'') as

>>> `STUS_CD`,

>>> > > NULLIF(columns[26],'') as `PPS_IND`, CASE WHEN columns[27] =3D ''

>>> THEN

>>> > NULL

>>> > > ELSE CAST(columns[27] as DOUBLE) END as `TOT_CHRG`, CASE WHEN

>>> > columns[28]

>>> > > =3D '' THEN NULL ELSE TO_DATE(columns[28], 'MMdd') END as

>>> > `ADMSN_DT`,

>>> > > NULLIF(columns[29],'') as `TYPE_ADM`,

>>> > > NULLIF(columns[30],'') as `SRC_ADMS`, NULLIF(columns[31],'') as

>>> > `PTNTSTUS`,

>>> > > CASE WHEN columns[32] =3D '' THEN NULL ELSE CAST(columns[32] as

>>> DOUBLE)

>>> > END

>>> > > as `PER_DIEM`, CASE WHEN columns[33] =3D '' THEN NULL ELSE

>>> > > CAST(columns[33=

>>> > > ]

>>> > > as DOUBLE) END as `DED_AMT`, CASE WHEN columns[34] =3D '' THEN NULL

>>> ELSE

>>> > > CAST(columns[34] as DOUBLE) END as `COIN_AMT`, CASE WHEN

>>> columns[35] =3D

>>> > > '=

>>> > > '

>>> > > THEN NULL ELSE CAST(columns[35] as DOUBLE) END as `BLDDEDAM`, CASE

>>> WHEN

>>> > > columns[36] =3D '' THEN NULL ELSE CAST(columns[36] as DOUBLE) END as

>>> > > `PCCHGAMT`, CASE WHEN columns[37] =3D '' THEN NULL ELSE

>>> CAST(columns[37]

>>> > as

>>> > > DOUBLE) END as `NCCHGAMT`,

>>> > > CASE WHEN columns[38] =3D '' THEN NULL 

Re: Apache Drill connection issue in tableau

2017-11-03 Thread Khurram Faraaz
To adjust the timeout, you can set a higher value for drill.exec.user.timeout in
conf/drill-override.conf; the value is specified in seconds.
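For example, a minimal sketch of the override entry (the key below follows the
drill.exec.rpc.user.timeout form Tim quotes later in this thread, and 60 is only an
illustrative number of seconds):

drill.exec.rpc.user.timeout: 60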



From: Divya Gehlot 
Sent: Thursday, November 2, 2017 11:21:16 PM
To: user@drill.apache.org
Subject: Re: Apache Drill connection issue in tableau

Hi Timothy,
Thanks for the help!
Yes you are right I could see below error message in the logs
Where can I set the rpc time out as I couldnt find the rpc timeout config
in when login through web console -> Options
Could you please help me pointing it ?

2017-11-03 04:48:30,413 [BitServer-4] INFO
o.a.drill.exec.work.foreman.Foreman - Failure while trying communicate
query result to initiating client. This would happen if a client is
disconnected before response notice can be sent.
org.apache.drill.exec.rpc.RpcException: Failure sending message.
at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:124)
[drill-rpc-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.rpc.user.UserServer$BitToUserConnection.sendResult(UserServer.java:199)
[drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:868)
[drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:1001)
[drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.work.foreman.Foreman.access$2600(Foreman.java:116)
[drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1027)
[drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1020)
[drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.common.EventProcessor.processEvents(EventProcessor.java:107)
[drill-common-1.10.0.jar:1.10.0]
at 
org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:65)
[drill-common-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.addEvent(Foreman.java:1022)
[drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.work.foreman.Foreman.addToEventQueue(Foreman.java:1040)
[drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.work.foreman.QueryManager.nodeComplete(QueryManager.java:506)
[drill-java-exec-1.10.0.jar:1.10.0]


Thanks,
Divya

On 1 November 2017 at 01:08, Timothy Farkas  wrote:

> Can you share the logs from your Drill bit? Without the logs I'm taking a
> shot in the dark but
>
> you can try increasing the timeout between the DrillClient and foreman by
> adding this configuration
>
> to your drill-override.conf
>
>
>
> drill.exec.rpc.user.timeout: 30
>
> Thanks,
> Tim
>
>
> 
> From: Divya Gehlot 
> Sent: Tuesday, October 31, 2017 12:26:55 AM
> To: user@drill.apache.org
> Subject: Apache Drill connection issue in tableau
>
> Hi ,
> I am facing  a strange Drill connection issue.
> I am connecting to Drill through Tableau.
> I have published coupled of reports to tableau server.
> When I connect to Drill through Tableau Desktop and Tableau Server at the same
> time
> it becomes unresponsive. I keep getting "connection time out error" or, in
> Tableau, "connection to port is lost".
> But when I close the tableau desktop and just connected to published
> reports to tableau server everything works fine and no time out error.
>
> Not sure if I need to update any drill configurations ?
>
> Appreciate the help !
>
> Thanks,
> Divya
>


Re: Issue in executing query on Drill Cluster

2017-10-18 Thread Khurram Faraaz
Can you please see if you can access that file as the default user in both 
cases, because you mention that the default user is different in both cases.

Try doing a hadoop fs -ls on that file as each of the two users, and verify that both
can access it.
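For example (the path is taken from your earlier mail; run it once as each user):

hadoop fs -ls /datalake/replicator/testdemo2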


Also, can you please share the JDBC connection string that you use to connect 
to Drill (embedded + cluster), and in both cases do you connect from sqlline 
from the command prompt or do you connect from Drill's web UI ?


Thanks,

Khurram



From: Chetan Kothari <chetan.koth...@oracle.com>
Sent: Tuesday, October 17, 2017 11:53:40 PM
To: user@drill.apache.org
Subject: RE: Issue in executing query on Drill Cluster

HI Khurram



I have given permission 777 to file /datalake/replicator/testdemo2.



I am connecting to Drill with default user.

But default user is different in both cases.



Is it creating issue?

How do we fix it?



Regards

Chetan





-Original Message-
From: Khurram Faraaz [mailto:kfar...@mapr.com]
Sent: Wednesday, October 18, 2017 12:18 PM
To: user@drill.apache.org
Subject: Re: Issue in executing query on Drill Cluster



Hi Chetan,





  1.  What are the permissions to the file /datalake/replicator/testdemo2 ?

  2.  Are you connecting as the same user to Drill in both cases, (i) in 
embedded mode, and (ii) in the 4 node Drillbit cluster ?





Thanks,



Khurram





From: Chetan Kothari <chetan.koth...@oracle.com>

Sent: Tuesday, October 17, 2017 9:43:42 PM

To: user@drill.apache.org

Subject: Issue in executing query on Drill Cluster



HI All







When I am firing following query against dataset present in alluxio from 
embedded-drill, it is working normally.







select REQ_HIST_REQUISITION_NUMBER, CONVERT_FROM(REQ_HIST_CREATION_DATE,'UTF8') 
AS REQ_HIST_CREATION_DATE , CONVERT_FROM(REQ_HIST_END_DATE,'UTF8') AS 
REQ_HIST_END_DATE, CONVERT_FROM(DEPT_NAME,'UTF8') AS DEPT_NAME ,  
CONVERT_FROM(MANAGER_FIRST_NAME,'UTF8') AS MANAGER_FIRST_NAME, 
CONVERT_FROM(MANAGER_LAST_NAME,'UTF8') AS 
MANAGER_LAST_NAME,CONVERT_FROM(DEPT_LOCATION_NUMBER,'UTF8') AS 
DEPT_LOCATION_NUMBER, REQUISITION_APPROVED, 
REQUISITION_FILLED,REQUISITION_CANCELLED,REQUISITION_ON_HOLD, 
REQUISITION_SOURCING,REQUISITION_PENDING,REQUISITION_DRAFTED,REQUISITION_DELETED,REQUISITION_REJECTED
  from chetanalluxio.`./datalake/replicator/testdemo2`







But same query when I execute from 4 Node Drill Cluster, it fails with 
following error.



Please note that I have not enabled parquet meta-data caching feature.







org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: From 
line 1, column 599 to line 1, column 611: Table 
'chetanalluxio../datalake/replicator/testdemo2' not found SQL Query null [Error 
Id: 60f552bf-3cc1-4939-807e-827dffcab11c on 
bdcs4node-bdcsce-1.bddcs.oracleinternalden14.nimbula.oraclecorp.com:31010].







Also I am able to browse above file present in alluxio from Drill using show 
files in chetanalluxio.







I am not sure why this issue is coming, given that it is working normally if 
execute query using embedded-drill.



Any inputs on this will be helpful.











Regards



Chetan




Re: Issue in executing query on Drill Cluster

2017-10-18 Thread Khurram Faraaz
Hi Chetan,


  1.  What are the permissions to the file /datalake/replicator/testdemo2 ?
  2.  Are you connecting as the same user to Drill in both cases, (i) in 
embedded mode, and (ii) in the 4 node Drillbit cluster ?


Thanks,

Khurram


From: Chetan Kothari 
Sent: Tuesday, October 17, 2017 9:43:42 PM
To: user@drill.apache.org
Subject: Issue in executing query on Drill Cluster

HI All



When I am firing following query against dataset present in alluxio from 
embedded-drill, it is working normally.



select REQ_HIST_REQUISITION_NUMBER, CONVERT_FROM(REQ_HIST_CREATION_DATE,'UTF8') 
AS REQ_HIST_CREATION_DATE , CONVERT_FROM(REQ_HIST_END_DATE,'UTF8') AS 
REQ_HIST_END_DATE, CONVERT_FROM(DEPT_NAME,'UTF8') AS DEPT_NAME ,  
CONVERT_FROM(MANAGER_FIRST_NAME,'UTF8') AS MANAGER_FIRST_NAME, 
CONVERT_FROM(MANAGER_LAST_NAME,'UTF8') AS 
MANAGER_LAST_NAME,CONVERT_FROM(DEPT_LOCATION_NUMBER,'UTF8') AS 
DEPT_LOCATION_NUMBER, REQUISITION_APPROVED, 
REQUISITION_FILLED,REQUISITION_CANCELLED,REQUISITION_ON_HOLD, 
REQUISITION_SOURCING,REQUISITION_PENDING,REQUISITION_DRAFTED,REQUISITION_DELETED,REQUISITION_REJECTED
  from chetanalluxio.`./datalake/replicator/testdemo2`



But same query when I execute from 4 Node Drill Cluster, it fails with 
following error.

Please note that I have not enabled parquet meta-data caching feature.



org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: From 
line 1, column 599 to line 1, column 611: Table 
'chetanalluxio../datalake/replicator/testdemo2' not found SQL Query null [Error 
Id: 60f552bf-3cc1-4939-807e-827dffcab11c on 
bdcs4node-bdcsce-1.bddcs.oracleinternalden14.nimbula.oraclecorp.com:31010].



Also I am able to browse above file present in alluxio from Drill using show 
files in chetanalluxio.



I am not sure why this issue is coming, given that it is working normally if 
execute query using embedded-drill.

Any inputs on this will be helpful.





Regards

Chetan


Re: Queries getting CANCELED

2017-10-17 Thread Khurram Faraaz
Can you please share your query that generates/creates the CSV ?

What is the size of the CSV file ?

What version of Drill are you on ?


Thanks,

Khurram


From: Rahul Raj 
Sent: Tuesday, October 17, 2017 7:09:35 PM
To: user@drill.apache.org
Subject: Queries getting CANCELED

I have a web app that generates CSV files using Drill.  When the CSV size
gets larger, the query state moves to CANCELED and results are always
partial/truncated. The same happens with larger parquet files too and works
fine with smaller data sets.

Code snippet is similar to:

try(Connection connection = ctx.getConnection()){
   try(Statement st = connection.createStatement()){
 st.executeQuery("alter session set `store.format` ='csv'");
 st.executeQuery(query);
 st.executeQuery("alter session set `store.format` ='parquet'");
   }
}

The connections are wrapped within DBCP connection pool, I suspect DBCP
connection pool cancelling the queries. I set queryTimeout as 0, tried
adding some delays to see if its related to finishing the CSV writer, still
getting cancelled.

When the same queries are executed from Drill Web Console, complete results
are generated. Anyone faced similar issues? What could be wrong the the
scenario above?

Regards,
Rahul

--
 This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom it is
addressed. If you are not the named addressee then you should not
disseminate, distribute or copy this e-mail. Please notify the sender
immediately and delete this e-mail from your system.


Re: Query Optimization

2017-08-17 Thread Khurram Faraaz
Please share your SQL query and the query plan.

To get the query plan, execute EXPLAIN PLAN FOR <your query>;
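For example (the table path and filter below are placeholders):

EXPLAIN PLAN FOR SELECT * FROM dfs.`/path/to/parquet_table` WHERE dir0 = '2017';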


Thanks,

Khurram


From: Divya Gehlot 
Sent: Friday, August 18, 2017 7:15:18 AM
To: user@drill.apache.org
Subject: Re: Query Optimization

Hi ,
Yes, it's the same query; I just ran the metadata refresh command.
My understanding is that the metadata refresh command saves reading the metadata.
What about column values? Why is it reading all the files after the metadata
refresh?
Partitioning helps retrieve data faster.
In Hive, when you mention the partition column in the where
condition,
it reads just those partitions, which improves query performance.
In my query the where condition also has the partitioning column, so it should go and
read only those partitioned files, right?
Why is it taking more time?
Does Drill work in a different way compared to Hive?


Thanks,
Divya

On 18 August 2017 at 07:37, Padma Penumarthy  wrote:

> It might read all those files if some new data gets added after running
> refresh metadata cache.
> If everything is same before and after metadata refresh i.e. no
> new data added and query is exactly the same, then it should not do that.
> Also, check if you can partition in  a way that will not create so many
> files in the
> first place.
>
> Thanks,
> Padma
>
>
> > On Aug 16, 2017, at 10:54 PM, Divya Gehlot 
> wrote:
> >
> > Hi,
> > Another observation is
> > My query had where conditions based on the partition values
> >
> > Total number of parquet files in directory  - 102290
> >> Before Metadata refresh - Its reading only 4 files
> >> After metadata refresh - its reading 102290 files
> >
> >
> > This is how the refresh metadata works I mean it scans each and every
> files
> > and get the results ?
> >
> > I dont  have access to logs now .
> >
> > Thanks,
> > Divya
> >
> > On 17 August 2017 at 13:48, Divya Gehlot 
> wrote:
> >
> >> Hi,
> >> Another observation is
> >> My query had where conditions based on the partition values
> >> Before Metadata refresh - Its reading only 4 files
> >> After metadata refresh - its reading 102290 files
> >>
> >> Thanks,
> >> Divya
> >>
> >> On 17 August 2017 at 13:03, Padma Penumarthy 
> wrote:
> >>
> >>> Does your query have partition filter ?
> >>> Execution time is increased most likely because partition pruning is
> not
> >>> happening.
> >>> Did you get a chance to look at the logs ?  That might give some clues.
> >>>
> >>> Thanks,
> >>> Padma
> >>>
> >>>
>  On Aug 16, 2017, at 9:32 PM, Divya Gehlot 
> >>> wrote:
> 
>  Hi,
>  Even I am surprised .
>  I am running Drill version 1.10  on MapR enterprise version.
>  *Query *- Selecting all the columns on partitioned parquet table
> 
>  I observed few things from Query statistics :
> 
>  Value
> 
>  Before Refresh Metadata
> 
>  After Refresh Metadata
> 
>  Fragments
> 
>  1
> 
>  13
> 
>  DURATION
> 
>  01 min 0.233 sec
> 
>  18 min 0.744 sec
> 
>  PLANNING
> 
>  59.818 sec
> 
>  33.087 sec
> 
>  QUEUED
> 
>  Not Available
> 
>  Not Available
> 
>  EXECUTION
> 
>  0.415 sec
> 
>  17 min 27.657 sec
> 
>  The planning time is being reduced by approx 60% but the execution
> time
>  increased  drastically.
>  I would like to understand why the exceution time increases after the
>  metadata refresh .
> 
> 
>  Appreciate the help.
> 
>  Thanks,
>  divya
> 
> 
>  On 17 August 2017 at 11:54, Padma Penumarthy 
> >>> wrote:
> 
> > Refresh table metadata should  help reduce query planning time.
> > It is odd that it went up after you did refresh table metadata.
> > Did you check the logs to see what is happening ? You might have to
> > turn on some debugs if needed.
> > BTW, what version of Drill are you running ?
> >
> > Thanks,
> > Padma
> >
> >
> >> On Aug 16, 2017, at 8:15 PM, Divya Gehlot 
> > wrote:
> >>
> >> Hi,
> >> I have data in parquet file format .
> >> when I run the query the data and see the execution plan I could see
> >> following
> >> statistics
> >>
> >>> TOTAL FRAGMENTS: 1
>  DURATION: 01 min 0.233 sec
>  PLANNING: 59.818 sec
>  QUEUED: Not Available
>  EXECUTION: 0.415 sec
> >>>
> >>>
> >>
> >> As its a paquet file format I tried enabling refresh meta data
> >> and run below command
> >> REFRESH TABLE METADATA  ;
> >> then run the same query again on the same table same data (no
> changes
> >>> in
> >> data)  and could find the statistics as show below :
> >>
> >> TOTAL FRAGMENTS: 13
>  

Re: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes

2017-08-15 Thread Khurram Faraaz
DRILL-4510 is open for the error message that you see.


Thanks,

Khurram


From: Divya Gehlot <divya.htco...@gmail.com>
Sent: Tuesday, August 15, 2017 7:41:39 AM
To: user@drill.apache.org
Subject: Re: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support 
schema changes

Hi ,

As advised I set the config parameter

alter system set `drill.exec.functions.cast_empty_string_to_null` = true;

When I try to ran the query on `Capacity` column with aggregated values it
threw below error to me :
s:[
SYSTEM ERROR: IllegalStateException: Failure while reading vector.
Expected vector class of org.apache.drill.exec.vector.NullableIntVector but
was holding vector class
org.apache.drill.exec.vector.NullableVarCharVector, field= Capacity


Thanks,
Divya

On 14 August 2017 at 17:56, Khurram Faraaz <kfar...@mapr.com> wrote:

> You should look at DRILL-3214 which is Open. And DRILL-4503 is open for
> Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema
> changes
>
>
> And you can try setting this config parameter to see if it helps in your
> case.
>
>
> alter system set `drill.exec.functions.cast_empty_string_to_null` = true;
>
>
> Thanks,
>
> Khurram
>
> 
> From: Divya Gehlot <divya.htco...@gmail.com>
> Sent: Monday, August 14, 2017 3:06:57 PM
> To: user@drill.apache.org
> Subject: Re: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support
> schema changes
>
> forgot to mention I tried this first but some post  in Apache Drill JIRA
> mentioned to use 0  instead of null to get rid of error .
> unfortunately it did not work in my case :(
>
>
> Thanks,
> Divya
>
> On 14 August 2017 at 14:56, Khurram Faraaz <kfar...@mapr.com> wrote:
>
> > Can you please try this, since you mention column has empty values try
> > using null instead of zero, because 0 (zero) is  not null.
> >
> >
> > CASE WHEN `Capacity` = '' THEN CAST(null AS integer) ELSE CAST(`Capacity`
> > AS integer)
> >
> > END AS `Capacity`
> >
> >
> > Thanks,
> >
> > Khurram
> >
> > 
> > From: Divya Gehlot <divya.htco...@gmail.com>
> > Sent: Monday, August 14, 2017 12:14:26 PM
> > To: user@drill.apache.org
> > Subject: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support
> > schema changes
> >
> > Hi,
> > I am getting below error :
> >
> > > UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema
> > > changes Fragment 0:0
> >
> >
> > *Some background to understand the error scenario : *
> >
> > I have  data saved in parquet files in multidirectory structure.
> > One of the column of the table is empty values so I used below to handle
> > null values
> >
> > > CASE WHEN `Capacity` = '' THEN CAST(0 AS INT) ELSE CAST(`Capacity` AS
> > INT)
> > > END AS `Capacity`
> >
> >
> > My query does some aggregation like sum of the column values .My query
> > looks like below :
> > SELECT
> > `COUNTRY`,
> > `YEAR`,
> > `MONTH`,
> > `DAY`,
> > `Capacity`,
> > SUM(CAST(`UserCount` AS INT)) AS `Dailyusers`
> > FROM `dfs`.`default`.`./parquetfiles/countries/`
> > WHERE `YEAR` =2017 AND `MONTH` = 2
> > group by `COUNTRY`,`YEAR`,`MONTH`,`DAY`,`Office``Capacity`
> >
> > When I include `Capacity` column in above query it gives me below error :
> >
> > > UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema
> > > changes
> >
> >
> > When I exclude the `Capacity` column the query runs fine and I could view
> > the expected results .
> >
> > Few approaches/verification  which I tried after browsing the internet
> > regarding the error whic i encounter :
> > 1 : alter session set `planner.enable_streamagg` = false;
> > alter session set `planner.enable_hashjoin` = false;
> >
> > Still the same error
> >
> >
> >
> > >  UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema
> > > changes Fragment
> >
> >
> > 2 : somebody suggested that if  some of your files are empty you will
> > encounter the error but I checked and confirm that none of my parquet
> file
> > is empty.
> >
> > 3.I have same schema across all the files .
> >
> > 4. If I query the files individually there is no error and It works fine
> .
> > For instance if my query is
> > SELECT
> > `COUNTRY`,
> > `YEAR`,
> > `MONTH`,
> > `DAY`,
> > `Capacity`,
> > SUM(CAST(`UserCount` AS INT)) AS `Dailyusers`
> > FROM `dfs`.`default`.`./parquetfiles/countries/china/0_0_0.parquet`
> > WHERE `YEAR` =2017 AND `MONTH` = 2
> > group by `COUNTRY`,`YEAR`,`MONTH`,`DAY`,`Office``Capacity`
> >
> > Above query works fine for all the countries which I have under countries
> > directory.
> > But if I query until directory countries its throws error .
> >
> > Appreciate the help!
> >
> > Thanks,
> > Divya
> >
>


Re: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes

2017-08-14 Thread Khurram Faraaz
You should look at DRILL-3214 which is Open. And DRILL-4503 is open for Error: 
UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes


And you can try setting this config parameter to see if it helps in your case.


alter system set `drill.exec.functions.cast_empty_string_to_null` = true;


Thanks,

Khurram


From: Divya Gehlot <divya.htco...@gmail.com>
Sent: Monday, August 14, 2017 3:06:57 PM
To: user@drill.apache.org
Subject: Re: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support 
schema changes

forgot to mention I tried this first but some post  in Apache Drill JIRA
mentioned to use 0  instead of null to get rid of error .
unfortunately it did not work in my case :(


Thanks,
Divya

On 14 August 2017 at 14:56, Khurram Faraaz <kfar...@mapr.com> wrote:

> Can you please try this, since you mention column has empty values try
> using null instead of zero, because 0 (zero) is  not null.
>
>
> CASE WHEN `Capacity` = '' THEN CAST(null AS integer) ELSE CAST(`Capacity`
> AS integer)
>
> END AS `Capacity`
>
>
> Thanks,
>
> Khurram
>
> 
> From: Divya Gehlot <divya.htco...@gmail.com>
> Sent: Monday, August 14, 2017 12:14:26 PM
> To: user@drill.apache.org
> Subject: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support
> schema changes
>
> Hi,
> I am getting below error :
>
> > UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema
> > changes Fragment 0:0
>
>
> *Some background to understand the error scenario : *
>
> I have  data saved in parquet files in multidirectory structure.
> One of the column of the table is empty values so I used below to handle
> null values
>
> > CASE WHEN `Capacity` = '' THEN CAST(0 AS INT) ELSE CAST(`Capacity` AS
> INT)
> > END AS `Capacity`
>
>
> My query does some aggregation like sum of the column values .My query
> looks like below :
> SELECT
> `COUNTRY`,
> `YEAR`,
> `MONTH`,
> `DAY`,
> `Capacity`,
> SUM(CAST(`UserCount` AS INT)) AS `Dailyusers`
> FROM `dfs`.`default`.`./parquetfiles/countries/`
> WHERE `YEAR` =2017 AND `MONTH` = 2
> group by `COUNTRY`,`YEAR`,`MONTH`,`DAY`,`Office``Capacity`
>
> When I include `Capacity` column in above query it gives me below error :
>
> > UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema
> > changes
>
>
> When I exclude the `Capacity` column the query runs fine and I could view
> the expected results .
>
> Few approaches/verification  which I tried after browsing the internet
> regarding the error whic i encounter :
> 1 : alter session set `planner.enable_streamagg` = false;
> alter session set `planner.enable_hashjoin` = false;
>
> Still the same error
>
>
>
> >  UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema
> > changes Fragment
>
>
> 2 : somebody suggested that if  some of your files are empty you will
> encounter the error but I checked and confirm that none of my parquet file
> is empty.
>
> 3.I have same schema across all the files .
>
> 4. If I query the files individually there is no error and It works fine .
> For instance if my query is
> SELECT
> `COUNTRY`,
> `YEAR`,
> `MONTH`,
> `DAY`,
> `Capacity`,
> SUM(CAST(`UserCount` AS INT)) AS `Dailyusers`
> FROM `dfs`.`default`.`./parquetfiles/countries/china/0_0_0.parquet`
> WHERE `YEAR` =2017 AND `MONTH` = 2
> group by `COUNTRY`,`YEAR`,`MONTH`,`DAY`,`Office``Capacity`
>
> Above query works fine for all the countries which I have under countries
> directory.
> But if I query until directory countries its throws error .
>
> Appreciate the help!
>
> Thanks,
> Divya
>


Re: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes

2017-08-14 Thread Khurram Faraaz
Can you please try this, since you mention column has empty values try using 
null instead of zero, because 0 (zero) is  not null.


CASE WHEN `Capacity` = '' THEN CAST(null AS integer) ELSE CAST(`Capacity` AS 
integer)

END AS `Capacity`


Thanks,

Khurram


From: Divya Gehlot 
Sent: Monday, August 14, 2017 12:14:26 PM
To: user@drill.apache.org
Subject: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema 
changes

Hi,
I am getting below error :

> UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema
> changes Fragment 0:0


*Some background to understand the error scenario : *

I have  data saved in parquet files in multidirectory structure.
One of the column of the table is empty values so I used below to handle
null values

> CASE WHEN `Capacity` = '' THEN CAST(0 AS INT) ELSE CAST(`Capacity` AS INT)
> END AS `Capacity`


My query does some aggregation like sum of the column values .My query
looks like below :
SELECT
`COUNTRY`,
`YEAR`,
`MONTH`,
`DAY`,
`Capacity`,
SUM(CAST(`UserCount` AS INT)) AS `Dailyusers`
FROM `dfs`.`default`.`./parquetfiles/countries/`
WHERE `YEAR` =2017 AND `MONTH` = 2
group by `COUNTRY`,`YEAR`,`MONTH`,`DAY`,`Office``Capacity`

When I include `Capacity` column in above query it gives me below error :

> UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema
> changes


When I exclude the `Capacity` column the query runs fine and I could view
the expected results .

Few approaches/verification  which I tried after browsing the internet
regarding the error whic i encounter :
1 : alter session set `planner.enable_streamagg` = false;
alter session set `planner.enable_hashjoin` = false;

Still the same error



>  UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema
> changes Fragment


2 : somebody suggested that if  some of your files are empty you will
encounter the error but I checked and confirm that none of my parquet file
is empty.

3.I have same schema across all the files .

4. If I query the files individually there is no error and It works fine .
For instance if my query is
SELECT
`COUNTRY`,
`YEAR`,
`MONTH`,
`DAY`,
`Capacity`,
SUM(CAST(`UserCount` AS INT)) AS `Dailyusers`
FROM `dfs`.`default`.`./parquetfiles/countries/china/0_0_0.parquet`
WHERE `YEAR` =2017 AND `MONTH` = 2
group by `COUNTRY`,`YEAR`,`MONTH`,`DAY`,`Office``Capacity`

Above query works fine for all the countries which I have under countries
directory.
But if I query until directory countries its throws error .

Appreciate the help!

Thanks,
Divya


Re: Non-ascii characters still fails on Drill 1.11

2017-08-11 Thread Khurram Faraaz
We need to add a unit test, the last comment in DRILL-4039 suggests that we add 
a unit test for that fix.


Thanks,

Khurram


From: Arina Yelchiyeva 
Sent: Friday, August 11, 2017 5:03:44 PM
To: user
Subject: Re: Non-ascii characters still fails on Drill 1.11

Hi Rahul,

you need to update system property *saffron.default.charset* to *UTF-16LE*.
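One way to do that (a sketch, following the drill-env.sh convention used elsewhere on
this list) is to pass it as a JVM option:

export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Dsaffron.default.charset=UTF-16LE"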

Kind regards
Arina

On Fri, Aug 11, 2017 at 9:18 AM, Rahul Raj 
wrote:

> Issue persist even with JDBC driver 1.11.0.
>
> On Fri, Aug 11, 2017 at 11:35 AM, Rahul Raj  com
> > wrote:
>
> > I have not upgraded the drill client jar to 1.11. Will upgrade and
> confirm.
> >
> > Regards,
> > Rahul
> >
> > On Fri, Aug 11, 2017 at 10:32 AM, Rahul Raj  > com> wrote:
> >
> >> Hi,
> >>
> >> https://issues.apache.org/jira/browse/DRILL-4039 was fixed in 1.11. It
> >> failed on the following query below:
> >>
> >> SELECT
> >>   `team_long_name`
> >> FROM
> >>   dfs.user.`football/latest`
> >> WHERE `team_long_name` = 'Górnik Łęczna'
> >> LIMIT 500
> >>
> >> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> >> CalciteException: Failed to encode 'Górnik Łęczna' in character set
> >> 'ISO-8859-1' [Error Id: 220b4d90-e2da-46de-8718-5377b6981220 on
> >> vpc12.o3c.in:31010]
> >>
> >> Can someone comment on this?
> >>
> >> Regards,
> >> Rahul
> >>
> >
> >
>
> --
>  This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom it is
> addressed. If you are not the named addressee then you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately and delete this e-mail from your system.
>


Re: regex replace in string

2017-07-27 Thread Khurram Faraaz
Drill docs mention that REGEXP_REPLACE function uses 
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html


Link to REGEXP_REPLACE doc - 
https://drill.apache.org/docs/string-manipulation/#regexp_replace



From: Paul Rogers 
Sent: Thursday, July 27, 2017 11:38:42 AM
To: user@drill.apache.org
Subject: Re: regex replace in string

DRILL-4645: "Regex_replace() function is broken”?

I still wonder, the characters you are looking for are those used to format a 
“columns” column on output. Can you show a couple of lines of your CSV file?

Or, take a look at SELECT * FROM … LIMIT 10 to see if the data is being split 
into columns, or is somehow using the “columns” column…

That aside, the docs don’t state the regex syntax used in the REGEXP_REPLACE 
function. Is it Java regex? If so, then you’d have to format it according to 
the Java syntax ([1]). So:

‘\[|,|\.|]’

Note that “[“ and “.” are part of the regex syntax, so must be escaped.
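
A minimal sketch of that pattern in a full query, assuming the CSV is read
through Drill's `columns` array (the file name and column index are illustrative):

SELECT REGEXP_REPLACE(CAST(columns[0] AS VARCHAR(100)), '\[|,|\.|]', '') AS col1
FROM dfs.`sample_data.csv`;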

- Paul

[1] https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

> On Jul 26, 2017, at 10:49 PM, Divya Gehlot  wrote:
>
> Hi,
> I have already set the plugin configuration to extractheader :true .
> and I followed the below link
> https://drill.apache.org/docs/lesson-2-run-queries-with-ansi-sql/
>
> SELECT REGEXP_REPLACE(CAST(`Column1` AS VARCHAR(100)), '[,".]', '') AS
> `Col1` FROM
> dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.0.tar/apache-drill-1.10.0/sample-data/sample_data.csv`
>
> Just extracting column which has special charcaters including the delimeter
> as one of the special character  gives me empty result set .
>
> Am I missing something ?
>
> Appreciate the help.
>
> Thanks,
> Divya
>



Re: regex replace in string

2017-07-26 Thread Khurram Faraaz
regexp_replace function works on that data on Drill 1.11.0, commit id : 4220fb2

{noformat}

Data used was,

[root@centos-01 community]# cat rgex_replce.csv
"This is the column,one "
"This is column , two"
column3
column4

0: jdbc:drill:schema=dfs.tmp> select * from `rgex_replce.csv`;
+--+
|   columns|
+--+
| ["This is the column,one "]  |
| ["This is column , two"] |
| ["column3"]  |
| ["column4"]  |
+--+
4 rows selected (0.391 seconds)

// regexp_replace works as designed on Drill 1.11.0 commit id : 4220fb2

0: jdbc:drill:schema=dfs.tmp> select regexp_replace(cast(columns[0] as 
varchar(256)),'column','foobar') from `rgex_replce.csv`;
+--+
|  EXPR$0  |
+--+
| This is the foobar,one   |
| This is foobar , two |
| foobar3  |
| foobar4  |
+--+
4 rows selected (0.254 seconds)

0: jdbc:drill:schema=dfs.tmp> select 
regexp_replace(columns[0],'column','foobar') from `rgex_replce.csv`;
+--+
|  EXPR$0  |
+--+
| This is the foobar,one   |
| This is foobar , two |
| foobar3  |
| foobar4  |
+--+
4 rows selected (0.238 seconds)
{noformat}


Thanks,

Khurram


From: Kunal Khatua <kkha...@mapr.com>
Sent: Wednesday, July 26, 2017 11:47:22 PM
To: user@drill.apache.org
Subject: RE: regex replace in string

Here is the reference mail for the release candidate of Drill 1.11.0

-Original Message-
From: Arina Yelchiyeva [mailto:arina.yelchiy...@gmail.com]
Sent: Tuesday, July 25, 2017 3:36 AM
To: d...@drill.apache.org
Subject: [VOTE] Release Apache Drill 1.11.0 - rc0

Hi all,

I'd like to propose the first release candidate (rc0) of Apache Drill, version 
1.11.0.

The release candidate covers a total of 126 resolved JIRAs [1]. Thanks to 
everyone who contributed to this release.

The tarball artifacts are hosted at [2] and the maven artifacts are hosted at 
[3].

This release candidate is based on commit
4220fb2fffbc81883df3e5fea575fa0a584852b3 located at [4].

The vote ends at 1:00 PM UTC (5:00 AM PT), July 28, 2017.

[ ] +1
[ ] +0
[ ] -1

Here's my vote: +1 (non-binding)


[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820=12339943
[2] http://home.apache.org/~arina/drill/releases/1.11.0/rc0/
[3] https://repository.apache.org/content/repositories/orgapachedrill-1042/
[4] https://github.com/arina-ielchiieva/drill/commits/drill-1.11.0

Kind regards
Arina

-Original Message-
From: Paul Rogers [mailto:prog...@mapr.com]
Sent: Wednesday, July 26, 2017 8:32 AM
To: user@drill.apache.org
Subject: Re: regex replace in string

Hi Divya,

We found a couple of issues in CSV files that would lead to the kind of errors 
you encountered. These issues will be fixed in the upcoming Drill 1.11 release.

Sharing a sample CSV file will let us check the issue. Even better, voting is 
open for the 1.11 release. Please go ahead and download it and try your file 
with that release. Let us know if you still have a problem.

Thanks,

- Paul

> On Jul 26, 2017, at 6:14 AM, Khurram Faraaz <kfar...@mapr.com> wrote:
>
> Can you please share your CSV file, the SQL query and the version of Drill 
> that you are on. So someone can take a look and try to reproduce the error 
> that you are seeing.
>
>
> Thanks,
>
> Khurram
>
> 
> From: Divya Gehlot <divya.htco...@gmail.com>
> Sent: Wednesday, July 26, 2017 3:18:08 PM
> To: user@drill.apache.org
> Subject: regex replace in string
>
> Hi,
> I have a CSV file where  column values are "This is the column,one "
> "This is column , two"
> column3
> column4
>
> When I try to regex_replace it throws error
>
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>> IllegalArgumentException: reallocation size must be non-negative
>> Fragment
>> 0:0
>
>
> How can I read the above columns as one string  like  This is the
> column one Appreciate the help
>
> Thanks,
> Divyab



Re: regex replace in string

2017-07-26 Thread Khurram Faraaz
Can you please share your CSV file, the SQL query and the version of Drill that 
you are on. So someone can take a look and try to reproduce the error that you 
are seeing.


Thanks,

Khurram


From: Divya Gehlot 
Sent: Wednesday, July 26, 2017 3:18:08 PM
To: user@drill.apache.org
Subject: regex replace in string

Hi,
I have a CSV file where  column values are
"This is the column,one "
"This is column , two"
column3
column4

When I try to regex_replace it throws error

org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> IllegalArgumentException: reallocation size must be non-negative Fragment
> 0:0


How can I read the above columns as one string  like  This is the column
one
Appreciate the help

Thanks,
Divyab


Re: Error when timestamp IN clause contains more elements

2017-07-14 Thread Khurram Faraaz
I tried the below query using data from the IN clause of your query (as is); I
didn't see an error/exception in sqlline. I am on Drill 1.11.0.


0: jdbc:drill:schema=dfs.tmp> select * from (values(137676420)) WHERE 
EXPR$0 IN (137667780, 137676420, 137685060, 137702340, 
137728260, 137736900, 137745540, 137788740, 137797380, 
137909700, 137918340, 137926980, 137970180, 137978820, 
138030660, 138039300, 138047940, 138091140, 138099780);
++
| EXPR$0 |
++
| 137676420  |
++
1 row selected (0.509 seconds)

Thanks,
Khurram



From: Kunal Khatua 
Sent: Friday, July 14, 2017 11:00:03 PM
To: user@drill.apache.org
Subject: RE: Error when timestamp IN clause contains more elements

This works for 19 elements because the IN clause will do a filter pushdown to 
the source for up to 19 elements. 20 or more... it resorts to using a JOIN, in 
which case it finds data type mismatch.

It is a good practice (for readability purposes) to do casts in such cases.

Alternatively (not recommended, IMO), you can raise the threshold for the 
property "in_subquery_threshold" in sys.options from 20 to a higher value.
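
Both suggestions, sketched below; the timestamp literals are illustrative and the
full option name is assumed to be planner.in_subquery_threshold:

-- 1) explicit casts, so the join generated for 20+ elements compares TIMESTAMP with TIMESTAMP
SELECT `Date`, COUNT(`Date`) AS `COUNT_Date`
FROM dfs.test.`latest`
WHERE `Date` IN (CAST('2013-08-16 00:00:00' AS TIMESTAMP),
                 CAST('2013-08-17 00:00:00' AS TIMESTAMP) /* , ... remaining literals */)
GROUP BY `Date`;

-- 2) or raise the pushdown threshold above the number of IN elements (not generally recommended)
ALTER SESSION SET `planner.in_subquery_threshold` = 30;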

-Original Message-
From: Rahul Raj [mailto:rahul@option3consulting.com]
Sent: Friday, July 14, 2017 7:56 AM
To: user@drill.apache.org
Subject: Error when timestamp IN clause contains more elements

I am getting the error below when there are more than 19 elements in the IN
clause:

DrillRuntimeException: Join only supports implicit casts between 1. Numeric 
data 2. Varchar, Varbinary data 3. Date, Timestamp data Left type:
TIMESTAMP, Right type: BIGINT. Add explicit casts to avoid this error.

However, query succeeds when IN clause contains one element less. Query is

SELECT
  `Date`
, `COUNT`(`Date`) `COUNT_Date`
, `COUNT`(`a`) `COUNT_a`
FROM
  dfs.test.`latest`
WHERE (`Date` IN (137667780, 137676420, 137685060, 137702340, 
137728260, 137736900, 137745540, 137788740, 137797380, 
137909700, 137918340, 137926980, 137970180, 137978820, 
138030660, 138039300, 138047940, 138091140,
138099780))
GROUP BY `Date`
ORDER BY `Date` ASC, `COUNT_Date` ASC;


Regards,
Rahul

--
 This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom it is addressed. If you 
are not the named addressee then you should not disseminate, distribute or copy 
this e-mail. Please notify the sender immediately and delete this e-mail from 
your system.


Re: Increasing store.parquet.block-size

2017-06-15 Thread Khurram Faraaz
Thanks Padma.


From: Padma Penumarthy <ppenumar...@mapr.com>
Sent: Thursday, June 15, 2017 8:58:44 AM
To: user@drill.apache.org
Subject: Re: Increasing store.parquet.block-size

Sure. I will check and try to fix them as well.

Thanks,
Padma

> On Jun 14, 2017, at 3:12 AM, Khurram Faraaz <kfar...@mapr.com> wrote:
>
> Thanks Padma. There are some more related failures reported in DRILL-2478, do 
> you think we should fix them too, if it is an easy fix.
>
>
> Regards,
>
> Khurram
>
> 
> From: Padma Penumarthy <ppenumar...@mapr.com>
> Sent: Wednesday, June 14, 2017 11:43:16 AM
> To: user@drill.apache.org
> Subject: Re: Increasing store.parquet.block-size
>
> I think you meant MB (not GB) below.
> HDFS allows creation of very large files(theoretically, there is no limit).
> I am wondering why >2GB file is a problem. May be it is blockSize >2GB, that 
> is not recommended.
>
> Anyways, we should not let the user be able to set any value and later throw 
> an error.
> I opened a PR to fix this.
> https://github.com/apache/drill/pull/852
>
> Thanks,
> Padma
>
>
> On Jun 9, 2017, at 11:36 AM, Kunal Khatua 
> <kkha...@mapr.com<mailto:kkha...@mapr.com>> wrote:
>
> The ideal size depends on what engine is consuming the parquet files (Drill, 
> i'm guessing) and the storage layer. For HDFS, which is usually 
> 128-256GB, we recommend to bump it to about 512GB (with the underlying HDFS 
> blocksize to match that).
>
>
> You'll probably need to experiment a little with different blocks sizes 
> stored on S3 to see which works the best.
>
> <http://www.mapr.com/>
>
> 
> From: Shuporno Choudhury 
> <shuporno.choudh...@manthan.com<mailto:shuporno.choudh...@manthan.com>>
> Sent: Friday, June 9, 2017 11:23:37 AM
> To: user@drill.apache.org<mailto:user@drill.apache.org>
> Subject: Re: Increasing store.parquet.block-size
>
> Thanks for the information Kunal.
> After the conversion, the file size scales down to half if I use gzip
> compression.
> For a 10 GB gzipped csv source file, it becomes 5GB (2+2+1) parquet file
> (using gzip compression).
> So, if I have to make multiple parquet files, what block size would be
> optimal, if I have to read the file later?
>
> On 09-Jun-2017 11:28 PM, "Kunal Khatua" 
> <kkha...@mapr.com<mailto:kkha...@mapr.com>> wrote:
>
>
> If you're storing this in S3... you might want to selectively read the
> files as well.
>
>
> I'm only speculating, but if you want to download the data, downloading as
> a queue of files might be more reliable than one massive file. Similarly,
> within AWS, it *might* be faster to have an EC2 instance access a couple of
> large Parquet files versus one massive Parquet file.
>
>
> Remember that when you create a large block size, Drill tries to write
> everything within a single row group for each. So there is no chance of
> parallelization of the read (i.e. reading parts in parallel). The defaults
> should work well for S3 as well, and with the compression (e.g. Snappy),
> you should get a reasonably smaller file size.
>
>
> With the current default settings... have you seen what Parquet file sizes
> you get with Drill when converting your 10GB CSV source files?
>
>
> 
> From: Shuporno Choudhury 
> <shuporno.choudh...@manthan.com<mailto:shuporno.choudh...@manthan.com>>
> Sent: Friday, June 9, 2017 10:50:06 AM
> To: user@drill.apache.org<mailto:user@drill.apache.org>
> Subject: Re: Increasing store.parquet.block-size
>
> Thanks Kunal for your insight.
> I am actually converting some .csv files and storing them in parquet format
> in s3, not in HDFS.
> The size of the individual .csv source files can be quite huge (around
> 10GB).
> So, is there a way to overcome this and create one parquet file or do I
> have to go ahead with multiple parquet files?
>
> On 09-Jun-2017 11:04 PM, "Kunal Khatua" 
> <kkha...@mapr.com<mailto:kkha...@mapr.com>> wrote:
>
> Shuporno
>
>
> There are some interesting problems when using Parquet files > 2GB on
> HDFS.
>
>
> If I'm not mistaken, the HDFS APIs that allow you to read offsets (oddly
> enough) returns an int value. Large Parquet blocksize also means you'll
> end
> up having the file span across multiple HDFS blocks, and that would make
> reading of rowgroups inefficient.
>
>
> Is there a reason you want to create such a large parquet file?
>
>
> ~ Kunal
>
> 
> From: Vitalii Dir

Re: Increasing store.parquet.block-size

2017-06-14 Thread Khurram Faraaz
Thanks Padma. There are some more related failures reported in DRILL-2478, do 
you think we should fix them too, if it is an easy fix.


Regards,

Khurram


From: Padma Penumarthy <ppenumar...@mapr.com>
Sent: Wednesday, June 14, 2017 11:43:16 AM
To: user@drill.apache.org
Subject: Re: Increasing store.parquet.block-size

I think you meant MB (not GB) below.
HDFS allows creation of very large files(theoretically, there is no limit).
I am wondering why >2GB file is a problem. May be it is blockSize >2GB, that is 
not recommended.

Anyways, we should not let the user be able to set any value and later throw an 
error.
I opened a PR to fix this.
https://github.com/apache/drill/pull/852

Thanks,
Padma


On Jun 9, 2017, at 11:36 AM, Kunal Khatua 
<kkha...@mapr.com<mailto:kkha...@mapr.com>> wrote:

The ideal size depends on what engine is consuming the parquet files (Drill, 
i'm guessing) and the storage layer. For HDFS, which is usually 128-256GB, 
we recommend to bump it to about 512GB (with the underlying HDFS blocksize to 
match that).


You'll probably need to experiment a little with different blocks sizes stored 
on S3 to see which works the best.

<http://www.mapr.com/>


From: Shuporno Choudhury 
<shuporno.choudh...@manthan.com<mailto:shuporno.choudh...@manthan.com>>
Sent: Friday, June 9, 2017 11:23:37 AM
To: user@drill.apache.org<mailto:user@drill.apache.org>
Subject: Re: Increasing store.parquet.block-size

Thanks for the information Kunal.
After the conversion, the file size scales down to half if I use gzip
compression.
For a 10 GB gzipped csv source file, it becomes 5GB (2+2+1) parquet file
(using gzip compression).
So, if I have to make multiple parquet files, what block size would be
optimal, if I have to read the file later?

On 09-Jun-2017 11:28 PM, "Kunal Khatua" 
<kkha...@mapr.com<mailto:kkha...@mapr.com>> wrote:


If you're storing this in S3... you might want to selectively read the
files as well.


I'm only speculating, but if you want to download the data, downloading as
a queue of files might be more reliable than one massive file. Similarly,
within AWS, it *might* be faster to have an EC2 instance access a couple of
large Parquet files versus one massive Parquet file.


Remember that when you create a large block size, Drill tries to write
everything within a single row group for each. So there is no chance of
parallelization of the read (i.e. reading parts in parallel). The defaults
should work well for S3 as well, and with the compression (e.g. Snappy),
you should get a reasonably smaller file size.


With the current default settings... have you seen what Parquet file sizes
you get with Drill when converting your 10GB CSV source files?



From: Shuporno Choudhury 
<shuporno.choudh...@manthan.com<mailto:shuporno.choudh...@manthan.com>>
Sent: Friday, June 9, 2017 10:50:06 AM
To: user@drill.apache.org<mailto:user@drill.apache.org>
Subject: Re: Increasing store.parquet.block-size

Thanks Kunal for your insight.
I am actually converting some .csv files and storing them in parquet format
in s3, not in HDFS.
The size of the individual .csv source files can be quite huge (around
10GB).
So, is there a way to overcome this and create one parquet file or do I
have to go ahead with multiple parquet files?

On 09-Jun-2017 11:04 PM, "Kunal Khatua" 
<kkha...@mapr.com<mailto:kkha...@mapr.com>> wrote:

Shuporno


There are some interesting problems when using Parquet files > 2GB on
HDFS.


If I'm not mistaken, the HDFS APIs that allow you to read offsets (oddly
enough) returns an int value. Large Parquet blocksize also means you'll
end
up having the file span across multiple HDFS blocks, and that would make
reading of rowgroups inefficient.


Is there a reason you want to create such a large parquet file?


~ Kunal


From: Vitalii Diravka 
<vitalii.dira...@gmail.com<mailto:vitalii.dira...@gmail.com>>
Sent: Friday, June 9, 2017 4:49:02 AM
To: user@drill.apache.org<mailto:user@drill.apache.org>
Subject: Re: Increasing store.parquet.block-size

Khurram,

DRILL-2478 is a good place holder for the LongValidator issue, it really
works wrong.

But other issue connected to impossibility to use long values for parquet
block-size.
This issue can be independent task or a sub-task of updating Drill
project
to a latest parquet library.

Kind regards
Vitalii

On Fri, Jun 9, 2017 at 10:25 AM, Khurram Faraaz 
<kfar...@mapr.com<mailto:kfar...@mapr.com>>
wrote:

 1.  DRILL-2478<https://issues.apache.org/jira/browse/DRILL-2478> is
Open for this issue.
 2.  I have added more details into the comments.

Thanks,
Khurram


From: Shuporno Choudhury 
<shuporno.choudh...@manthan.com<mailto:shuporno.choudh..

Re: Increasing store.parquet.block-size

2017-06-09 Thread Khurram Faraaz
  1.  DRILL-2478 is Open for 
this issue.
  2.  I have added more details into the comments.
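
Until that is fixed, staying inside the currently supported range is the usual
workaround (a sketch; 512 MB below is just an illustrative size, and the right
value for S3 is something to experiment with, as discussed elsewhere in the thread):

ALTER SYSTEM SET `store.parquet.block-size` = 536870912;  -- 512 MB, below the 2147483647 limit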

Thanks,
Khurram


From: Shuporno Choudhury 
Sent: Friday, June 9, 2017 12:48:41 PM
To: user@drill.apache.org
Subject: Increasing store.parquet.block-size

The max value that can be assigned to *store.parquet.block-size *is
*2147483647*, as the value kind of this configuration parameter is LONG.
This basically translates to 2GB of block size.
How do I increase it to 3/4/5 GB ?
Trying to set this parameter to a higher value using the following command
actually succeeds :
ALTER SYSTEM SET `store.parquet.block-size` = 4294967296;
But when I try to run a query that uses this config, it throws the
following error:
   Error: SYSTEM ERROR: NumberFormatException: For input string:
"4294967296"
So, is it possible to assign a higher value to this parameter?
--
Regards,
Shuporno Choudhury


Re: Drill query are stuck in ENQUEUED mode

2017-05-03 Thread Khurram Faraaz
Does your query execute and complete when you set exec.queue.enable = false ?

The default is to set exec.queue.enable to false.
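
For instance (a sketch, assuming you have permission to change system options):

ALTER SYSTEM SET `exec.queue.enable` = false;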


Thanks,

Khurram


From: jasbir.s...@accenture.com 
Sent: Wednesday, May 3, 2017 5:58:48 PM
To: user@drill.apache.org
Cc: maneesh.koth...@accenture.com
Subject: Drill query are stuck in ENQUEUED mode

Hi,

I am having queries in which I am fetching just 1 row from the parquet file 
using LIMIT  1, and even these queries are stuck in ENQUEUED state in DRILL.

I am using exec.queue.enable = true and have default settings for the rest.

Can you help me out in this.

Regards,
Jasbir Singh



This message is for the designated recipient only and may contain privileged, 
proprietary, or otherwise confidential information. If you have received it in 
error, please notify the sender immediately and delete the original. Any other 
use of the e-mail by you is prohibited. Where allowed by local law, electronic 
communications with Accenture and its affiliates, including e-mail and instant 
messaging (including content), may be scanned by our systems for the purposes 
of information security and assessment of internal compliance with Accenture 
policy.
__

www.accenture.com


Re: Fetch temporary tables data of PostgreSQL from Apache Drill

2017-04-18 Thread Khurram Faraaz
Temporary tables are session specific, how are you trying to access them from 
Drill ?


If the temporary table was created in Postgres in a different session, you cannot 
access that temporary table from Drill using another session. Temporary 
tables get dropped once the session terminates.


Thanks,

Khurram


From: jasbir.s...@accenture.com 
Sent: Tuesday, April 18, 2017 2:32:51 PM
To: user@drill.apache.org
Subject: Fetch temporary tables data of PostgreSQL from Apache Drill

Hi,

I am able to fetch normal tables data from PostgreSQL using Apache Drill but 
not able to get the data of temporary tables.

Is there any way how to fetch data from temporary tables?

Regards,
Jasbir Singh



This message is for the designated recipient only and may contain privileged, 
proprietary, or otherwise confidential information. If you have received it in 
error, please notify the sender immediately and delete the original. Any other 
use of the e-mail by you is prohibited. Where allowed by local law, electronic 
communications with Accenture and its affiliates, including e-mail and instant 
messaging (including content), may be scanned by our systems for the purposes 
of information security and assessment of internal compliance with Accenture 
policy.
__

www.accenture.com


Re: Data read error while query Hbase table.

2017-04-09 Thread Khurram Faraaz
Can you please share the stack trace from drillbit.log, and what version of 
Drill are you on ?


Thanks,

Khurram


From: nagesh bigdata 
Sent: Monday, April 10, 2017 8:34:59 AM
To: user@drill.apache.org
Subject: Data read error while query Hbase table.

Hi ,

I just executed a query "select * from Hbase.`namespace.table` LIMIT 100"
in Apache Drill Web UI.

>From the logs, HBase connect ion is created. After 8 mins, I got DATA READ
ERROR : failure while loading table in database hbase.

Please suggest.

Thanks.


Re: Wrong property value

2017-03-27 Thread Khurram Faraaz
We should report a documentation JIRA to track this.


From: Padma Penumarthy 
Sent: Monday, March 27, 2017 7:21:02 PM
To: user@drill.apache.org
Subject: Re: Wrong property value

Yes, you are right. We need to update the documentation with
the correct option name.  Thanks for bringing it up.
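
For instance, a sketch using the corrected option name:

ALTER SESSION SET `store.parquet.reader.int96_as_timestamp` = true;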

Thanks,
Padma


> On Mar 27, 2017, at 1:50 AM, Muhammad Gelbana  wrote:
>
> According to this page
> , Drill can
> implicitly interprets the INT96 timestamp data type in Parquet files after
> setting the *store.parquet.int96_as_timestamp* option to *true*.
>
> I believe the option name should be
> *store.parquet.reader.int96_as_timestamp*
>
> Or did I miss something ?
>
> *-*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana



Re: Display of query result using command line

2017-03-15 Thread Khurram Faraaz
Three million rows is too many rows for sqlline to print.

Try doing a COUNT(*) and see if that query returns the correct count on that 
table.
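
For instance (a sketch; xxx stands for the table in your query):

SELECT COUNT(*) FROM xxx;
-- and, to eyeball a few rows without asking sqlline to render millions of them:
SELECT * FROM xxx LIMIT 100;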


Thanks,

Khurram


From: PROJJWAL SAHA 
Sent: Wednesday, March 15, 2017 7:41:00 PM
To: user@drill.apache.org
Subject: Display of query result using command line

All,

I am using drillconf from command line to display a query result like
select * from xxx
having 3 million rows. The screen display scrolls fast to display the
result, however, it stops after some time with this exception -

java.lang.NegativeArraySizeException
at
org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:440)
at
org.apache.drill.exec.vector.accessor.VarCharAccessor.getBytes(VarCharAccessor.java:128)
at
org.apache.drill.exec.vector.accessor.VarCharAccessor.getString(VarCharAccessor.java:149)
at
org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getString(BoundCheckingAccessor.java:124)
at
org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getString(TypeConvertingSqlAccessor.java:649)
at
org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getString(AvaticaDrillSqlAccessor.java:94)
at org.apache.calcite.avatica.AvaticaSite.get(AvaticaSite.java:352)
at
org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:434)
at sqlline.Rows$Row.<init>(Rows.java:157)
at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63)
at
sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
at sqlline.SqlLine.print(SqlLine.java:1593)
at sqlline.Commands.execute(Commands.java:852)
at sqlline.Commands.sql(Commands.java:751)
at sqlline.SqlLine.dispatch(SqlLine.java:746)
at sqlline.SqlLine.begin(SqlLine.java:621)
at sqlline.SqlLine.start(SqlLine.java:375)
at sqlline.SqlLine.main(SqlLine.java:268)


The query shows completed in the profile.

Any reason/suggestions on this ?


Regards,

Projjwal


Re: Drill 1.9 Null pointer Exception

2017-03-02 Thread Khurram Faraaz
Can you please share your query and the type of data over which the query is 
executed ?


From: Anas A <a...@trinitymobility.com>
Sent: Thursday, March 2, 2017 2:18:32 PM
To: user@drill.apache.org
Cc: prasann...@trinitymobility.com; 'Sushil'
Subject: RE: Drill 1.9 Null pointer Exception

Hi Khurram,
Thanks for your response. The HBase and Hive versions have not changed; we
only updated the Drill version to 1.9. Our requirement is to work with
spatial queries, which are supported from 1.9. Is there any way to fix the
issue?

Thanks & Regards
Anas A,
Trinity Mobility Pvt. Ltd | Bangalore | +91-7736368236




-Original Message-----
From: Khurram Faraaz [mailto:kfar...@mapr.com]
Sent: 02 March 2017 14:02
To: user@drill.apache.org
Cc: prasann...@trinitymobility.com; 'Sushil' <sus...@trinitymobility.com>
Subject: Re: Drill 1.9 Null pointer Exception

Hi Anas,


Not sure what is causing the NPE, is your HBase version same as before ?


This assertion that you see in the stack trace below, was recently fixed in
DRILL-5040, you may want to try to latest available build Drill 1.10.0


Caused by: java.lang.AssertionError: Internal error: Error while applying
rule DrillPushProjIntoScan, args
[rel#35532:LogicalProject.NONE.ANY([]).[](input=rel#35531:Subset#0.ENUMERABL
E.ANY([]).[],$f0=0),
rel#35516:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs.tmp,
bfe2dad0-921a-4f06-9799-494ab8a7246d/851a124c-80a1-45e3-9496-d2562007911e])]
at org.apache.calcite.util.Util.newInternal(Util.java:792)
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java
:251) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.ja
va:808) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303)
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(Defau
ltSqlHandler.java:404)
~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]



Thanks,

Khurram


From: Anas A <a...@trinitymobility.com>
Sent: Thursday, March 2, 2017 1:07:46 PM
To: user@drill.apache.org
Cc: prasann...@trinitymobility.com; 'Sushil'
Subject: Drill 1.9 Null pointer Exception

Hi,
I am trying to work with Apache Drill 1.9 to access HBase and Hive tables and am
getting a NullPointerException. The same table I queried using Drill 1.8
worked fine without any issues. Attaching the error. Please suggest.



ERROR :

select * from twitter_test_nlp limit 10;
Error: SYSTEM ERROR: NullPointerException


[Error Id: 8c747c22-4f7f-4ba7-b30c-cb5fb3614a41 on
master01.trinitymobility.local:31010]

  (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception
during fragment initialization: Internal error: Error while applying rule
DrillPushProjIntoScan, args
[rel#980:LogicalProject.NONE.ANY([]).[](input=rel#979:Subset#0.ENUMERABLE.AN
Y([]).[],id=$0,dates=$1,times=$2,time_zone=$3,users=$4,profile_image_url=$5,
latitude=$6,longitude=$7,twitter_handle=$8,sentiment=$9,language=$10,lang_pr
obability=$11,text=$12),
rel#964:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[hive, social,
twitter_test_nlp])]
org.apache.drill.exec.work.foreman.Foreman.run():281
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
  Caused By (java.lang.AssertionError) Internal error: Error while applying
rule DrillPushProjIntoScan, args
[rel#980:LogicalProject.NONE.ANY([]).[](input=rel#979:Subset#0.ENUMERABLE.AN
Y([]).[],id=$0,dates=$1,times=$2,time_zone=$3,users=$4,profile_image_url=$5,
latitude=$6,longitude=$7,twitter_handle=$8,sentiment=$9,language=$10,lang_pr
obability=$11,text=$12),
rel#964:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[hive, social,
twitter_test_nlp])]
org.apache.calcite.util.Util.newInternal():792
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():251
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():808
org.apache.calcite.tools.Programs$RuleSetProgram.run():303

org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():404

org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():343

org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel()
:240

org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel()
:290

org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():168
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan():123
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():97
org.apache.drill.exec.work.foreman.Foreman.runSQL():1008
org.apache.drill.exec.work.for

Re: How to enable decimal value

2017-02-08 Thread Khurram Faraaz
Can you share your query and the results ?

And did you set planner.enable_decimal_data_type=true at session level or 
system level ?
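
For instance, a sketch of both scopes:

ALTER SYSTEM SET `planner.enable_decimal_data_type` = true;
-- or, for the current session only
ALTER SESSION SET `planner.enable_decimal_data_type` = true;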


From: Sanjiv Kumar 
Sent: Wednesday, February 8, 2017 3:15:08 PM
To: user@drill.apache.org
Subject: How to enable decimal value

Hello, I have decimal values in my table. When queried through Drill, the data
are rounded off in the output.

I have enabled planner.enable_decimal_data_type = true in the system options.
But the data are still not showing in decimal form.

Drill Environment:- OS:- Window 10
   Version:- 1.9
   Mode:- embedded mode

--
Thanks & Regards.
Sanjiv Kumar.


Re: How to use CASE in apache drill

2017-01-20 Thread Khurram Faraaz
Are the two columns CS.ProviderID and b.`Value` of the same datatypes in your 
SQL statement ?
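
For reference, a minimal sketch of how the mixing arises and the usual fix; the
concrete types below are assumptions, since the real ones depend on your tables:

-- fails: the THEN branch is VARCHAR, the ELSE branch is INTEGER
SELECT CASE WHEN 1 = 1 THEN '10' ELSE 0 END FROM (VALUES(1)) t;
-- works: both branches resolve to the same type
SELECT CASE WHEN 1 = 1 THEN CAST('10' AS INTEGER) ELSE 0 END FROM (VALUES(1)) t;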

From: Sanjiv Kumar 
Sent: Friday, January 20, 2017 1:59:35 PM
To: user@drill.apache.org
Subject: How to use CASE in apache drill

I have one doubt regarding CASE in Apache Drill.

I am trying to use CASE in the latest version of Drill (1.9), on a Windows system
and in embedded mode.

SEE MY QUERY:-

SELECT
CASE WHEN
EXISTS
(SELECT distinct CS.ProviderID
 FROM ... AS `CS`
INNER JOIN ... AS `PS`
ON PS.ID = CS.ProviderID
where CS.PluginName = 'some text from db' AND
PS.Code = 'SQLSERVER' AND CS.AccessKey = 'B2S1' limit 1)
THEN
(SELECT distinct b.`Value` from
... AS a
inner join ... as b
 on a.ID = b.PermissionID
inner join ... as c
 on c.RoleID=b.RoleID and c.ModuleID=1 limit 1)
ELSE 0
END FROM  ...;


I am trying to execute this query but it is throwing an error.

SEE ERROR:-

org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR:
>From line 2, column 1 to line 7, column 4: Illegal mixing of types in CASE
or COALESCE statement SQL Query null

*What am I doing wrong in this query?*


Re: Drill with Parquet

2016-12-29 Thread Khurram Faraaz
Please look at these examples on the documentation links below

here is the link to supported datatypes in Drill -
https://drill.apache.org/docs/supported-data-types/

and link to CTAS in Drill -
http://drill.apache.org/docs/create-table-as-ctas-command/
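
For reference, a minimal CTAS sketch in that style; the writable workspace dfs.tmp
and the sample file path are illustrative, and with a Parquet source the column
names and types come from the file, so they are not re-declared in the CREATE:

CREATE TABLE dfs.tmp.`nation_copy` AS
SELECT * FROM dfs.`/Users/drilluser/apache-drill-1.0/sample-data/nation.parquet`;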

On Thu, Dec 29, 2016 at 4:40 PM, Manjeet Singh 
wrote:

> Hi
>
> I am trying below query
>
> USE dfs;
> CREATE table dfs.`view_name`(AAA String,
>   Domain String,
>   certValidity String,
>   protocol String,
>   LastActive String,
>   GetCount DOUBLE,
>   PostCount DOUBLE,
>   Data_Transfer DOUBLE,
>   Data_Receive DOUBLE,
>   Total_Communication DOUBLE,
>   Last_Active String)AS SELECT AAA String,
>   Domain String,
>   certValidity String,
>   protocol String,
>   LastActive String,
>   GetCount DOUBLE,
>   PostCount DOUBLE,
>   Data_Transfer DOUBLE,
>   Data_Receive DOUBLE,
>   Total_Communication DOUBLE,
>   Last_Active String FROM
> dfs.`/Users/drilluser/apache-drill-1.0/sample-sata/nation.parquet`;
>
>
>
> still getting error
>
> can anyone suggest me what I am doing wrong
> second can anyone share how to create table over parquet file which is on
> hdfs?
>
> On Tue, Dec 27, 2016 at 2:52 AM, Ted Dunning 
> wrote:
>
> > On Sun, Dec 25, 2016 at 9:56 PM, Manjeet Singh <
> manjeet.chand...@gmail.com
> > >
> > wrote:
> >
> > > CREATE TABLE 't'(
> > > "AAA",  "Domain", "certValidity", "protocol", "LastActive", "GetCount",
> > > "PostCount", "Data_Transfer", "Data_Receive", "Total_Communication",
> > > "Last_Active") AS SELECT "AAA",  "Domain", "certValidity", "protocol",
> > > "LastActive", "GetCount", "PostCount", "Data_Transfer", "Data_Receive",
> > > "Total_Communication", "Last_Active" FROM
> > > dfs."/user/Domain-1-_1481273732716/*";
> > >
> >
> > You aren't specifying where your new table will be created. THis is
> > probably in dfs. You may also need to provide a workspace.  For instance,
> > when I was creating a file this morning I used this:
> >
> > create table maprfs.ted.fooble as ...
> >
> > The parts of this table name are:
> >
> > maprfs - the storage
> >
> > ted - the workspace (aka my home directory)
> >
> > fooble - the table (no need for quotes for some kinds of names ...
> commonly
> > necessary, however)
> >
> > You have to make sure that your workspace allows writes.
> >
>
>
>
> --
> luv all
>


Re: Limit ALL not working with offset

2016-12-27 Thread Khurram Faraaz
   1. DRILL-5165 <https://issues.apache.org/jira/browse/DRILL-5165> is
   reported to track this issue.


On Tue, Dec 27, 2016 at 2:06 PM, Khurram Faraaz <kfar...@maprtech.com>
wrote:

> This looks like a bug. Thanks for reporting this issue.
> I tried a similar query on apache Drill 1.10.0 and Drill returns wrong
> results when compared to Postgres, for a query that uses LIMIT and OFFSET
> clause in the same query. We need to file a JIRA to track this issue.
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select col_int from typeall_l order by 1
> limit all offset 10;
> +--+
> | col_int  |
> +--+
> +--+
> No rows selected (0.211 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select col_int from typeall_l order by
> col_int limit all offset 10;
> +--+
> | col_int  |
> +--+
> +--+
> No rows selected (0.24 seconds)
> {noformat}
>
> Query => select col_int from typeall_l limit all offset 10;
> Drill 1.10.0 returns 85 rows
>
> whereas for same query,
> postgres=# select col_int from typeall_l limit all offset 10;
> Postgres 9.3 returns 95 rows, which is the correct expected result.
>
> Query plan for above query that returns wrong results
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select col_int from
> typeall_l limit all offset 10;
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(col_int=[$0])
> 00-02SelectionVectorRemover
> 00-03  Limit(offset=[10])
> 00-04Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/typeall_l]],
> selectionRoot=maprfs:/tmp/typeall_l, numFiles=1, usedMetadataFile=false,
> columns=[`col_int`]]])
> {noformat}
>
> On Tue, Dec 27, 2016 at 1:49 PM, Sanjiv Kumar <ktrsan...@gmail.com> wrote:
>
>> Hello
>>  I trying to use limit all with offset, but it not working.
>> My query:-
>>select c.CategoryID  from demoPlugin.dbo.Category as c
>> group
>> by c.CategoryID order by 1 desc limit ALL offset 10;
>>
>> While firing this query its showing:- "No result found."
>>
>> But if i remove offset 10, then it working fine.
>> select c.CategoryID  from demoPlugin.dbo.Category as c group by
>> c.CategoryID order by 1 desc limit ALL;
>>
>> What is the problem.? Is limit all is not supported with offset?
>>
>>
>> ---
>> Thanks & Regards
>>Sanjiv Kumar
>>
>
>


Re: Limit ALL not working with offset

2016-12-27 Thread Khurram Faraaz
This looks like a bug. Thanks for reporting this issue.
I tried a similar query on apache Drill 1.10.0 and Drill returns wrong
results when compared to Postgres, for a query that uses LIMIT and OFFSET
clause in the same query. We need to file a JIRA to track this issue.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select col_int from typeall_l order by 1
limit all offset 10;
+--+
| col_int  |
+--+
+--+
No rows selected (0.211 seconds)
0: jdbc:drill:schema=dfs.tmp> select col_int from typeall_l order by
col_int limit all offset 10;
+--+
| col_int  |
+--+
+--+
No rows selected (0.24 seconds)
{noformat}

Query => select col_int from typeall_l limit all offset 10;
Drill 1.10.0 returns 85 rows

whereas for same query,
postgres=# select col_int from typeall_l limit all offset 10;
Postgres 9.3 returns 95 rows, which is the correct expected result.

Query plan for above query that returns wrong results

{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for select col_int from
typeall_l limit all offset 10;
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(col_int=[$0])
00-02SelectionVectorRemover
00-03  Limit(offset=[10])
00-04Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=maprfs:///tmp/typeall_l]],
selectionRoot=maprfs:/tmp/typeall_l, numFiles=1, usedMetadataFile=false,
columns=[`col_int`]]])
{noformat}

On Tue, Dec 27, 2016 at 1:49 PM, Sanjiv Kumar  wrote:

> Hello
>  I trying to use limit all with offset, but it not working.
> My query:-
>select c.CategoryID  from demoPlugin.dbo.Category as c group
> by c.CategoryID order by 1 desc limit ALL offset 10;
>
> While firing this query its showing:- "No result found."
>
> But if i remove offset 10, then it working fine.
> select c.CategoryID  from demoPlugin.dbo.Category as c group by
> c.CategoryID order by 1 desc limit ALL;
>
> What is the problem.? Is limit all is not supported with offset?
>
>
> ---
> Thanks & Regards
>Sanjiv Kumar
>


Re: Drill with Parquet

2016-12-26 Thread Khurram Faraaz
Hello Manjeet,

What error do you see on the prompt from where you submit the CTAS
statement ?
Can you please share the stacktrace/error information from the drillbit.log
file ?
Is your CTAS (the one that failed) over CSV/JSON files ?

And when you say your SELECT statement is not working, what do you see on
the prompt, an error/Exception, no results ?
Can you also share the schema details for your parquet file (*
02a566d.snappy.parquet) in your SELECT statement ?

What version of Drill are you using ?
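
For reference, Drill expects back-ticks rather than single quotes around a file
path. A minimal sketch with an illustrative path:

SELECT AAA FROM dfs.`/path/to/part-00000.snappy.parquet` LIMIT 10;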

On Mon, Dec 26, 2016 at 12:16 PM, Manjeet Singh 
wrote:

> Hi all
>
> I have checked that even my simplest query is not working
>
> SELECT AAA FROM
> dfs.'/user/olap/EntityProfiling/DomainParquet/
> Domain-1-_1481273732716/part-r-0-7c4b50c5-0318-4243-
> 9f76-9822c02a566d.snappy.parquet';
>
>
> Thanks
> Manjeet
>
> On Mon, Dec 26, 2016 at 11:26 AM, Manjeet Singh <
> manjeet.chand...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > I have query, I want to create table/ view on drill I have data stored in
> > parquet files on hdfs
> >
> >
> > i am using below command but not able to create table
> >
> >
> > CREATE TABLE 't'(
> > "AAA",  "Domain", "certValidity", "protocol", "LastActive", "GetCount",
> > "PostCount", "Data_Transfer", "Data_Receive", "Total_Communication",
> > "Last_Active") AS SELECT "AAA",  "Domain", "certValidity", "protocol",
> > "LastActive", "GetCount", "PostCount", "Data_Transfer", "Data_Receive",
> > "Total_Communication", "Last_Active" FROM dfs."/user/Domain-1-_
> > 1481273732716/*";
> >
> > below is the schema
> >
> > -- AAA: string (nullable = true)
> > |-- BTSID: string (nullable = true)
> > |-- OIME: string (nullable = true)
> >|-- OIMS: string (nullable = true)
> > |-- application: string (nullable = true)
> > |-- applicationCount: string (nullable = true)
> > |-- dataRx: string (nullable = true)
> > |-- dataTx: string (nullable = true)
> > |-- day: string (nullable = true)
> > |-- duration: string (nullable = true)
> > |-- locationLatLong: string (nullable = true)
> > |-- month: string (nullable = true)
> > |-- protocol: string (nullable = true)
> > |-- protocolCount: string (nullable = true)
> > |-- serverIP: string (nullable = true)
> > |-- startTime: string (nullable = true)
> > |-- totalVolume: string (nullable = true)
> > |-- year: string (nullable = true)
> >
> > can anyone help me out?
> >
> > Thanks
> > Manjeet
> >
> > --
> > luv all
> >
>
>
>
> --
> luv all
>


Re: hangout starting in a bit

2016-12-13 Thread Khurram Faraaz
Can we please have the link to the hangout ?

On Tue, Dec 13, 2016 at 11:32 PM, Parth Chandra  wrote:

>
>


Re: Help in running drill jdbc code from tomcat server.

2016-12-12 Thread Khurram Faraaz
Can you share the jdbc connection string and the error you see in the
drillbit.log file ?
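
For reference, a typical ZooKeeper-based connection string looks like the sketch
below (host, port and cluster id are illustrative):

jdbc:drill:zk=zkhost:2181/drill/drillbits1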

On Mon, Dec 12, 2016 at 2:58 PM,  wrote:

> Hi,
>
>
> I am getting an error when I am invoking drill jdbc code from apache
> tomcat server. The same piece of code is executing fine while connecting it
> from standalone java class. Can u help me with this issue?
>
>
>
> Thanks and regards
>
>
> Vittal Surya Lakshya
>
> Bangalore
>
> The information contained in this electronic message and any attachments
> to this message are intended for the exclusive use of the addressee(s) and
> may contain proprietary, confidential or privileged information. If you are
> not the intended recipient, you should not disseminate, distribute or copy
> this e-mail. Please notify the sender immediately and destroy all copies of
> this message and any attachments. WARNING: Computer viruses can be
> transmitted via email. The recipient should check this email and any
> attachments for the presence of viruses. The company accepts no liability
> for any damage caused by any virus transmitted by this email.
> www.wipro.com
>


Re: [1.9.0] : Error: SYSTEM ERROR: MalformedInputException: Input length = 1

2016-12-07 Thread Khurram Faraaz
Is your data the same on 1.6.0 and 1.9.0, or did your data change by any
chance when/after upgrading from 1.6.0 to 1.9.0 ?

On Wed, Dec 7, 2016 at 12:22 PM, Pratik Khedkar <
pratik.khed...@games24x7.com> wrote:

> Hi Team,
>
> I am getting below error after upgrading drill from 1.6 to 1.9.
> Working fine in 1.6. Kindly help me in this.
>
> *Query* :-
> create table a_tt1_new as  select sessionid,userid,dchannelid as
> channelid,id,event,hreferer as url_ref,case when (strpos(url,'?')-1) <>
> '-1' then LandingPage(substr(url, 1, strpos(url,'?')-1)) else url end as
> url ,timestamp_url,timestamp_url_client,update_date
> from gllva_dump where ((lp not like 'NA' and dchannelid <> 3) OR event like
> 'notification%' OR (event = 'e.a' and url like '
> https://abc/bring-f/xy.html%'
> and ((Base64Conv(ar) like '%message%success%' and
> Base64Conv(ar) like '%bonusearned%' ) or (Base64Conv(ar) like
> '%inviteSent%true%' and Base64Conv(ar) like
> '%tyou.html%'))) or (id = 'btn_d' and event = 'g.c') OR (id like 'plner%'
> or id like 'mob-banner%' or id like 'player%' or id like '%banner%') OR
> (url like '%pro/pro.html%' or url like '%pro/pro.html%') OR ((event =
> 'e.m'  and id = 'rdCTA') OR (id = 'rdClicked' and event = 'rdClicked') OR
> (event = 'g.c'  and id like 'rewar_tophead%')) OR ((event ='g.c' and (id =
> 'reg_m' OR id = 'reg_f')) OR (id = '/reg/regff.html')));
>
> *Stack Trace* :-
>
> Error: SYSTEM ERROR: MalformedInputException: Input length = 1
>
> Fragment 1:3
>
> [Error Id: ebc15806-831b-47ab-8b73-e79206b065a2 on datanode2:31010]
> (state=,code=0)
> java.sql.SQLException: SYSTEM ERROR: MalformedInputException: Input length
> = 1
>
> Fragment 1:3
>
> [Error Id: ebc15806-831b-47ab-8b73-e79206b065a2 on datanode2:31010]
> at
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(
> DrillCursor.java:232)
> at
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(
> DrillCursor.java:275)
> at
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(
> DrillResultSetImpl.java:1943)
> at
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(
> DrillResultSetImpl.java:76)
> at
> org.apache.calcite.avatica.AvaticaConnection$1.execute(
> AvaticaConnection.java:473)
> at
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(
> DrillMetaImpl.java:465)
> at
> org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(
> AvaticaConnection.java:477)
> at
> org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(
> DrillConnectionImpl.java:169)
> at
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(
> AvaticaStatement.java:109)
> at
> org.apache.calcite.avatica.AvaticaStatement.execute(
> AvaticaStatement.java:121)
> at
> org.apache.drill.jdbc.impl.DrillStatementImpl.execute(
> DrillStatementImpl.java:101)
> at sqlline.Commands.execute(Commands.java:841)
> at sqlline.Commands.sql(Commands.java:751)
> at sqlline.SqlLine.dispatch(SqlLine.java:746)
> at sqlline.SqlLine.runCommands(SqlLine.java:1651)
> at sqlline.Commands.run(Commands.java:1304)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
> 62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
> at sqlline.SqlLine.dispatch(SqlLine.java:742)
> at sqlline.SqlLine.initArgs(SqlLine.java:553)
> at sqlline.SqlLine.begin(SqlLine.java:596)
> at sqlline.SqlLine.start(SqlLine.java:375)
> at sqlline.SqlLine.main(SqlLine.java:268)
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM
> ERROR: MalformedInputException: Input length = 1
>
> Fragment 1:3
>
> [Error Id: ebc15806-831b-47ab-8b73-e79206b065a2 on datanode2:31010]
> at
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(
> QueryResultHandler.java:123)
> at
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(
> UserClient.java:144)
> at
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(
> BasicClientWithConnection.java:46)
> at
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(
> BasicClientWithConnection.java:31)
> at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:65)
> at
> org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:363)
> at
> org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(
> SerializedExecutor.java:89)
> at
> org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:240)
> at
> org.apache.drill.common.SerializedExecutor.execute(
> SerializedExecutor.java:123)
> at
> 

Re: Last Column showing blank in csv file

2016-12-03 Thread Khurram Faraaz
I am not exactly sure this will work (I haven't tried it on Windows 10).
Setting lineDelimiter to '\r\n' should help in your case; can you please try?
For details look at DRILL-3149.


Example:

select * from table(dfs.`my_table`(type=>'text', 'lineDelimiter'=>'\r\n'))


On Sat, Dec 3, 2016 at 2:28 AM, Kunal Khatua  wrote:

> There is a dos2unix utility for Linux that allows you to substitute the
> multichar newline with the single char newline.
>
> For Windows, you can use either a similar util on SourceForge or the
> CygUtils (part of the Cygwin shell, I believe) to achieve the same
> conversion.
>
> In the meanwhile, like Abhishek suggested, please file a JIRA for this so
> that the issue can be tracked and fixed in a future release.
>
> Thanks
> Kunal
>
> On Wed 30-Nov-2016 8:24:00 PM, Abhishek Girish 
> wrote:
> Well the only workaround I got to work is this: I saved the file in
> UNIX/OS_X Format and executed the query successfully (with no blanks in the
> class column).
>
> Some text editors such as Notepad++ let you do this directly on Windows. Or
> you could use a utility like iconv or dos2unix for larger files like Leon
> suggested.
>
> Also, please file an enhancement JIRA for supporting a new 'newline'
> attribute (which takes multi-byte values) for text formats. This will be
> helpful for files on windows which use CRLF (\r\n) as newline characters.
>
> On Wed, Nov 30, 2016 at 7:56 PM, Abhishek Girish
> wrote:
>
> > Okay, I could reproduce the issue on Windows.
> >
> > 0: jdbc:drill:zk=local> select A.sepalen, A.sepalwidth, A.patelen,
> > A.patelwidth,
> > A.class from dfs.`/drill/tmp.csv` as A;
> > +--+-+--+--++
> > | sepalen | sepalwidth | patelen | patelwidth | class |
> > +--+-+--+--++
> > | 5.1 | 3.5 | 1.4 | Iris-setosa | |
> > | 4.9 | 3 | 1.4 | Iris-setosa | |
> > | 4.7 | 3.2 | 1.3 | Iris-setosa | |
> > | 4.6 | 3.1 | 1.5 | Iris-setosa | |
> > | 5 | 3.6 | 1.4 | Iris-setosa | |
> > | 5.4 | 3.9 | 1.7 | Iris-setosa | |
> > | 4.6 | 3.4 | 1.4 | Iris-setosa | |
> > | 5 | 3.4 | 1.5 | Iris-setosa | |
> > | 4.4 | 2.9 | 1.4 | Iris-setosa | |
> > | 4.9 | 3.1 | 1.5 | Iris-setosa | |
> > | 5.4 | 3.7 | 1.5 | Iris-setosa | |
> > | 4.8 | 3.4 | 1.6 | Iris-setosa | |
> > +--+-+--+--++
> > 12 rows selected (0.277 seconds)
> >
> > I'll get back if and once I have a workaround / solution.
> >
> > On Wed, Nov 30, 2016 at 7:36 PM, Abhishek Girish
> > abhishek.gir...@gmail.com> wrote:
> >
> >> I forgot to mention that, the other query works too.
> >>
> >> > select A.sepalen, A.sepalwidth, A.patelen, A.patelwidth, A.class from
> >> dfs.tmp.`tmp.csv` as A;
> >> +--+-+--+--++
> >> | sepalen | sepalwidth | patelen | patelwidth | class |
> >> +--+-+--+--++
> >> | 5.1 | 3.5 | 1.4 | Iris-setosa | 0.2 |
> >> | 4.9 | 3 | 1.4 | Iris-setosa | 0.2 |
> >> | 4.7 | 3.2 | 1.3 | Iris-setosa | 0.2 |
> >> | 4.6 | 3.1 | 1.5 | Iris-setosa | 0.2 |
> >> | 5 | 3.6 | 1.4 | Iris-setosa | 0.2 |
> >> | 5.4 | 3.9 | 1.7 | Iris-setosa | 0.4 |
> >> | 4.6 | 3.4 | 1.4 | Iris-setosa | 0.3 |
> >> | 5 | 3.4 | 1.5 | Iris-setosa | 0.2 |
> >> | 4.4 | 2.9 | 1.4 | Iris-setosa | 0.2 |
> >> | 4.9 | 3.1 | 1.5 | Iris-setosa | 0.1 |
> >> | 5.4 | 3.7 | 1.5 | Iris-setosa | 0.2 |
> >> | 4.8 | 3.4 | 1.6 | Iris-setosa | 0.2 |
> >> +--+-+--+--++
> >> 12 rows selected (0.31 seconds)
> >>
> >>
> >> On Wed, Nov 30, 2016 at 8:15 AM, Leon Clayton
> >> wrote:
> >>
> >>> Are we sure there is no hidden characters on the end of the one or more
> >>> lines. thinking ?
> >>>
> >>> try a dos2unix on the file to check this theory out.
> >>>
> >>>
> >>> > On 30 Nov 2016, at 10:45, Sanjiv Kumar wrote:
> >>> >
> >>> > Hello
> >>> > Yes you are right. select * from `tmp.csv`
> >>> > Is Working fine, but if select column then the last column data
> showing
> >>> > blank.
> >>> > Run this query:-
> >>> > select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`,
> >>> A.class
> >>> > from dfs.tmp.`copydata.csv` as A;
> >>> >
> >>> > If you run this query you will get last column data as blank.
> >>> >
> >>> > On Wed, Nov 30, 2016 at 11:15 AM, Sanjiv Kumar
> >>> wrote:
> >>> >
> >>> >> I am using latest version 1.8 and in window 10 operating system.
> >>> >>
> >>> >> On Tue, Nov 29, 2016 at 11:40 AM, Sanjiv Kumar
> >>> >> wrote:
> >>> >>
> >>> >>> I already pasted the csv file. Just copy and save it as csv. I am
> >>> also
> >>> >>> attaching csv file. And the query is same as i mention above.
> >>> >>>
> >>> >>> select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`,
> >>> A.class
> >>> >>> from dfs.tmp.`copydata.csv` as A;
> >>> >>>
> >>> >>> On Mon, Nov 28, 2016 at 6:30 PM, Sanjiv 

Re: Last Column showing blank in csv file

2016-11-29 Thread Khurram Faraaz
Thanks. What version of Drill are you using ?  And is this on Linux ?

On Tue, Nov 29, 2016 at 11:40 AM, Sanjiv Kumar  wrote:

> I already pasted the csv file. Just copy and save it as csv. I am also
> attaching csv file. And the query is same as i mention above.
>
> select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`, A.class
> from dfs.tmp.`copydata.csv` as A;
>
> On Mon, Nov 28, 2016 at 6:30 PM, Sanjiv Kumar  wrote:
>
>>  Ya Its working but what if i am using :-
>>   select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`,
>> A.class from dfs.tmp.`copydata.csv` as A;
>>
>> why my last column data showing blank?
>> And One more thing See my csv file:-
>>
>> sepalen,sepalwidth,patelen,patelwidth,class
>> 5.1,3.5,1.4,Iris-setosa,0.2
>> 4.9,3,1.4,Iris-setosa,0.2
>> 4.7,3.2,1.3,Iris-setosa,0.2
>> 4.6,3.1,1.5,Iris-setosa,0.2
>> 5,3.6,1.4,Iris-setosa,0.2
>> 5.4,3.9,1.7,Iris-setosa,0.4
>> 4.6,3.4,1.4,Iris-setosa,0.3
>> 5,3.4,1.5,Iris-setosa,0.2
>> 4.4,2.9,1.4,Iris-setosa,0.2
>> 4.9,3.1,1.5,Iris-setosa,0.1
>> 5.4,3.7,1.5,Iris-setosa,0.2
>> 4.8,3.4,1.6,Iris-setosa,0.2
>>
>> This is my previous file.
>> Now if i add comma after class Check this new file.
>>
>>  sepalen,sepalwidth,patelen,patelwidth,class,
>> 5.1,3.5,1.4,Iris-setosa,0.2
>> 4.9,3,1.4,Iris-setosa,0.2
>> 4.7,3.2,1.3,Iris-setosa,0.2
>> 4.6,3.1,1.5,Iris-setosa,0.2
>> 5,3.6,1.4,Iris-setosa,0.2
>> 5.4,3.9,1.7,Iris-setosa,0.4
>> 4.6,3.4,1.4,Iris-setosa,0.3
>> 5,3.4,1.5,Iris-setosa,0.2
>> 4.4,2.9,1.4,Iris-setosa,0.2
>> 4.9,3.1,1.5,Iris-setosa,0.1
>> 5.4,3.7,1.5,Iris-setosa,0.2
>> 4.8,3.4,1.6,Iris-setosa,0.2
>>
>>
>> And Fire this query:- select A.`sepalen`, A.`sepalwidth`, A.`patelen`,
>> A.`patelwidth`, A.class from dfs.tmp.`copydata.csv` as A;
>>
>> then the output is show fine. but if the comma is not there in then the
>> last column data  showing blank.
>>
>> Is this a bug ??
>>  ..
>>   Thanks & Regards
>>   *Sanjiv Kumar*
>>
>
>
>
> --
>  ..
>   Thanks & Regards
>   *Sanjiv Kumar*
>


Re: Last Column showing blank in csv file

2016-11-28 Thread Khurram Faraaz
Please share (attach) your CSV data files here.
Also please share your exact queries and the actual output that Drill
returns to you. That way someone here can take a look.

On Mon, Nov 28, 2016 at 6:30 PM, Sanjiv Kumar  wrote:

>  Ya Its working but what if i am using :-
>   select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`, A.class
> from
> dfs.tmp.`copydata.csv` as A;
>
> why my last column data showing blank?
> And One more thing See my csv file:-
>
> sepalen,sepalwidth,patelen,patelwidth,class
> 5.1,3.5,1.4,Iris-setosa,0.2
> 4.9,3,1.4,Iris-setosa,0.2
> 4.7,3.2,1.3,Iris-setosa,0.2
> 4.6,3.1,1.5,Iris-setosa,0.2
> 5,3.6,1.4,Iris-setosa,0.2
> 5.4,3.9,1.7,Iris-setosa,0.4
> 4.6,3.4,1.4,Iris-setosa,0.3
> 5,3.4,1.5,Iris-setosa,0.2
> 4.4,2.9,1.4,Iris-setosa,0.2
> 4.9,3.1,1.5,Iris-setosa,0.1
> 5.4,3.7,1.5,Iris-setosa,0.2
> 4.8,3.4,1.6,Iris-setosa,0.2
>
> This is my previous file.
> Now if i add comma after class Check this new file.
>
>  sepalen,sepalwidth,patelen,patelwidth,class,
> 5.1,3.5,1.4,Iris-setosa,0.2
> 4.9,3,1.4,Iris-setosa,0.2
> 4.7,3.2,1.3,Iris-setosa,0.2
> 4.6,3.1,1.5,Iris-setosa,0.2
> 5,3.6,1.4,Iris-setosa,0.2
> 5.4,3.9,1.7,Iris-setosa,0.4
> 4.6,3.4,1.4,Iris-setosa,0.3
> 5,3.4,1.5,Iris-setosa,0.2
> 4.4,2.9,1.4,Iris-setosa,0.2
> 4.9,3.1,1.5,Iris-setosa,0.1
> 5.4,3.7,1.5,Iris-setosa,0.2
> 4.8,3.4,1.6,Iris-setosa,0.2
>
>
> And Fire this query:- select A.`sepalen`, A.`sepalwidth`, A.`patelen`,
> A.`patelwidth`, A.class from dfs.tmp.`copydata.csv` as A;
>
> then the output is show fine. but if the comma is not there in then the
> last column data  showing blank.
>
> Is this a bug ??
>  ..
>   Thanks & Regards
>   *Sanjiv Kumar*
>


Re: Last Column showing blank in csv file

2016-11-28 Thread Khurram Faraaz
You should set skipFirstLine to true, in the formats area of the storage
plugin configuration.
Doing the above will ensure that Drill skips the first line of your CSV
file.
And your query should look like,

0: jdbc:drill:schema=dfs.tmp> select columns[0] sepalen, columns[1]
sepalwidth, columns[2] patelen, columns[3] patelwidth, columns[4] class
. . . . . . . . . . . . . . > from dfs.tmp.`userdata.csv`;
+--+-+--+--++
| sepalen  | sepalwidth  | patelen  |  patelwidth  | class  |
+--+-+--+--++
| 5.1  | 3.5 | 1.4  | Iris-setosa  | 0.2|
| 4.9  | 3   | 1.4  | Iris-setosa  | 0.2|
| 4.7  | 3.2 | 1.3  | Iris-setosa  | 0.2|
| 4.6  | 3.1 | 1.5  | Iris-setosa  | 0.2|
| 5| 3.6 | 1.4  | Iris-setosa  | 0.2|
| 5.4  | 3.9 | 1.7  | Iris-setosa  | 0.4|
| 4.6  | 3.4 | 1.4  | Iris-setosa  | 0.3|
| 5| 3.4 | 1.5  | Iris-setosa  | 0.2|
| 4.4  | 2.9 | 1.4  | Iris-setosa  | 0.2|
| 4.9  | 3.1 | 1.5  | Iris-setosa  | 0.1|
| 5.4  | 3.7 | 1.5  | Iris-setosa  | 0.2|
| 4.8  | 3.4 | 1.6  | Iris-setosa  | 0.2|
+--+-+--+--++
12 rows selected (0.5 seconds)

On Mon, Nov 28, 2016 at 3:59 PM, Sanjiv Kumar  wrote:

> My Hello
>   I have one csv file having data:-
>
> sepalen,sepalwidth,patelen,patelwidth,class
> 5.1,3.5,1.4,Iris-setosa,0.2
> 4.9,3,1.4,Iris-setosa,0.2
> 4.7,3.2,1.3,Iris-setosa,0.2
> 4.6,3.1,1.5,Iris-setosa,0.2
> 5,3.6,1.4,Iris-setosa,0.2
> 5.4,3.9,1.7,Iris-setosa,0.4
> 4.6,3.4,1.4,Iris-setosa,0.3
> 5,3.4,1.5,Iris-setosa,0.2
> 4.4,2.9,1.4,Iris-setosa,0.2
> 4.9,3.1,1.5,Iris-setosa,0.1
> 5.4,3.7,1.5,Iris-setosa,0.2
> 4.8,3.4,1.6,Iris-setosa,0.2
>
> While Query through this file using this query:-
>
> select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`, A.class
> from dfs.tmp.`copydata.csv` as A;
>
> Last Columm Data showing null. What is the problem. Is this a bug.??
>
> My Output is:-  class column showing blank.
>
>
>
>
>
>
> --
>  ..
>   Thanks & Regards
>   *Sanjiv Kumar*
>


Re: Drill Join query optimization

2016-11-04 Thread Khurram Faraaz
What version of Drill are you on?
Can you please share the query plan for your query?
Please also share the table definitions.
Is this a standalone setup or a cluster of Drillbits?
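For reference, the plan can be generated by prefixing the query with EXPLAIN
PLAN FOR, e.g. (a sketch based on the query quoted below; the second table is
aliased tb here, and SOME_COLUMN is a placeholder for your actual column):

explain plan for
select ta.`SOME_COLUMN` from mongo.Test.TABLEA ta
inner join mongo.Test.TABLEB tb on ta.Id = tb.Id and ta.Id = '123';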


On Fri, Nov 4, 2016 at 2:23 PM,  wrote:

> Hello,
>
> I am working on a proof of concept to optimize the performance of the
> join queries executed through drill. The underlying storage is a NO-SQL
> based database - Mongo DB. The time it takes to return the result of the
> join query is very high (46 seconds). Upon further analysis, as per the
> physical plan of the query it is observed that both the left side (15 lakh
> records) and right side table (13 lakh) are fully scanned and it takes 24
> seconds and 20 seconds respectively for the same.
>
> Here is the query -
>
> select ta.[SOME_COLUMN] from mongo.Test.TABLEA ta INNER JOIN
> mongo.Test.TABLEB ta ON ta.Id = tb.Id and ta.Id ='123'
>
> Records in table A : 15 Lakhs
>
> Records in table B : 13 Lakhs
>
> Filter condition : Id is indexed field in both the tables (ascending)
>
> Drill Plan shows Hash-join being performed as shown in the plan (as
> attached above)
>
> I have the following questions -
>
> 1) Why does drill fetch all the records into the memory even though
> filter condition on the indexed column was provided for one of the table ?
>
> 2) If the Drill planner/optimizer is intelligent then records could
> have been filtered on the second table as well based on the join condition
> (to reduce the dataset and hence result in faster execution time)
>
> 3) Are there any performance tuning options available to bring down
> the query response time and generate a better query plan ?
>
>
>
> I would appreciate if you can provide me pointers/directions or answers to
> the above questions.
>
> Please feel free if you need any additional information.
>
> Thanks,
>
> Mridul
>
>
>
>
>
>
>
>
>
>


Re: Drill Over MongoDB | Unable to access fields from complex Json Object

2016-10-27 Thread Khurram Faraaz
Hello Vikram.

I tried using your data on Drill 1.9.0

[test@test-01 ~]# cat mongo_test.json
{
"_id" : " some_value ",
"_updated" : " some_value ",
"brands" : [
" some_value "
],
"_created" : " some_value ",
"_status" : " some_value ",
"metadata" : {
"username" : [
{
"site_id" : " some_value "
}
],
"status" : [1,3,4,5,6,7]
  }
}

/* To extract username value */
/* Note that the field metadata is enclosed in back ticks in the SQL */
0: jdbc:drill:schema=dfs.tmp> select sub_query.t.username from ( select
`metadata` t from `mongo_test.json`) sub_query;
+---+
|EXPR$0 |
+---+
| [{"site_id":" some_value "}]  |
+---+
1 row selected (0.223 seconds)

/* To extract site_id value */
0: jdbc:drill:schema=dfs.tmp> select sub_query.t.username.site_id from (
select `metadata` t from `mongo_test.json`) sub_query;
+---+
|EXPR$0 |
+---+
|  some_value   |
+---+
1 row selected (0.216 seconds)

Thanks,
Khurram

On Thu, Oct 27, 2016 at 11:53 AM, Vikram H. Taori 
wrote:

> Dears,
>
>
>
> Any reply/help over this below issue ??
>
>
>
> 
> --
>
> *Vikram Taori* | *InfoCepts *| www.infocepts.com
>
> Off: +91 712 224 5867 Ext 8338, +1-301-769-6212 Ext 8338, Mob: +91 837 443
> 0420
>
> 
> --
>
>
>
> *From:* Vikram H. Taori
> *Sent:* 25 October 2016 17:16
> *To:* 'user@drill.apache.org'
> *Subject:* Drill Over MongoDB | Unable to access fields from complex
> Json Object
>
>
>
> Dears,
>
>
>
> We are using Drill 1.8.0 over MongoDB 3.2.10.
>
>
>
> Here we are accessing json data  (using JDBC connection via Squirrel).
>
>
>
> *For Simple json dataset:*
>
>
>
>
>
> Accessing using Squirrel:
>
>
>
>
>
> *For Complex json data:*
>
>
>
>
>
> {
>
> "_id" : " *some_value* ",
>
> "_updated" : " *some_value* ",
>
> "brands" : [
>
> " *some_value* "
>
> ],
>
> "_created" : " *some_value *",
>
> "_status" : " *some_value* ",
>
> "metadata" : {
>
> "username" : [
>
> {
>
> "site_id" : "* some_value* "
>
> }
>
> ],
>
> "status" : [
>
>
>
>
>
> Query like “select * from mongo.test.`member`;”
>
> gives *error “Error: SYSTEM ERROR: IllegalArgumentException: You tried to
> write a Float8 type when you are using a ValueWriter of type
> NullableVarCharWriterImpl.”*
>
>
>
> For selected *first level* of columns, it *runs successfully*
>
>
>
> *Our Query:*
>
>
>
> *How do we access fields from complex json object in drill??*
>
>
>
> Accessing “site_id” under “username”:
>
>
>
> I did google and found this approach
>
>
>
> Tried: “ select _id, _updated, brands[0], _created, _*status,
> metadata.username[0].site*_id from mongo.test.`member` ”
>
>
>
> Gives error: “Error: PARSE ERROR: Encountered ", metadata" at line 1,
> column 51”
>


Re: [Drill 1.6] : Number format exception due to Empty String

2016-10-15 Thread Khurram Faraaz
Anup, can you please

-  share your parquet data file (also your CTAS statement that you used to
create that parquet file)
-  and the SQL query you are running

That way we can run your query against your data file on latest Drill
1.9.0, and share results with you.



On Sat, Oct 15, 2016 at 4:36 PM, Nitin Pawar 
wrote:

> not sure if this helps
> try setting this (I am not aware in which version it was included):
> alter session set `drill.exec.functions.cast_empty_string_to_null` = true;
>
> and then you can try cast operation as well
>
> On Oct 15, 2016 4:28 PM, "Anup Tiwari"  wrote:
>
> > We will surely test this on 1.8 and let you know.. but by looking at
> JIRA ,
> > I thought it got fixed in 0.8, is it not the case?
> > Let me know if i missed anything and pls provide some quick fix which can
> > be applicable just by changing some variable like alter session mentioned
> > in trail mail..
> >
> > On 15-Oct-2016 11:35 AM, "Nitin Pawar"  wrote:
> >
> > is there an option where you can upgrade to 1.8 and test it?
> >
> >
> > On Sat, Oct 15, 2016 at 10:23 AM, Anup Tiwari  >
> > wrote:
> >
> > > No.. on a parquet table..
> > >
> > > Regards,
> > > *Anup Tiwari*
> > >
> > > On Fri, Oct 14, 2016 at 6:23 PM, Nitin Pawar 
> > > wrote:
> > >
> > > > are you querying on csv files?
> > > >
> > > > On Fri, Oct 14, 2016 at 1:31 PM, Anup Tiwari <
> > anup.tiw...@games24x7.com>
> > > > wrote:
> > > >
> > > > > Hi Team,
> > > > >
> > > > > I got number format exception for empty string and after googling i
> > > found
> > > > > that this issue is bit related to DRILL-1874
> > > > >  .
> > > > >
> > > > > Q) Is it fixed in 1.6 ? Since it is very basic problem which can
> > occur
> > > > > anytime.
> > > > >
> > > > > After reading comments of drill-1874 , i used *alter session set
> > > > > planner.safe_cast_varchar = true;* but it is giving me below error,
> > > > Please
> > > > > let me know if i am missing anything :
> > > > >
> > > > >
> > > > > 0: jdbc:drill:> *alter session set planner.safe_cast_varchar =
> true*;
> > > > > Error: VALIDATION ERROR: The option 'planner.safe_cast_varchar'
> does
> > > not
> > > > > exist.
> > > > >
> > > > >
> > > > > [Error Id: b9a8bcc4-91e3-46e9-8cf8-55aada95383b on
> datanode3:31010]
> > > > > (state=,code=0)
> > > > > java.sql.SQLException: VALIDATION ERROR: The option
> > > > > 'planner.safe_cast_varchar' does not exist.
> > > > >
> > > > >
> > > > > [Error Id: b9a8bcc4-91e3-46e9-8cf8-55aada95383b on
> datanode3:31010]
> > > > >
> > > > >
> > > > > at
> > > > > org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(
> > > > > DrillCursor.java:247)
> > > > > at
> > > > > org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(
> > > > > DrillCursor.java:290)
> > > > > at
> > > > > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(
> > > > > DrillResultSetImpl.java:1923)
> > > > > at
> > > > > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(
> > > > > DrillResultSetImpl.java:73)
> > > > > at
> > > > > net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(
> > > > > AvaticaConnection.java:404)
> > > > > at
> > > > > net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(
> > > > > AvaticaStatement.java:355)
> > > > > at
> > > > > net.hydromatic.avatica.AvaticaStatement.executeInternal(
> > > > > AvaticaStatement.java:338)
> > > > > at
> > > > > net.hydromatic.avatica.AvaticaStatement.execute(
> > > > AvaticaStatement.java:69)
> > > > > at
> > > > > org.apache.drill.jdbc.impl.DrillStatementImpl.execute(
> > > > > DrillStatementImpl.java:101)
> > > > > at sqlline.Commands.execute(Commands.java:841)
> > > > > at sqlline.Commands.sql(Commands.java:751)
> > > > > at sqlline.SqlLine.dispatch(SqlLine.java:746)
> > > > > at sqlline.SqlLine.begin(SqlLine.java:621)
> > > > > at sqlline.SqlLine.start(SqlLine.java:375)
> > > > > at sqlline.SqlLine.main(SqlLine.java:268)
> > > > > Caused by: org.apache.drill.common.exceptions.UserRemoteException:
> > > > > VALIDATION ERROR: The option 'planner.safe_cast_varchar' does not
> > > exist.
> > > > >
> > > > >
> > > > > [Error Id: b9a8bcc4-91e3-46e9-8cf8-55aada95383b on
> datanode3:31010]
> > > > >
> > > > >
> > > > > at
> > > > > org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(
> > > > > QueryResultHandler.java:119)
> > > > > at
> > > > > org.apache.drill.exec.rpc.user.UserClient.handleReponse(
> > > > > UserClient.java:113)
> > > > > at
> > > > > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(
> > > > > BasicClientWithConnection.java:46)
> > > > > at
> > > > > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(
> > > > > BasicClientWithConnection.java:31)
> > > > > at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67)
> > > > >   

Re: Drill connecting to Azure DocumentDB store

2016-10-13 Thread Khurram Faraaz
If Azure DocumentDB provides a JDBC/ODBC driver, then yes, we can query it
with SQL from Drill. I see that Azure DocumentDB itself does support SQL.
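If such a driver exists, it could in principle be registered through Drill's
generic jdbc storage plugin. A rough sketch of that config follows; the driver
class and URL below are placeholders, not a known DocumentDB driver:

{
  "type" : "jdbc",
  "driver" : "com.example.documentdb.jdbc.Driver",
  "url" : "jdbc:documentdb://youraccount.documents.azure.com:443",
  "username" : "user",
  "password" : "password",
  "enabled" : true
}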

On Thu, Oct 13, 2016 at 5:29 AM, Sameer Nori  wrote:

> Can Drill connect to the Azure Document DB store?
>
> Thanks,
>
>
> --
>
> Sameer Nori
>
> Sr Product Marketing Manager
>
> +1.7037250917
>
> @sameernori
>
>  Now Available - Free Hadoop On-Demand Training
> 
>
>
>
>


Re: Error parsing JSON

2016-10-12 Thread Khurram Faraaz
Since your JSON file has different types of data (like int, boolean,
strings etc.) we need to tell Drill to consider all data in that JSON file
as text. This is by design, and it is not trial and error.
Such an occurrence of different types of data in a JSON file is called a
schema change, and to avoid it we set `store.json.all_text_mode` = true.
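For example, from sqlline (a sketch; `some_numeric_field` is a placeholder for
whichever field you actually need back as a number):

alter session set `store.json.all_text_mode` = true;
select cast(t.some_numeric_field as bigint) from dfs.`/test.json` t;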

On Wed, Oct 12, 2016 at 9:31 PM, Dan Blondowski <
dan.blondow...@dhigroupinc.com> wrote:

> Hello.  You were correct.
> I was able to set `store.json.all_text_mode` = true; and run the query.
>
> I also found the data that needed changing.  There were actually 3 fields.
>
> Is there a way for Drill to display which field is bad, so I don¹t have to
> do the trial & error?
>
>
>
>
> On 10/12/16, 10:12 AM, "Abhishek Girish" 
> wrote:
>
> >Hey Dan,
> >
> >This usually happens when there is a schema change across records. It
> >could
> >be intentional (records do need different types for some fields) or bad
> >formatting (1 vs "1").
> >
> >Can you try setting this session option and retry your query? Let us know
> >if it helps. You could attempt to use explicit casts in the query to get
> >types you want.
> >
> >  set `store.json.all_text_mode` = true;
> >
> >Regards,
> >Abhishek
> >
> >On Wednesday, October 12, 2016, Dan Blondowski <
> >dan.blondow...@dhigroupinc.com> wrote:
> >
> >> Hello.
> >>
> >> I have a json file with 2 records (see attached).
> >>
> >> When I run a query against it I get this error:
> >>
> >> *Error: DATA_READ ERROR: Error parsing JSON - You tried to write a
> >>BigInt
> >> type when you are using a ValueWriter of type
> >>NullableVarCharWriterImpl.*
> >>
> >>
> >> *File  /test.json*
> >>
> >> *Record  2*
> >>
> >> *Fragment 0:0*
> >>
> >>
> >> *[Error Id: 8b5166cc-28b6-488c-893f-f0265d483e13 on
> >>ip-10-3-48-183:31010]
> >> (state=,code=0)*
> >>
> >> If I break it up into individual files, I can run the query against both
> >> records just fine. Any clues?
> >>
> >> Thanks!
> >>
> >> Daniel Blondowski
> >>
> >> Big Data Architecture/Engineering
> >> DHI Group, Inc.
> >>
> >> dan.blondow...@dice.com
> >> 
> >> 515-313-2137
> >>
> >>
>


Re: Error parsing JSON

2016-10-12 Thread Khurram Faraaz
Can you please try to set this config parameter from sqlline and then run
your query.

alter session set `store.json.all_text_mode` = true;



On Wed, Oct 12, 2016 at 8:38 PM, Dan Blondowski <
dan.blondow...@dhigroupinc.com> wrote:

> Sorry!
> Select * from s3.`test.json`;
>
> Daniel Blondowski
>
> Big Data Architecture/Engineering
> DHI Group, Inc.
>
> dan.blondow...@dice.com
> 515-313-2137
>
>
>
>
>
> On 10/12/16, 9:52 AM, "Khurram Faraaz" <kfar...@maprtech.com> wrote:
>
> >Can you please share your SQL query ?
> >
> >On Wed, Oct 12, 2016 at 7:38 PM, Dan Blondowski <
> >dan.blondow...@dhigroupinc.com> wrote:
> >
> >> Hello.
> >>
> >> I have a json file with 2 records (see attached).
> >>
> >> When I run a query against it I get this error:
> >>
> >> *Error: DATA_READ ERROR: Error parsing JSON - You tried to write a
> >>BigInt
> >> type when you are using a ValueWriter of type
> >>NullableVarCharWriterImpl.*
> >>
> >>
> >> *File  /test.json*
> >>
> >> *Record  2*
> >>
> >> *Fragment 0:0*
> >>
> >>
> >> *[Error Id: 8b5166cc-28b6-488c-893f-f0265d483e13 on
> >>ip-10-3-48-183:31010]
> >> (state=,code=0)*
> >>
> >> If I break it up into individual files, I can run the query against both
> >> records just fine. Any clues?
> >>
> >> Thanks!
> >>
> >> Daniel Blondowski
> >>
> >> Big Data Architecture/Engineering
> >> DHI Group, Inc.
> >>
> >> dan.blondow...@dice.com
> >> 515-313-2137
> >>
> >>
>


Re: Error parsing JSON

2016-10-12 Thread Khurram Faraaz
Can you please share your SQL query ?

On Wed, Oct 12, 2016 at 7:38 PM, Dan Blondowski <
dan.blondow...@dhigroupinc.com> wrote:

> Hello.
>
> I have a json file with 2 records (see attached).
>
> When I run a query against it I get this error:
>
> *Error: DATA_READ ERROR: Error parsing JSON - You tried to write a BigInt
> type when you are using a ValueWriter of type NullableVarCharWriterImpl.*
>
>
> *File  /test.json*
>
> *Record  2*
>
> *Fragment 0:0*
>
>
> *[Error Id: 8b5166cc-28b6-488c-893f-f0265d483e13 on ip-10-3-48-183:31010]
> (state=,code=0)*
>
> If I break it up into individual files, I can run the query against both
> records just fine. Any clues?
>
> Thanks!
>
> Daniel Blondowski
>
> Big Data Architecture/Engineering
> DHI Group, Inc.
>
> dan.blondow...@dice.com
> 515-313-2137
>
>
>


Re: Cannot cancel query - stuck in CANCELLATION_REQUESTED

2016-09-26 Thread Khurram Faraaz
Can you please get a jstack of the hung Drillbit process?
And what version of Drill are you running?

I found JIRA DRILL-4355, which may be related, though I'm not sure.

Thanks,
Khurram

On Mon, Sep 26, 2016 at 2:11 PM, Dan Markhasin  wrote:

> I am trying to use Drill with SQL Server using Microsoft's JDBC driver.
>
> One of the issues I've ran into is that once submitted, a query cannot be
> cancelled - sqlline stops responding completely (running on Windows in
> Embedded mode, not responding to neither Ctrl+C nor Ctrl+D, Ctrl+X, Ctrl+Z,
> etc.) and when trying to cancel the query from the UI it just changes to
> CANCELLATION_REQUESTED and stays in this state indefinitely.
> The only way to get out of this state is to kill sqlline...
>
> There is nothing in the log except for:
>
> 2016-09-18 20:53:07,521 [28212b88-282a-8bde-a2a7-93a6d9afc317:frag:0:0]
> INFO  o.a.d.e.w.fragment.FragmentExecutor -
> 28212b88-282a-8bde-a2a7-93a6d9afc317:0:0:
> State change requested AWAITING_ALLOCATION --> RUNNING
> 2016-09-18 20:53:07,529 [28212b88-282a-8bde-a2a7-93a6d9afc317:frag:0:0]
> INFO  o.a.d.e.w.f.FragmentStatusReporter -
> 28212b88-282a-8bde-a2a7-93a6d9afc317:0:0:
> State to report: RUNNING
> 2016-09-18 20:53:50,274 [CONTROL-rpc-event-queue] INFO
> o.a.d.e.w.fragment.FragmentExecutor
> - 28212b88-282a-8bde-a2a7-93a6d9afc317:0:0: State change requested RUNNING
> --> CANCELLATION_REQUESTED
> 2016-09-18 20:53:50,276 [CONTROL-rpc-event-queue] INFO
> o.a.d.e.w.f.FragmentStatusReporter
> - 28212b88-282a-8bde-a2a7-93a6d9afc317:0:0: State to report:
> CANCELLATION_REQUESTED
>
> Is this a known issue with Drill & JDBC ?
>


Re: NULL values for DATE type columns using JDBC connector

2016-09-15 Thread Khurram Faraaz
This is related to DRILL-4842

On Wed, Sep 14, 2016 at 12:31 AM, Sudheesh Katkam 
wrote:

> Hi Dan,
>
> Per documentation , I
> don’t think sql4es is in the list of actively tested JDBC drivers. But a
> “WHERE something IS NOT NULL” returning NULL looks like a bug to me. Can
> you open a ticket ?
>
> Thank you,
> Sudheesh
>
> > On Sep 12, 2016, at 8:04 AM, Dan Markhasin  wrote:
> >
> > Hi all,
> >
> > I'm using Drill to query ElasticSearch using the sql4es driver (
> > https://github.com/Anchormen/sql4es) and I've run into an issue where
> Drill
> > returns NULL values for Date columns:
> >
> > 0: jdbc:drill:zk=local> select Date_01 from
> > ES23.`data-generator-poc-async`.arm where Date_01 IS NOT NULL limit 1;
> > +--+
> > | Date_01  |
> > +--+
> > | null |
> > +--+
> >
> > The DESCRIBE command returns the correct data type (DATE).
> >
> > Using other JDBC clients (Squirrel / WorkbenchJ) I am able to run the
> exact
> > same query on the exact source, and get the correct data (which is in the
> > form of 2016-03-21, for example).
> >
> > Any idea why it's returning null?
>
>


Re: Date Formatting Question

2016-09-08 Thread Khurram Faraaz
Here is the link to currently supported datetime functions in Drill

https://drill.apache.org/docs/date-time-functions-and-arithmetic/#extract
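For example, TO_CHAR with a Joda-style pattern gives behaviour similar to
MySQL's DATE_FORMAT (a sketch using a literal date):

SELECT TO_CHAR(CAST('2016-09-08' AS DATE), 'MM/dd/yyyy') FROM (VALUES(1));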

On Fri, Sep 9, 2016 at 12:32 AM, Charles Givre  wrote:

> Hello everyone,
> I have a question about formatting dates.  Let's say that I have some data
> which has dates in format of -mm-dd but I would like to convert them to
> mm/dd/ format (or whatever).  Does Drill have something that is roughly
> equivalent to MySQL's DATE_FORMAT( , ) function
> whereby you can change the formatting of a date to whatever you want?
> Thanks,
> -- Charles
>


Re: Query hangs on planning

2016-08-31 Thread Khurram Faraaz
Can you please share the number of cores on the setup where the query hung,
compared to the number of cores on the setup where the query completed
successfully? Also, please share the memory details for the two scenarios.

Thanks,
Khurram

On Wed, Aug 31, 2016 at 4:50 PM, Oscar Morante <spacep...@gmail.com> wrote:

> For the record, I think this was just bad memory configuration after all.
> I retested on bigger machines and everything seems to be working fine.
>
>
> On Tue, Aug 09, 2016 at 10:46:33PM +0530, Khurram Faraaz wrote:
>
>> Oscar, can you please report a JIRA with the required steps to reproduce
>> the OOM error. That way someone from the Drill team will take a look and
>> investigate.
>>
>> For others interested here is the stack trace.
>>
>> 2016-08-09 16:51:14,280 [285642de-ab37-de6e-a54c-378aaa4ce50e:foreman]
>> ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure
>> Occurred,
>> exiting. Information message: Unable to handle out of memory condition in
>> Foreman.
>> java.lang.OutOfMemoryError: Java heap space
>>at java.util.Arrays.copyOfRange(Arrays.java:2694) ~[na:1.7.0_111]
>>    at java.lang.String.<init>(String.java:203) ~[na:1.7.0_111]
>>at java.lang.StringBuilder.toString(StringBuilder.java:405)
>> ~[na:1.7.0_111]
>>at org.apache.calcite.util.Util.newInternal(Util.java:785)
>> ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>at
>> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(
>> VolcanoRuleCall.java:251)
>> ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>at
>> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(
>> VolcanoPlanner.java:808)
>> ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>at
>> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303)
>> ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>at
>> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler
>> .transform(DefaultSqlHandler.java:404)
>> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>at
>> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler
>> .transform(DefaultSqlHandler.java:343)
>> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>at
>> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler
>> .convertToDrel(DefaultSqlHandler.java:240)
>> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>at
>> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler
>> .convertToDrel(DefaultSqlHandler.java:290)
>> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>at
>> org.apache.drill.exec.planner.sql.handlers.ExplainHandler.ge
>> tPlan(ExplainHandler.java:61)
>> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>at
>> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(Dri
>> llSqlWorker.java:94)
>> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>at
>> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:978)
>> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:
>> 257)
>> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1145)
>> [na:1.7.0_111]
>>at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:615)
>> [na:1.7.0_111]
>>at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
>>
>> Thanks,
>> Khurram
>>
>> On Tue, Aug 9, 2016 at 7:46 PM, Oscar Morante <spacep...@gmail.com>
>> wrote:
>>
>> Yeah, when I uncomment only the `upload_date` lines (a dir0 alias),
>>> explain succeeds within ~30s.  Enabling any of the other lines triggers
>>> the
>>> failure.
>>>
>>> This is a log with the `upload_date` lines and `usage <> 'Test'` enabled:
>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e
>>>
>>> The client times out around here (~1.5hours):
>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022
>>> b3c55e#file-drillbit-log-L178
>>>
>>> And it still keeps running for a while until it dies (~2.5hours):
>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022
>>> b3c55e#file-drillbit-log-L178
>>>
>>> The memory settings for this test were:
>>

Re: Question related to handling of nulls in csv

2016-08-24 Thread Khurram Faraaz
There are two things you need to take care of in your example,

1. you should tell Drill to skip the header of your CSV file (add
skipFirstLine to the csv format in your storage plugin).
2. you are doing a sum over string data (varchar), so you will need to cast
your data in column "c" to integer before you do the aggregate like this,
select sum(cast(c as integer)) from dfs.tmp.`test_drill.csv`
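Since some rows have an empty string in that column, the cast alone will still
fail on them; a sketch that treats empty strings as NULL first (if you query
the raw columns array rather than named headers, use columns[2] in place of c):

select sum(case when c = '' then null else cast(c as integer) end) as sum_c
from dfs.tmp.`test_drill.csv`;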


On Wed, Aug 24, 2016 at 7:46 PM, Sandeep Dugar  wrote:

> Hi,
>
> I am trying to understand the behavior while handling nulls.
>
> I have a csv file where the last column is integer values and has nulls for
> few rows. When I run an aggregation function like sum on that column I am
> getting UNSUPPORTED_OPERATION ERROR.
>
> Any pointers or links would be great. Thanks in advance!
>
> Here is the sample:
>
> 0: jdbc:drill:zk=local> select * from dfs.tmp.`test_drill.csv`;
> 'a','b','c'
> 'what','where','5'
> 'this','that','10'
> 'here','now',''
> 'go','there',''
> 'today','tomorrow','6'
> 5 rows selected (0.451 seconds)
>
> 0: jdbc:drill:zk=local> select sum(c) from dfs.tmp.`test_drill.csv`;
> Error: UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate
> functions supported for VarChar type
>


Re: Fetch queries status from drill prompt

2016-08-19 Thread Khurram Faraaz
There is an open JIRA DRILL-4258

Once that is fixed/implemented, you will be able to query sys.queries or
sys.statements to see all SQL statements and the states they are in.

Khurram

On Fri, Aug 19, 2016 at 4:45 PM, Anup Tiwari 
wrote:

> Thanks for above info.. i found running queries information in Zookeeper
> logs but i guess it is in binary or some other form.
> Is their any way to read it in drill?
>
> Regards,
> *Anup Tiwari*
> Software Engineer(BI-Team),PlayGames24x7 Pvt Ltd
>
> On Thu, Aug 18, 2016 at 11:59 PM, Sudheesh Katkam 
> wrote:
>
> > Profiles of running queries are stored in Zookeeper (or the configured
> > transient store).
> >
> > Thank you,
> > Sudheesh
> >
> > > On Aug 18, 2016, at 11:23 AM, Anup Tiwari 
> > wrote:
> > >
> > > Thanks chun for info..
> > >
> > > But can you tell me from where, running queries status come on profile
> > > user-interface(UI)? Because if it's coming on profile UI then it must
> > have
> > > some back end file or something like that..
> > > On 18-Aug-2016 11:37 PM, "Chun Chang"  wrote:
> > >
> > > Anup,
> > >
> > > I believe only when a query is in a "terminal", i.e.
> > > cancelled/completed/failed state, then it is written to the
> > > drillbit_queries.json file on the foreman node. If what you want to do
> is
> > > monitoring queries running on your cluster, your best bet is to
> configure
> > > your cluster to store profile information on HDFS and monitor through
> > query
> > > profile. Remember if you have a cluster, you will have a
> > > drillbit_queries.json file on very cluster node where drillbit is
> > running.
> > > And each file only contains completed queries that were run on that
> node
> > as
> > > foreman. You would have to aggregate to get the whole picture of your
> > > cluster. Even that, you will not see running queries.
> > >
> > > Hope this helps.
> > >
> > > On Thu, Aug 18, 2016 at 12:34 AM, Anup Tiwari <
> anup.tiw...@games24x7.com
> > >
> > > wrote:
> > >
> > >> Hi All,
> > >>
> > >> We want to see all types of queries which ran on drill cluster or
> > > currently
> > >> running from drill prompt, Can someone help us on this?
> > >>
> > >> To achieve above , we read the drill documentation and set up a
> storage
> > >> plugin to access local file system and able to query
> > >> *"drillbit_queries.json"* log file, but in above file we are getting
> > > status
> > >> of all queries whose status is either "cancelled","completed" or
> > "failed"
> > >> but missing "running". At the same time we check drill profile
> interface
> > >> where we can see running queries.
> > >>
> > >> I am sure if we can see on User-Interface then it must be coming from
> > >> somewhere.
> > >>
> > >> Kindly help me on this.
> > >>
> > >> Regards,
> > >> *Anup Tiwari*
> > >> Software Engineer(BI-Team),PlayGames24x7 Pvt Ltd
> > >>
> >
> >
>


Re: Query hangs on planning

2016-08-09 Thread Khurram Faraaz
Oscar, can you please report a JIRA with the required steps to reproduce
the OOM error. That way someone from the Drill team will take a look and
investigate.

For others interested here is the stack trace.

2016-08-09 16:51:14,280 [285642de-ab37-de6e-a54c-378aaa4ce50e:foreman]
ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred,
exiting. Information message: Unable to handle out of memory condition in
Foreman.
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694) ~[na:1.7.0_111]
at java.lang.String.<init>(String.java:203) ~[na:1.7.0_111]
at java.lang.StringBuilder.toString(StringBuilder.java:405)
~[na:1.7.0_111]
at org.apache.calcite.util.Util.newInternal(Util.java:785)
~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
at
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251)
~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
at
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
at
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303)
~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
at
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at
org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at
org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:978)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:257)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_111]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_111]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]

Thanks,
Khurram

On Tue, Aug 9, 2016 at 7:46 PM, Oscar Morante <spacep...@gmail.com> wrote:

> Yeah, when I uncomment only the `upload_date` lines (a dir0 alias),
> explain succeeds within ~30s.  Enabling any of the other lines triggers the
> failure.
>
> This is a log with the `upload_date` lines and `usage <> 'Test'` enabled:
> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e
>
> The client times out around here (~1.5hours):
> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022
> b3c55e#file-drillbit-log-L178
>
> And it still keeps running for a while until it dies (~2.5hours):
> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022
> b3c55e#file-drillbit-log-L178
>
> The memory settings for this test were:
>
>DRILL_HEAP="4G"
>DRILL_MAX_DIRECT_MEMORY="8G"
>
> This is on a laptop with 16G and I should probably lower it, but it seems
> a bit excessive for such a small query.  And I think I got the same results
> on a 2 node cluster with 8/16.  I'm gonna try again on the cluster to make
> sure.
>
> Thanks,
> Oscar
>
>
> On Tue, Aug 09, 2016 at 04:13:17PM +0530, Khurram Faraaz wrote:
>
>> You mentioned "*But if I uncomment the where clause then it runs for a
>> couple of hours until it runs out of memory.*"
>>
>> Can you please share the OutOfMemory details from drillbit.log and the
>> value of DRILL_MAX_DIRECT_MEMORY
>>
>> Can you also try to see what happens if you retain just this line where
>> upload_date = '2016-08-01' in your where clause, can you check if the
>> explain succeeds.
>>
>> Thanks,
>> Khurram
>>
>> On Tue, Aug 9, 2016 at 4:00 PM, Oscar Morante <spacep...@gmail.com>
>> wrote:
>>
>> Hi there,
>>> I've been stuck with this for a while and I'm not sure if I'm running
>>> into
>>> a bug or I'm just doing something very wrong.
>>>
>>> I have this stripped-down version of my query:
>>> https://gist.g

Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-05 Thread Khurram Faraaz
> "version" : 1,
> "generator" : {
>   "type" : "ExplainHandler",
>   "info" : ""
> },
> "type" : "APACHE_DRILL_PHYSICAL",
> "options" : [ ],
> "queue" : 0,
> "resultMode" : "EXEC"
>   },
>   "graph" : [ {
> "pop" : "fs-scan",
> "@id" : 196611,
> "userName" : "hadoop",
> "files" : [
> "hdfs://namenode:9000/tmp/stest-bz2/kafka_3_25-Jul-2016-12a.json.bz2" ],
> "storage" : {
>   "type" : "file",
>   "enabled" : true,
>   "connection" : "hdfs://namenode:9000",
>   "config" : null,
>   "workspaces" : {
> "root" : {
>   "location" : "/tmp/",
>   "writable" : true,
>   "defaultInputFormat" : null
> },
> "tmp" : {
>   "location" : "/tmp",
>   "writable" : true,
>   "defaultInputFormat" : null
> }
>   },
>   "formats" : {
> "psv" : {
>   "type" : "text",
>   "extensions" : [ "tbl" ],
>   "delimiter" : "|"
> },
> "csv" : {
>   "type" : "text",
>   "extensions" : [ "csv" ],
>   "delimiter" : ","
> },
> "tsv" : {
>   "type" : "text",
>   "extensions" : [ "tsv" ],
>   "delimiter" : "\t"
> },
> "parquet" : {
>   "type" : "parquet"
> },
> "json" : {
>   "type" : "json",
>   "extensions" : [ "json" ]
> },
> "avro" : {
>   "type" : "avro"
> }
>   }
> },
> "format" : {
>   "type" : "json",
>   "extensions" : [ "json" ]
> },
> "columns" : [ "`channelid`", "`serverTime`" ],
> "selectionRoot" :
> "hdfs://namenode:9000/tmp/stest-bz2/kafka_3_25-Jul-2016-12a.json.bz2",
> "cost" : 1148224.0
>   }, {
> "pop" : "hash-aggregate",
> "@id" : 196610,
> "child" : 196611,
> "cardinality" : 1.0,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "groupByExprs" : [ {
>   "ref" : "`channelid`",
>   "expr" : "`channelid`"
> } ],
> "aggrExprs" : [ {
>   "ref" : "`EXPR$1`",
>   "expr" : "count(`serverTime`) "
> } ],
> "cost" : 574112.0
>   }, {
> "pop" : "project",
> "@id" : 196609,
> "exprs" : [ {
>   "ref" : "`channelid`",
>   "expr" : "`channelid`"
> }, {
>   "ref" : "`EXPR$1`",
>   "expr" : "`EXPR$1`"
> }, {
>   "ref" : "`E_X_P_R_H_A_S_H_F_I_E_L_D`",
>   "expr" : "hash32asdouble(`channelid`) "
> } ],
> "child" : 196610,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 114822.4
>   }, {
> "pop" : "unordered-mux-exchange",
> "@id" : 131073,
> "child" : 196609,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 114822.4
>   }, {
> "pop" : "hash-to-random-exchange",
> "@id" : 65539,
> "child" : 131073,
> "expr" : "`E_X_P_R_H_A_S_H_F_I_E_L_D`",
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 114822.4
>   }, {
> "pop" : "project",
> "@id" : 65538,
> "exprs" : [ {
>   "ref" : "`channelid`",
>   "expr" : "`channelid`"
> }, {
>   "ref" : "`EXPR$1`",
>   "expr" : "`EXPR$1`"
> } ],
> "child" : 65539,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 114822.4
>   }, {
> "pop" : "hash-aggregate",
> "@id" : 65537,
> "child" : 65538,
> "cardinality" : 1.0,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "groupByExprs" : [ {
>   "ref" : "`channelid`",
>   "expr" : "`channelid`"
> } ],
> "aggrExprs" : [ {
>   "ref" : "`EXPR$1`",
>   "expr" : "$sum0(`EXPR$1`) "
> } ],
> "cost" : 57411.2
>   }, {
> "pop" : "union-exchange",
> "@id" : 2,
> "child" : 65537,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 11482.24
>   }, {
> "pop" : "project",
> "@id" : 1,
> "exprs" : [ {
>   "ref" : "`channelid`",
>   "expr" : "`channelid`"
> }, {
>   "ref" : "`EXPR$1`",
>   "expr" : "`EXPR$1`"
> } ],
> "child" : 2,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 11482.24
>   }, {
> "pop" : "screen",
> "@id" : 0,
> "child" : 1,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 11482.24
>   } ]
> } |
> +--+--+
> 1 row selected (0.381 seconds)
> 0: jdbc:drill:>
>
>
> On Thu, Aug 4, 2016 at 3:07 PM, Khurram Faraaz <kfar...@maprtech.com>
> wrote:
>
> > Can you please do an explain plan over the two aggregate queries. That
> way
> > we can know where most of the time is being spent, is it in the query
> > planning phase or is it query execution that is taking longer. Please
> share
> > the query plans and the time taken for those explain plan statements.
> >
> > On Mon, Aug 1, 2016 at 3:46 PM, Shankar Mane <shankar.m...@games24x7.com
> >
> > wrote:
> >
> > > It is plain json (1 json per line).
> > > Each json message size = ~4kb
> > > no. of json messages = ~5 Millions.
> > >
> > > store.parquet.compression = snappy ( i don't think, this parameter get
> > > used. As I am querying select only.)
> > >
> > >
> > > On Mon, Aug 1, 2016 at 3:27 PM, Khurram Faraaz <kfar...@maprtech.com>
> > > wrote:
> > >
> > > > What is the data format within those .gz and .bz2 files ? It is
> parquet
> > > or
> > > > JSON or plain text (CSV) ?
> > > > Also, what was this config parameter `store.parquet.compression` set
> > to,
> > > > when ypu ran your test ?
> > > >
> > > > - Khurram
> > > >
> > > > On Sun, Jul 31, 2016 at 11:17 PM, Shankar Mane <
> > > shankar.m...@games24x7.com
> > > > >
> > > > wrote:
> > > >
> > > > > Awaiting for response..
> > > > >
> > > > > On 30-Jul-2016 3:20 PM, "Shankar Mane" <shankar.m...@games24x7.com
> >
> > > > wrote:
> > > > >
> > > > > >
> > > > >
> > > > > > I am Comparing Querying speed between GZ and BZ2.
> > > > > >
> > > > > > Below are the 2 files and their sizes (This 2 files have same
> > data):
> > > > > > kafka_3_25-Jul-2016-12a.json.gz = 1.8G
> > > > > > kafka_3_25-Jul-2016-12a.json.bz2= 1.1G
> > > > > >
> > > > > >
> > > > > >
> > > > > > Results:
> > > > > >
> > > > > > 0: jdbc:drill:> select channelid, count(serverTime) from
> > > > > dfs.`/tmp/stest-gz/kafka_3_25-Jul-2016-12a.json.gz` group by
> > channelid
> > > ;
> > > > > > ++--+
> > > > > > | channelid  |  EXPR$1  |
> > > > > > ++--+
> > > > > > | 3  | 977134   |
> > > > > > | 0  | 836850   |
> > > > > > | 2  | 3202854  |
> > > > > > ++--+
> > > > > > 3 rows selected (86.034 seconds)
> > > > > >
> > > > > >
> > > > > >
> > > > > > 0: jdbc:drill:> select channelid, count(serverTime) from
> > > > > dfs.`/tmp/stest-bz2/kafka_3_25-Jul-2016-12a.json.bz2` group by
> > > channelid
> > > > ;
> > > > > > ++--+
> > > > > > | channelid  |  EXPR$1  |
> > > > > > ++--+
> > > > > > | 3  | 977134   |
> > > > > > | 0  | 836850   |
> > > > > > | 2  | 3202854  |
> > > > > > ++--+
> > > > > > 3 rows selected (459.079 seconds)
> > > > > >
> > > > > >
> > > > > >
> > > > > > Questions:
> > > > > > 1. As per above Test: Gz is 6x fast than Bz2. why is that ?
> > > > > > 2. How can we speed to up Bz2.  Are there any configuration to
> do ?
> > > > > > 3. As bz2 is splittable format, How drill using it ?
> > > > > >
> > > > > >
> > > > > > regards,
> > > > > > shankar
> > > > >
> > > >
> > >
> >
>


Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-04 Thread Khurram Faraaz
;type" : "ExplainHandler",
>   "info" : ""
> },
> "type" : "APACHE_DRILL_PHYSICAL",
> "options" : [ ],
> "queue" : 0,
> "resultMode" : "EXEC"
>   },
>   "graph" : [ {
> "pop" : "fs-scan",
> "@id" : 196611,
> "userName" : "hadoop",
> "files" : [
> "hdfs://namenode:9000/tmp/stest-bz2/kafka_3_25-Jul-2016-12a.json.bz2" ],
> "storage" : {
>   "type" : "file",
>   "enabled" : true,
>   "connection" : "hdfs://namenode:9000",
>   "config" : null,
>   "workspaces" : {
> "root" : {
>   "location" : "/tmp/",
>   "writable" : true,
>   "defaultInputFormat" : null
> },
> "tmp" : {
>   "location" : "/tmp",
>   "writable" : true,
>   "defaultInputFormat" : null
> }
>   },
>   "formats" : {
> "psv" : {
>   "type" : "text",
>   "extensions" : [ "tbl" ],
>   "delimiter" : "|"
> },
> "csv" : {
>   "type" : "text",
>   "extensions" : [ "csv" ],
>   "delimiter" : ","
> },
> "tsv" : {
>   "type" : "text",
>   "extensions" : [ "tsv" ],
>   "delimiter" : "\t"
> },
> "parquet" : {
>   "type" : "parquet"
> },
> "json" : {
>   "type" : "json",
>   "extensions" : [ "json" ]
> },
> "avro" : {
>   "type" : "avro"
> }
>   }
> },
> "format" : {
>   "type" : "json",
>   "extensions" : [ "json" ]
> },
> "columns" : [ "`channelid`", "`serverTime`" ],
> "selectionRoot" :
> "hdfs://namenode:9000/tmp/stest-bz2/kafka_3_25-Jul-2016-12a.json.bz2",
> "cost" : 1148224.0
>   }, {
> "pop" : "hash-aggregate",
> "@id" : 196610,
> "child" : 196611,
> "cardinality" : 1.0,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "groupByExprs" : [ {
>   "ref" : "`channelid`",
>   "expr" : "`channelid`"
> } ],
> "aggrExprs" : [ {
>   "ref" : "`EXPR$1`",
>   "expr" : "count(`serverTime`) "
> } ],
> "cost" : 574112.0
>   }, {
> "pop" : "project",
> "@id" : 196609,
> "exprs" : [ {
>   "ref" : "`channelid`",
>   "expr" : "`channelid`"
> }, {
>   "ref" : "`EXPR$1`",
>   "expr" : "`EXPR$1`"
> }, {
>   "ref" : "`E_X_P_R_H_A_S_H_F_I_E_L_D`",
>   "expr" : "hash32asdouble(`channelid`) "
> } ],
> "child" : 196610,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 114822.4
>   }, {
> "pop" : "unordered-mux-exchange",
> "@id" : 131073,
> "child" : 196609,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 114822.4
>   }, {
> "pop" : "hash-to-random-exchange",
> "@id" : 65539,
> "child" : 131073,
> "expr" : "`E_X_P_R_H_A_S_H_F_I_E_L_D`",
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 114822.4
>   }, {
> "pop" : "project",
> "@id" : 65538,
> "exprs" : [ {
>   "ref" : "`channelid`",
>   "expr" : "`channelid`"
> }, {
>   "ref" : "`EXPR$1`",
>   "expr" : "`EXPR$1`"
> } ],
> "child" : 65539,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 114822.4
>   }, {
> "pop" : "hash-aggregate",
> "@id" : 65537,
> "child" : 65538,
> "cardinality" : 1.0,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "groupByExprs" : [ {
>   "ref" : "`channelid`",
>   "expr" : "`channelid`"
> } ],
> "aggrExprs" : [ {
>   "ref" : "`EXPR$1`",
>   "expr" : "$sum0(`EXPR$1`) "
> } ],
> "cost" : 57411.2
>   }, {
> "pop" : "union-exchange",
> "@id" : 2,
> "child" : 65537,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 11482.24
>   }, {
> "pop" : "project",
> "@id" : 1,
> "exprs" : [ {
>   "ref" : "`channelid`",
>   "expr" : "`channelid`"
> }, {
>   "ref" : "`EXPR$1`",
>   "expr" : "`EXPR$1`"
> } ],
> "child" : 2,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 11482.24
>   }, {
> "pop" : "screen",
> "@id" : 0,
> "child" : 1,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 11482.24
>   } ]
> } |
> +--+--+
> 1 row selected (0.381 seconds)
> 0: jdbc:drill:>
>
>
> On Thu, Aug 4, 2016 at 3:07 PM, Khurram Faraaz <kfar...@maprtech.com>
> wrote:
>
> > Can you please do an explain plan over the two aggregate queries. That
> way
> > we can know where most of the time is being spent, is it in the query
> > planning phase or is it query execution that is taking longer. Please
> share
> > the query plans and the time taken for those explain plan statements.
> >
> > On Mon, Aug 1, 2016 at 3:46 PM, Shankar Mane <shankar.m...@games24x7.com
> >
> > wrote:
> >
> > > It is plain json (1 json per line).
> > > Each json message size = ~4kb
> > > no. of json messages = ~5 Millions.
> > >
> > > store.parquet.compression = snappy ( i don't think, this parameter get
> > > used. As I am querying select only.)
> > >
> > >
> > > On Mon, Aug 1, 2016 at 3:27 PM, Khurram Faraaz <kfar...@maprtech.com>
> > > wrote:
> > >
> > > > What is the data format within those .gz and .bz2 files ? It is
> parquet
> > > or
> > > > JSON or plain text (CSV) ?
> > > > Also, what was this config parameter `store.parquet.compression` set
> > to,
> > > > when ypu ran your test ?
> > > >
> > > > - Khurram
> > > >
> > > > On Sun, Jul 31, 2016 at 11:17 PM, Shankar Mane <
> > > shankar.m...@games24x7.com
> > > > >
> > > > wrote:
> > > >
> > > > > Awaiting for response..
> > > > >
> > > > > On 30-Jul-2016 3:20 PM, "Shankar Mane" <shankar.m...@games24x7.com
> >
> > > > wrote:
> > > > >
> > > > > >
> > > > >
> > > > > > I am Comparing Querying speed between GZ and BZ2.
> > > > > >
> > > > > > Below are the 2 files and their sizes (This 2 files have same
> > data):
> > > > > > kafka_3_25-Jul-2016-12a.json.gz = 1.8G
> > > > > > kafka_3_25-Jul-2016-12a.json.bz2= 1.1G
> > > > > >
> > > > > >
> > > > > >
> > > > > > Results:
> > > > > >
> > > > > > 0: jdbc:drill:> select channelid, count(serverTime) from
> > > > > dfs.`/tmp/stest-gz/kafka_3_25-Jul-2016-12a.json.gz` group by
> > channelid
> > > ;
> > > > > > ++--+
> > > > > > | channelid  |  EXPR$1  |
> > > > > > ++--+
> > > > > > | 3  | 977134   |
> > > > > > | 0  | 836850   |
> > > > > > | 2  | 3202854  |
> > > > > > ++--+
> > > > > > 3 rows selected (86.034 seconds)
> > > > > >
> > > > > >
> > > > > >
> > > > > > 0: jdbc:drill:> select channelid, count(serverTime) from
> > > > > dfs.`/tmp/stest-bz2/kafka_3_25-Jul-2016-12a.json.bz2` group by
> > > channelid
> > > > ;
> > > > > > ++--+
> > > > > > | channelid  |  EXPR$1  |
> > > > > > ++--+
> > > > > > | 3  | 977134   |
> > > > > > | 0  | 836850   |
> > > > > > | 2  | 3202854  |
> > > > > > ++--+
> > > > > > 3 rows selected (459.079 seconds)
> > > > > >
> > > > > >
> > > > > >
> > > > > > Questions:
> > > > > > 1. As per above Test: Gz is 6x fast than Bz2. why is that ?
> > > > > > 2. How can we speed to up Bz2.  Are there any configuration to
> do ?
> > > > > > 3. As bz2 is splittable format, How drill using it ?
> > > > > >
> > > > > >
> > > > > > regards,
> > > > > > shankar
> > > > >
> > > >
> > >
> >
>


Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-04 Thread Khurram Faraaz
Can you please run an explain plan over the two aggregate queries? That way
we can see where most of the time is being spent: in the query planning phase,
or in query execution. Please share the query plans and the time taken for
those explain plan statements.
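For reference, the plan is produced by prefixing the query with EXPLAIN PLAN
FOR, e.g. for the gzip case from your earlier mail:

explain plan for
select channelid, count(serverTime)
from dfs.`/tmp/stest-gz/kafka_3_25-Jul-2016-12a.json.gz`
group by channelid;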

On Mon, Aug 1, 2016 at 3:46 PM, Shankar Mane <shankar.m...@games24x7.com>
wrote:

> It is plain json (1 json per line).
> Each json message size = ~4kb
> no. of json messages = ~5 Millions.
>
> store.parquet.compression = snappy ( i don't think, this parameter get
> used. As I am querying select only.)
>
>
> On Mon, Aug 1, 2016 at 3:27 PM, Khurram Faraaz <kfar...@maprtech.com>
> wrote:
>
> > What is the data format within those .gz and .bz2 files ? It is parquet
> or
> > JSON or plain text (CSV) ?
> > Also, what was this config parameter `store.parquet.compression` set to,
> > when ypu ran your test ?
> >
> > - Khurram
> >
> > On Sun, Jul 31, 2016 at 11:17 PM, Shankar Mane <
> shankar.m...@games24x7.com
> > >
> > wrote:
> >
> > > Awaiting for response..
> > >
> > > On 30-Jul-2016 3:20 PM, "Shankar Mane" <shankar.m...@games24x7.com>
> > wrote:
> > >
> > > >
> > >
> > > > I am Comparing Querying speed between GZ and BZ2.
> > > >
> > > > Below are the 2 files and their sizes (This 2 files have same data):
> > > > kafka_3_25-Jul-2016-12a.json.gz = 1.8G
> > > > kafka_3_25-Jul-2016-12a.json.bz2= 1.1G
> > > >
> > > >
> > > >
> > > > Results:
> > > >
> > > > 0: jdbc:drill:> select channelid, count(serverTime) from
> > > dfs.`/tmp/stest-gz/kafka_3_25-Jul-2016-12a.json.gz` group by channelid
> ;
> > > > ++--+
> > > > | channelid  |  EXPR$1  |
> > > > ++--+
> > > > | 3  | 977134   |
> > > > | 0  | 836850   |
> > > > | 2  | 3202854  |
> > > > ++--+
> > > > 3 rows selected (86.034 seconds)
> > > >
> > > >
> > > >
> > > > 0: jdbc:drill:> select channelid, count(serverTime) from
> > > dfs.`/tmp/stest-bz2/kafka_3_25-Jul-2016-12a.json.bz2` group by
> channelid
> > ;
> > > > ++--+
> > > > | channelid  |  EXPR$1  |
> > > > ++--+
> > > > | 3  | 977134   |
> > > > | 0  | 836850   |
> > > > | 2  | 3202854  |
> > > > ++--+
> > > > 3 rows selected (459.079 seconds)
> > > >
> > > >
> > > >
> > > > Questions:
> > > > 1. As per above Test: Gz is 6x fast than Bz2. why is that ?
> > > > 2. How can we speed to up Bz2.  Are there any configuration to do ?
> > > > 3. As bz2 is splittable format, How drill using it ?
> > > >
> > > >
> > > > regards,
> > > > shankar
> > >
> >
>


Re: Overflow detection in Drill

2016-07-26 Thread Khurram Faraaz
Another example where we don't detect/report overflow

Results from Postgres

postgres=# SELECT col0, AVG(col0) OVER ( ORDER BY col0 + col1 ) avg_col0
FROM fewrwspqq_101 GROUP BY col0,col1;

ERROR:  bigint out of range

postgres=#

Results from Drill 1.8.0

0: jdbc:drill:schema=dfs.tmp> SELECT col0, AVG(col0) OVER ( ORDER BY col0 +
col1 ) avg_col0 FROM `allTypsUniq.parquet` GROUP BY col0,col1;
+-+---+
|col0 |   avg_col0|
+-+---+
| 23  | 23.0  |
| -1  | 11.0  |
| -65535  | -21837.6668   |
| 3   | -16377.5  |
| 4   | -13101.2  |
| 5   | -10916.8334   |
| 6   | -9356.42857142857 |
| 7   | -8186.0   |
| 8   | -7275.5556|
| 13  | -6546.7   |
| 19  | -5949.818181818182|
| 9   | -5453.25  |
| 1   | -5033.692307692308|
| 65535   | 6.928571428571429 |
| 2   | 6.6   |
| 10  | 6.8125|
| 1000| 588241.7058823529 |
| 1073741823  | 6.0207885E7   |
| 2147483647  | 1.7006450415789473E8  |
| 109 | 1.615612844E8 |
| 29  | 1.538678912857143E8   |
| 0   | 1.4687389622727272E8  |
+-+---+
22 rows selected (0.341 seconds)



On Tue, Jul 26, 2016 at 2:07 AM, Khurram Faraaz <kfar...@maprtech.com>
wrote:

> Hi All,
>
> As of today Drill does not detect overflow and does not report to users that
> an overflow occurred; instead we just return results that are incorrect.
> This issue has been discussed (but not in detail) in the past.
>
> It would be great if Drill also handled overflow detection for data of types
> such as INT and BIGINT, like other existing DBMSs do. Users do not want to
> see incorrect results; an error informing them that an overflow occurred
> makes more sense.
>
> Here is an example of one such query that returns incorrect results as
> compared to Postgres. Difference in results (related to overflow detection
> problem), col1 is of type BIGINT
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT col1, AVG(SUM(col1)) OVER ( PARTITION
> BY col7 ORDER BY col0 ) FROM `allTypsUniq.parquet` GROUP BY col0,col1,col7;
> +--+--+
> | col1 |  EXPR$1  |
> +--+--+
> | 5000 | 5000.0   |
> | 9223372036854775807  | -4.6116860184273853E18   |
> | 65534| -3.0744573456182349E18   |
> | -1   | -2.30584300921367629E18  |
> | 1| -1.84467440737094093E18  |
> | 17   | -1.53722867280911744E18  |
> | 1000 | -1.31762457669352909E18  |
> | 200  | -1.15292150460683802E18  |
> | 4611686018427387903  | -5.1240955760303514E17   |
> | 1001 | -4.6116860184273152E17   |
> | 30   | -4.1924418349339232E17   |
> | -65535   | -65535.0 |
> | 1000 | 4967232.5|
> | 0| 3311488.35   |
> | 13   | 2483619.5|
> | 23   | 1986900.2|
> | 999  | 3322416.65   |
> | 197  | 2847813.8571428573   |
> | 9223372036854775806  | -1.1529215046043552E18   |
> | 92233720385475807| -1.01457092404992947E18  |
> | 25   | -9.1311383164493645E17   |
> | 3000 | -8.3010348331357837E17   |
> +--+--+
> 22 rows selected (0.46 seconds)
> {noformat}
>
> Results from Postgres
>
> {noformat}
> postgres=# SELECT col1, AVG(SUM(col1)) OVER ( PARTITION BY col7 ORDER BY
> col0 ) FROM fewrwspqq_101 GROUP BY col0,col1,col7;
> col1 |  avg
> -+---
> 5000 | 5000.
>  9223372036854775807 |   4611686018427390404
>65534 |   3074457345618282114
>   -1 |   2305843009213711585
>1 |   1844674407370969268
>   17 |   1537228672809141060
> 1000 |   1317624576693549623
>  200 |   1152921504606855945
>  4611686018427387903 |   1537228672809137273
> 1001 |   1383505805528223646
>   30 |   1257732550480203317
>   -65535 |   -65535.
> 1000 |  4967232.5000
>0 |  3311488.333

Overflow detection in Drill

2016-07-25 Thread Khurram Faraaz
Hi All,

As of today Drill does not detect overflow and does not report to users that
an overflow occurred; instead we just return results that are incorrect. This
issue has been discussed (but not in detail) in the past.

It would be great if Drill also handled overflow detection for data of types
such as INT and BIGINT, like other existing DBMSs do. Users do not want to see
incorrect results; an error informing them that an overflow occurred makes
more sense.

Here is an example of one such query that returns incorrect results as
compared to Postgres. Difference in results (related to overflow detection
problem), col1 is of type BIGINT

{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT col1, AVG(SUM(col1)) OVER ( PARTITION
BY col7 ORDER BY col0 ) FROM `allTypsUniq.parquet` GROUP BY col0,col1,col7;
+--+--+
| col1 |  EXPR$1  |
+--+--+
| 5000 | 5000.0   |
| 9223372036854775807  | -4.6116860184273853E18   |
| 65534| -3.0744573456182349E18   |
| -1   | -2.30584300921367629E18  |
| 1| -1.84467440737094093E18  |
| 17   | -1.53722867280911744E18  |
| 1000 | -1.31762457669352909E18  |
| 200  | -1.15292150460683802E18  |
| 4611686018427387903  | -5.1240955760303514E17   |
| 1001 | -4.6116860184273152E17   |
| 30   | -4.1924418349339232E17   |
| -65535   | -65535.0 |
| 1000 | 4967232.5|
| 0| 3311488.35   |
| 13   | 2483619.5|
| 23   | 1986900.2|
| 999  | 3322416.65   |
| 197  | 2847813.8571428573   |
| 9223372036854775806  | -1.1529215046043552E18   |
| 92233720385475807| -1.01457092404992947E18  |
| 25   | -9.1311383164493645E17   |
| 3000 | -8.3010348331357837E17   |
+--+--+
22 rows selected (0.46 seconds)
{noformat}

Results from Postgres

{noformat}
postgres=# SELECT col1, AVG(SUM(col1)) OVER ( PARTITION BY col7 ORDER BY
col0 ) FROM fewrwspqq_101 GROUP BY col0,col1,col7;
col1 |  avg
-+---
5000 | 5000.
 9223372036854775807 |   4611686018427390404
   65534 |   3074457345618282114
  -1 |   2305843009213711585
   1 |   1844674407370969268
  17 |   1537228672809141060
1000 |   1317624576693549623
 200 |   1152921504606855945
 4611686018427387903 |   1537228672809137273
1001 |   1383505805528223646
  30 |   1257732550480203317
  -65535 |   -65535.
1000 |  4967232.5000
   0 |  3311488.
  13 |  2483619.5000
  23 |  1986900.2000
 999 |  3322416.6667
 197 |  2847813.857142857143
 9223372036854775806 |   1152921504609338813
   92233720385475807 |   1035067306362242923
  25 |931560575726018634
3000 |846873250660017212
(22 rows)
{noformat}
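
For illustration only (this is a sketch, not how Drill or Postgres is actually
implemented), the kind of checked 64-bit arithmetic that raises an error on
overflow instead of silently wrapping can be written in Java with Math.addExact:

{noformat}
// Hypothetical sketch: checked BIGINT addition that reports overflow
// instead of returning a wrapped (incorrect) result.
public class CheckedSum {

    // Adds two 64-bit values, failing loudly on overflow.
    static long addChecked(long a, long b) {
        try {
            return Math.addExact(a, b);   // throws ArithmeticException on overflow
        } catch (ArithmeticException e) {
            throw new ArithmeticException(
                "BIGINT overflow while adding " + a + " and " + b);
        }
    }

    public static void main(String[] args) {
        System.out.println(addChecked(5000L, 65534L));        // fine
        System.out.println(addChecked(Long.MAX_VALUE, 1L));   // reports overflow
    }
}
{noformat}

An aggregate like SUM could apply the same kind of check to each added value,
which is roughly what engines that report overflow errors do.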

Thanks,
Khurram


Re: Drill JDBC Interpreter: Connection Refused

2016-07-12 Thread Khurram Faraaz
Also try using port number 5181 in your drill-override.conf instead of 2181,
then stop and start the Drillbit and re-run your program.
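
For example, the drill-override.conf you shared would then look like the
snippet below (this assumes your ZooKeeper really listens on 5181, which is
the default on MapR installs; keep 2181 if that is what your ZooKeeper uses):

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "dk-slv8:5181"
}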

On Tue, Jul 12, 2016 at 1:08 PM, Khurram Faraaz <kfar...@maprtech.com>
wrote:

> This one works just fine for me
>
> final String URL_STRING = "jdbc:drill:schema=dfs.tmp;drillbit=<IPADDRESS>";
>
> replace IPADDRESS with your IP address in the above line.
>
> On Tue, Jul 12, 2016 at 12:08 PM, Krishnaprasad A S <
> krishna.pra...@flytxt.com> wrote:
>
>> Yes, I tried with url *jdbc:drill:drillbit=:31010* but same error,
>> Error in zeppelin console,
>> java.net.ConnectException: Connection refused at
>> java.net.PlainSocketImpl.socketConnect(Native Method) at
>>
>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>> at
>>
>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>> 
>>
>> *drill-override.conf *
>> drill.exec: {
>>   cluster-id: "drillbits1",
>>   zk.connect: "dk-slv8:2181"
>> }
>>
>> Some more details,
>> I'm querying from hdfs (select count(*) from `dfs`.`/POC/Sample1.csv`)
>> the same query works with Drill Web UI.
>> Also the errors are logged in  'zeppelin-hadoop-dk-slv8.log' and not in
>> 'zeppelin-interpreter-jdbc-hadoop-dk-slv8.log'
>>
>> I think there is something I'm missing because I tried removing the
>> drill.url and kept it blank again the same error is coming.
>> But there is no host or port mentioned for the ConnectException:
>> Connection
>> refused.
>>
>>
>>
>> On Tue, Jul 12, 2016 at 3:31 AM, Andries Engelbrecht <
>> aengelbre...@maprtech.com> wrote:
>>
>> > Have you tried to connect to a drillbit directly from Zeppelin using the
>> > jdbc url jdbc:drill:drillbit=:31010?
>> >
>> > What does your drill-override.conf file look like?
>> >
>> >
>> > > On Jul 11, 2016, at 2:33 PM, Krishnaprasad A S <
>> > krishna.pra...@flytxt.com> wrote:
>> > >
>> > > I can see all the 4 drillbits in the UI. Also I tried the query in the web UI
>> > before
>> > > running it in zeppelin. In the web UI it works currently. Then what may be
>> > the
>> > > issue?
>> > > On Jul 12, 2016 3:00 AM, "Andries Engelbrecht" <
>> > aengelbre...@maprtech.com>
>> > > wrote:
>> > >
>> > > What happens if you try to connect to a drillbit directly?
>> > >
>> > > This will help to see if it is a zk connection issue.
>> > >
>> > > Also I assume the drill cluster is up and running, and if you go to the
>> > > webUI it shows all drillbits in the cluster connected and running.
>> > >
>> > >
>> > >> On Jul 11, 2016, at 2:27 PM, Krishnaprasad A S <
>> > krishna.pra...@flytxt.com>
>> > > wrote:
>> > >>
>> > >> I went through the same link before configuring the interpreter, also
>> > > there
>> > >> is no security configured.
>> > >> On Jul 12, 2016 2:48 AM, "Andries Engelbrecht" <
>> > aengelbre...@maprtech.com>
>> > >> wrote:
>> > >>
>> > >>> Do you have security configured on the Drill cluster? If so make
>> sure
>> > to
>> > >>> add the user and password info for the Drill connection.
>> > >>>
>> > >>> Some good info for configuring Zeppelin with Drill here
>> > >>>
>> > >>> https://community.mapr.com/docs/DOC-1493 <
>> > >>> https://community.mapr.com/docs/DOC-1493>
>> > >>>
>> > >>> --Andries
>> > >>>
>> > >>>
>> > >>>> On Jul 11, 2016, at 2:13 PM, Krishnaprasad A S <
>> > >>> krishna.pra...@flytxt.com> wrote:
>> > >>>>
>> > >>>> My drill runs in clustered mode, with 4 drillbits running in 4
>> nodes.
>> > I
>> > >>>> started it using drillbit.sh start command.
>> > >>>> On Jul 12, 2016 2:18 AM, "Andries Engelbrecht" <
>> > >>> aengelbre...@maprtech.com>
>> > >>>> wrote:
>> > >>>>
>> > >>>>> Are you running Drill in embedded mode or clustered mode?
>> > >>>>>

Re: Drill JDBC Interpreter: Connection Refused

2016-07-12 Thread Khurram Faraaz
This one works just fine for me

final String URL_STRING = "jdbc:drill:schema=dfs.tmp;drillbit=<IPADDRESS>";

replace IPADDRESS with your IP address in the above line.
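
If it helps, here is a minimal, self-contained sketch (not from the original
thread) of using that URL from a plain JDBC program; the query against
`/POC/Sample1.csv` is just the example file you mentioned:

{noformat}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DrillJdbcSmokeTest {
    public static void main(String[] args) throws Exception {
        // Direct-to-drillbit connection; replace <IPADDRESS> with your drillbit host.
        final String URL_STRING = "jdbc:drill:schema=dfs.tmp;drillbit=<IPADDRESS>";

        Class.forName("org.apache.drill.jdbc.Driver");   // Drill JDBC driver class
        try (Connection conn = DriverManager.getConnection(URL_STRING);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "select count(*) from dfs.`/POC/Sample1.csv`")) {
            while (rs.next()) {
                System.out.println("row count = " + rs.getLong(1));
            }
        }
    }
}
{noformat}

If this program connects but Zeppelin still cannot, the problem is most likely
in the interpreter configuration rather than in Drill itself.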

On Tue, Jul 12, 2016 at 12:08 PM, Krishnaprasad A S <
krishna.pra...@flytxt.com> wrote:

> Yes, I tried with url *jdbc:drill:drillbit=:31010* but same error,
> Error in zeppelin console,
> java.net.ConnectException: Connection refused at
> java.net.PlainSocketImpl.socketConnect(Native Method) at
>
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
> at
>
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
> 
>
> *drill-override.conf *
> drill.exec: {
>   cluster-id: "drillbits1",
>   zk.connect: "dk-slv8:2181"
> }
>
> Some more details,
> I'm querying from hdfs (select count(*) from `dfs`.`/POC/Sample1.csv`)
> the same query works with Drill Web UI.
> Also the errors are logged in  'zeppelin-hadoop-dk-slv8.log' and not in
> 'zeppelin-interpreter-jdbc-hadoop-dk-slv8.log'
>
> I think there is something I'm missing because I tried removing the
> drill.url and kept it blank again the same error is coming.
> But there is no host or port mentioned for the ConnectException: Connection
> refused.
>
>
>
> On Tue, Jul 12, 2016 at 3:31 AM, Andries Engelbrecht <
> aengelbre...@maprtech.com> wrote:
>
> > Have you tried to connect to a drillbit directly from Zeppelin using the
> > jdbc url jdbc:drill:drillbit=:31010?
> >
> > What does your drill-override.conf file look like?
> >
> >
> > > On Jul 11, 2016, at 2:33 PM, Krishnaprasad A S <
> > krishna.pra...@flytxt.com> wrote:
> > >
> > > I can see all the 4 drillbits in the UI. Also I tried the query in the web UI
> > before
> > > running it in zeppelin. In the web UI it works currently. Then what may be
> > the
> > > issue?
> > > On Jul 12, 2016 3:00 AM, "Andries Engelbrecht" <
> > aengelbre...@maprtech.com>
> > > wrote:
> > >
> > > What happens if you try to connect to a drillbit directly?
> > >
> > > This will help to see if it is a zk connection issue.
> > >
> > > Also I assume the drill cluster is up and running, and if you go to the
> > > webUI it shows all drillbits in the cluster connected and running.
> > >
> > >
> > >> On Jul 11, 2016, at 2:27 PM, Krishnaprasad A S <
> > krishna.pra...@flytxt.com>
> > > wrote:
> > >>
> > >> I went through the same link before configuring the interpreter, also
> > > there
> > >> is no security configured.
> > >> On Jul 12, 2016 2:48 AM, "Andries Engelbrecht" <
> > aengelbre...@maprtech.com>
> > >> wrote:
> > >>
> > >>> Do you have security configured on the Drill cluster? If so make sure
> > to
> > >>> add the user and password info for the Drill connection.
> > >>>
> > >>> Some good info for configuring Zeppelin with Drill here
> > >>>
> > >>> https://community.mapr.com/docs/DOC-1493 <
> > >>> https://community.mapr.com/docs/DOC-1493>
> > >>>
> > >>> --Andries
> > >>>
> > >>>
> >  On Jul 11, 2016, at 2:13 PM, Krishnaprasad A S <
> > >>> krishna.pra...@flytxt.com> wrote:
> > 
> >  My drill runs in clustered mode, with 4 drillbits running in 4
> nodes.
> > I
> >  started it using drillbit.sh start command.
> >  On Jul 12, 2016 2:18 AM, "Andries Engelbrecht" <
> > >>> aengelbre...@maprtech.com>
> >  wrote:
> > 
> > > Are you running Drill in embedded mode or clustered mode?
> > >
> > > If in embedded mode you may want to try to connect directly to the
> > >>> drillbit
> > > jdbc:drill:drillbit=:31010
> > >
> > > It looks like you are trying to connect to a zk with a drill
> cluster,
> > >>> and
> > > your setup may just be embedded mode.
> > >
> > > --Andries
> > >
> > >
> > >> On Jul 11, 2016, at 12:55 PM, Krishnaprasad A S <
> > > krishna.pra...@flytxt.com> wrote:
> > >>
> > >> hi,
> > >> I'm trying to create a drill interpreter in zeppelin using the
> > > existing
> > >> jdbc interpreter.
> > >>
> > >> *drill.url =
> jdbc:drill:zk=:2181/drill/drillbits1drill.driver
> > =
> > >> org.apache.drill.jdbc.Driver*
> > >> my drillbit runs on the same server as of zeppelin.
> > >>
> > >> Added the dependency
> > > 'apache-drill-1.7.0/jars/drill-jdbc-all-1.7.0.jar'
> > > in
> > >> zeppelin
> > >> I'm the getting the following error while running a sample SQL
> from
> > >> zeppelin,
> > >>
> > >> ERROR [2016-07-12 01:11:41,946] ({pool-1-thread-4}
> > >> NotebookServer.java[afterStatusChange]:1135) - Error
> > >> org.apache.zeppelin.interpreter.InterpreterException:
> > >> org.apache.zeppelin.interpreter.InterpreterException:
> > >> org.apache.thrift.transport.TTransportException:
> > > java.net.ConnectException:
> > >> Connection refused
> > >> at
> > >>
> > >
> > >>>
> > >
> >
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:165)
> > >> at
> > >>
> > >
> > >>>
> > >
> >
> 

Re: Hbase SQL

2016-06-17 Thread Khurram Faraaz
Can you try these two queries? And are you trying to execute your query
from Drill's sqlline prompt or from a Java program?

select X.CF.`11503` from hbase.`X` where
convert_from(row_key, 'bigint_be') = 916124337090;

and this one

select X.CF.`11503` from hbase.`X` where
convert_from(row_key, 'UTF8') = 916124337090;
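
As an aside, and only a guess at what is going on: if the qualifier was
written with Bytes.toBytes(integerValue), it is a 4-byte binary value rather
than the UTF-8 string "11503", so referencing the column by the string name
may never match. A small sketch showing the difference:

{noformat}
import java.util.Arrays;

import org.apache.hadoop.hbase.util.Bytes;

public class QualifierEncoding {
    public static void main(String[] args) {
        byte[] asInt    = Bytes.toBytes(11503);     // 4-byte big-endian int
        byte[] asString = Bytes.toBytes("11503");   // 5 UTF-8 characters

        System.out.println(Arrays.toString(asInt));     // [0, 0, 44, -17]
        System.out.println(Arrays.toString(asString));  // [49, 49, 53, 48, 51]

        // Different byte arrays, so a lookup by the string name "11503"
        // will not find a column whose qualifier was written with the int form.
        System.out.println(Bytes.equals(asInt, asString));   // false
    }
}
{noformat}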


On Fri, Jun 17, 2016 at 5:31 PM, Krishnaprasad A S <
krishna.pra...@flytxt.com> wrote:

> Hi,
> I have a doubt on Hbase storage in Drill.
>
> My HBase table 'X' has a column family 'CF' and a column name which is an
> integer.
> we store the data in hbase with column name as bytes array, using the
> following code to create the byte[] (as normal in Hbase).
> byte[] column = org.apache.hadoop.hbase.util.Bytes*.toBytes(integerValue)*
>
> if my column name is integer 11503,
> then how can I query the data from the specific column.
> Currently the following query does not work, it returns zero rows even
> though data exists in the column.
> *select X.CF.`11503` from hbase.`X**` where
> convert_from(byte_substr(row_key, 1, 12), 'bigint_be') = 916124337090;*
>
> Any help ?
>
> --
> Krishnaprasad A S
> <
> http://www.linkedin.com/company/22166?goback=%2Efcs_GLHD_flytxt_false_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2=ncsrch_hits
> >
>


Re: Regarding Excel Files and Ms Access File.

2016-05-23 Thread Khurram Faraaz
Sanjiv, you can try to export your data from the MS Excel file to a text/CSV
file using the Save As option, and then query that CSV/text file from
Drill.
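
As a quick check after the export (the path below is just a placeholder for
wherever you save the exported file, under a workspace the dfs plugin can see):

{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT * FROM dfs.`/tmp/exported.csv` LIMIT 10;
{noformat}

If the CSV has no header row, Drill exposes the fields as the `columns` array
(columns[0], columns[1], ...) by default.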

On Thu, May 19, 2016 at 12:38 AM, Antonio Romero (carnorom) <
carno...@cisco.com> wrote:

> http://ucanaccess.sourceforge.net/site.html
>
> This would let you connect to Microsoft access via generic JDBC. Solves
> one big piece of your problem. I think there is something similar for Excel.
>
> Sent from my iPhone
>
> > On May 17, 2016, at 10:57 PM, "Sanjiv Kumar" 
> wrote:
> >
> > This is my second mail regarding the Excel file (.xls file) and MS Access
> Files.
> >
> > When will Drill provide a storage plugin for Excel files and MS Access files?
> >
> > And when is Drill's latest version, 1.7, going to be released? Can I
> > expect that a storage plugin for Excel files will be there in the latest
> > version (i.e. 1.7)?
> >
> >
> >
> >
> >
> >
> >
> >
> > ..
> >  Thanks & Regards
> >  *Sanjiv Kumar*
>


Re: Drill Issues

2016-05-23 Thread Khurram Faraaz
You can use the CONVERT_TO and CONVERT_FROM functions to get HDFS bytes
read/write information from Drill queries.

https://drill.apache.org/docs/supported-data-types/#data-types-for-convert_to-and-convert_from-functions

On Mon, May 23, 2016 at 11:28 AM, vinita.go...@siemens.com <
vinita.go...@siemens.com> wrote:

> Hi
>
> How do I get HDFS bytes read and write information for drill queries?
>
>
>
> Thanks and Regards,
> Vinita Goyal
>
>
>
>
>


Re: query from hbase issue

2016-05-19 Thread Khurram Faraaz
Qiang, can you please take a look at DRILL-4686 and confirm whether the data
set used in my repro is the same as the one you have used? If the data set is
different, please let us know the type of data that you have used in your
table.

Aman - I will try to repro the problem on Drill 1.6.0 and share results.

Thanks,
Khurram

On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <amansi...@apache.org> wrote:

> Khurram,  DRILL-4686 seems like a different issue...it is reporting an
> error whereas the original problem from qiang was an incorrect result.  Can
> you use the same version (1.6) that he was using.  Also, is the data set
> similar ? If you are unable to repro the exact same issue,  perhaps qiang
> should file a JIRA with a smaller repro if possible.
>
>
>
> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <kfar...@maprtech.com>
> wrote:
>
> > Hello Qiang,
> >
> > DRILL-4686 is reported to track this problem.
> >
> > Thanks,
> > Khurram
> >
> > On Wed, May 18, 2016 at 3:16 PM, qiang li <tiredqi...@gmail.com> wrote:
> >
> >> Ok, Thanks very much.
> >>
> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
> >>
> >>> Hello Qiang,
> >>>
> >>> Someone from our Drill team (in San Jose) will get back to you soon. I
> >>> work from the India lab and I am in a different time zone as compared
> to
> >>> San Jose office, some one from MapR San Jose will get back to you as
> soon
> >>> as possible.
> >>>
> >>> Thanks,
> >>> Khurram
> >>>
> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <tiredqi...@gmail.com>
> wrote:
> >>>
> >>>> Hi Khurram, thanks very much for reproducing it. So what's the
> >>>> conclusion?
> >>>>
> >>>> Any idea how to solve it?
> >>>>
> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
> >>>>
> >>>>> So I tried to create the table using HBase API (with no data inserted
> >>>>> into table) and I got the query plan for drill 1.7.0
> >>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
> >>>>>
> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
> >>>>> +--++--+
> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> >>>>> +--++--+
> >>>>> | row_key  | ANY| NO   |
> >>>>> | v| MAP| NO   |
> >>>>> +--++--+
> >>>>> 2 rows selected (1.665 seconds)
> >>>>>
> >>>>> Table creation Java program
> >>>>>
> >>>>> {noformat}
> >>>>> public class PutIntDataToHBase {
> >>>>> public static void main(String args[]) throws IOException {
> >>>>> Configuration conf = HBaseConfiguration.create();
> >>>>> conf.set("hbase.zookeeper.property.clientPort","5181");
> >>>>> HBaseAdmin admin = new HBaseAdmin(conf);
> >>>>> if (admin.tableExists("browser_action2")) {
> >>>>> admin.disableTable("browser_action2");
> >>>>> admin.deleteTable("browser_action2");
> >>>>> }
> >>>>>
> >>>>> byte[][] SPLIT_KEYS =
> >>>>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
> >>>>> HTableDescriptor tableDesc = new
> >>>>> HTableDescriptor(TableName.valueOf("browser_action2"));
> >>>>>
> >>>>> tableDesc.addFamily(new HColumnDescriptor("v"));
> >>>>> admin.createTable(tableDesc,SPLIT_KEYS);
> >>>>>
> >>>>> }
> >>>>> }
> >>>>> {noformat}
> >>>>>
> >>>>> Query plan for the query that was reported as returning wrong
> results.
> >>>>>
> >>>>> {noformat}
> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;

Re: query from hbase issue

2016-05-18 Thread Khurram Faraaz
So I tried to create the table using the HBase API (with no data inserted into
the table) and I got the query plan on Drill 1.7.0.
Drill 1.7.0-SNAPSHOT, commit ID: 09b26277

0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
+--++--+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--++--+
| row_key  | ANY| NO   |
| v| MAP| NO   |
+--++--+
2 rows selected (1.665 seconds)

Table creation Java program

{noformat}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class PutIntDataToHBase {
    public static void main(String args[]) throws IOException {
        // Connect to HBase; the ZooKeeper client port is 5181 here (MapR default).
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.property.clientPort", "5181");
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Drop the table if it already exists so the repro starts clean.
        if (admin.tableExists("browser_action2")) {
            admin.disableTable("browser_action2");
            admin.deleteTable("browser_action2");
        }

        // Pre-split the table on the salt prefix '0'..'9'.
        byte[][] SPLIT_KEYS =
            {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
        HTableDescriptor tableDesc = new
            HTableDescriptor(TableName.valueOf("browser_action2"));

        // Single column family "v".
        tableDesc.addFamily(new HColumnDescriptor("v"));
        admin.createTable(tableDesc, SPLIT_KEYS);

        admin.close();
    }
}
{noformat}

Query plan for the query that was reported as returning wrong results.

{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for select
CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(k=[$0], p=[$1])
00-02UnionExchange
01-01  Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
01-02HashAgg(group=[{0}], p=[$SUM0($1)])
01-03  Project($f0=[$0], p=[$1])
01-04HashToRandomExchange(dist0=[[$0]])
02-01  UnorderedMuxExchange
03-01Project($f0=[$0], p=[$1],
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
03-02  HashAgg(group=[{0}], p=[COUNT($0)])
03-03Project($f0=[ITEM($1, 'e0')])
03-04  Scan(groupscan=[HBaseGroupScan
[HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
stopRow=, filter=null], columns=[`*`]]])
{noformat}

and the query plan for the other problem query mentioned in the first email.

{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for select
CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
. . . . . . . . . . . . . . > count(a.row_key) p from hbase.browser_action2
a group by
. . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(k=[$0], p=[$1])
00-02UnionExchange
01-01  Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
01-02HashAgg(group=[{0}], p=[$SUM0($1)])
01-03  Project($f0=[$0], p=[$1])
01-04HashToRandomExchange(dist0=[[$0]])
02-01  UnorderedMuxExchange
03-01Project($f0=[$0], p=[$1],
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
03-02  HashAgg(group=[{0}], p=[COUNT($1)])
03-03Project($f0=[BYTE_SUBSTR($0, 1, 9)],
row_key=[$0])
03-04  Scan(groupscan=[HBaseGroupScan
[HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null,
stopRow=null, filter=null], columns=[`*`]]])
{noformat}

Thanks,
Khurram

On Wed, May 18, 2016 at 7:01 AM, qiang li  wrote:

> Yes.
> I use hbase API to create it.
>
> The main code is:
>
> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'}, 
> {'7'},{'8'}, {'9'} };
> TableName tableName = TableName.valueOf("browser_action2");
>
> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
> tableDesc.addFamily(columnDesc);
>
> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>
> admin.createTable(tableDesc, SPLIT_KEYS);
>
>
>
>
> 2016-05-18 1:48 GMT+08:00 Zelaine Fong :
>
>> Can you provide the CREATE TABLE statement you used to reproduce this
>> problem so we can try to reproduce it on our end.
>>
>> Thanks.
>>
>> -- Zelaine
>>
>> On Tue, May 17, 2016 at 4:50 AM, qiang li  wrote:
>>
>> > Hi ,
>> >
>> > I recently met an issue where I cannot query the correct data from hbase
>> with
>> > sql via drill; can anybody help me?
>> >
>> > I test with the drill 1.6.
>> > My hbase scheme:
>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
>> > cf : v
>> > qualifier: v, e0, e1
>> >
>> > The wrong result only happened when I use group by clause.
>> >
>> > This sql will not return correct result:
>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, 

Re: Drill Issues

2016-05-18 Thread Khurram Faraaz
Hello Sanjiv,

I work with the Drill team. I don't have your previous email (I joined the
user group recently). I can take a look at your join query problem and your
issue with multiple columns having the same name and the same type.

Please share/resend your earlier email, and I can take a look.
Two things: (1) please share the explain plan for your join query problem,
if you can get one; (2) please share information from drillbit.log related
to the issue with many columns having the same name and type.

Also, I assume your Drillbits are set up on a Linux/UNIX environment.

Thanks,
Khurram

On Wed, May 18, 2016 at 11:37 AM, Sanjiv Kumar  wrote:

> Hello
>   I am facing some problems while using Drill. I have posted
> my problems earlier, one by one, but didn't get any proper solution for them.
> This time again I am posting my problems.
>
> 1) Join Query Problem (for details check in April 16  with *subject*:-
> Regarding
> Join Query Problem)
>
> 2) Multiple Column having same name & same data type problem (for details
> check in April 16 *Subject*:- Multiple Column having same name & same data
> type problem)
>
> 3) Authentication in Window Operating System (for details check in April 16
> *SUBJECT*:- Regarding Drill Authentication)
>
>
> I hope this time I get some proper solution from Drill.
>
>
>
>  ..
>   Thanks & Regards
>   *Sanjiv Kumar*
>


test

2016-05-17 Thread Khurram Faraaz
test email