Re: Question about querying array fields in parquet files

2016-06-27 Thread Kunal Khatua
ost a set of the sample records as well? Thanks Kunal  Kunal Khatua Engineering [ MapR] [http://www.mapr.com/] www.mapr.com [http://www.mapr.com/] On Thu 23-Jun-2016 1:52:48 PM, David Kincaid <kincaid.d...@gmail.com> wrote: I'm very new to Drill and just learning how everything works. I had a questio

Re: Re: Hbase SQL

2016-06-20 Thread Kunal Khatua
of any variables, properties, etc... should have at least one leading non-numeric character. You might try having the column name defined in UTF-8 for readability and ease in querying. Hope that helps. Kunal Khatua Engineering (Performance) [image: MapR] <http://www.mapr.com> www.ma

Re: UDF parameters

2017-01-23 Thread Kunal Khatua
Have you tried looking at the String manipulation function - CONCAT function as a reference to achieve your goal? Kunal Khatua From: Muhammad Gelbana <m.gelb...@gmail.com> Sent: Monday, January 23, 2017 1:18:21 AM To: user@drill.apache.org Subjec

Re: Discussion: Comments in Drill Views

2017-03-01 Thread Kunal Khatua
+1 I think this can be very useful. The only worry is someone abusing it, so we should probably have a limit on the size of this. Not sure how else it could be exposed and consumed. Kunal Khatua Engineering [MapR]<http://www.mapr.com/> www.mapr.com<http://www

Re: Explain Plan for Parquet data is taking a lot of timre

2017-03-01 Thread Kunal Khatua
increase parallelization and the utilization of all the cores by Drill. Kunal Khatua Engineering [MapR]<http://www.mapr.com/> www.mapr.com<http://www.mapr.com/> From: Jeena Vinod <jeena.vi...@oracle.com> Sent: Tuesday, February 28, 2017 12

Re: Speed up Drill

2016-09-02 Thread Kunal Khatua
If you are using the JDBC plugin to connect to SqlServer, which is a community storage plugin, performance might be sub-optimal.  If you are trying to do a table scan, you also have the possibility that the data being streamed is very large. Your terminal might be the bottleneck for having to

Re: Cannot load Parquet files created with parquet-cpp in Drill

2016-09-07 Thread Kunal Khatua
Hi Uwe I believe you're using the latest Apache Drill 1.8.0. From a quick look at the stack trace, it appears to be a potential bug on Drill's interpretation of dictionary encoded data.  One way to verify that your C++ implementation of Parquet is correct would be to have your generated data

Re: drill with hive snappy

2016-10-24 Thread Kunal Khatua
If you are specifically referring to snappy-compressed Parquet data, yes.  Drill can also leverage Hive's own libraries to consume Hive-generated data (for e.g. ORC format) that might not be directly consumed by Drill. On Mon 24-Oct-2016 9:54:49 AM, Rainday Chu(初雨)

Re: How to set output file size for parquet to csv conversions?

2016-11-16 Thread Kunal Khatua
Laura There isn't a way to limit the size of individual files. However, there might be a way you can get around this.  One option is to serially convert each Parquet file into a CSV file separately. Typically, a fragment within Drill will generate this CSV file. That might generate 12 CSV

Re: how to control the number of profiles on webui

2016-12-02 Thread Kunal Khatua
Currently, the value is hard-coded to a limit of 100. https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java If you are building Drill from source, you could fork off the Drill repo and modify the value for

Re: Last Column showing blank in csv file

2016-12-02 Thread Kunal Khatua
There is a dos2unix utility for Linux that allows you to substitute the multichar newline with the single char newline. For Windows, you can use either a similar util on SourceForge or the CygUtils (part of the Cygwin shell, I believe) to achieve the same conversion.  In the meanwhile, like
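Where neither dos2unix nor a Cygwin tool is at hand, the same newline conversion can be sketched in a few lines of Python. This is only an illustration and was not part of the thread; the names crlf_to_lf and convert_file are made up here, and the sketch assumes the file is small enough to read into memory.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace each two-byte CRLF newline with a single LF, as dos2unix does."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read src_path as raw bytes and write an LF-only copy to dst_path."""
    with open(src_path, "rb") as src:
        converted = crlf_to_lf(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

Working on raw bytes avoids any guess about the CSV file's text encoding. Lines that already end in a bare LF pass through unchanged.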

Re: [ANNOUNCE] Apache Drill 1.10.0 Released

2017-03-23 Thread Kunal Khatua
Hi Bob Thanks for the positive acknowledgement of our efforts. We're also interested in understanding what reformatting (I'm guessing you mean code refactoring?) you did for workarounds. As an open source project, we look forward to contributions from the community in solving problems

Re: Quoting queries

2017-03-30 Thread Kunal Khatua
What is the client machine from where you're issuing the query? Is it Windows? Drill 1.10 has a fix for Windows wildcard: https://issues.apache.org/jira/browse/DRILL-4812 Can you try with that? Kunal Khatua Engineering [1490734684477_mapr.png] www.mapr.com<http://www.mapr.

Re: JDBC disconnections over remote networks

2017-03-30 Thread Kunal Khatua
he server closes the connection because that thread unable to send heartbeats (blocked). This could result in the JDBC disconnection. You could suffix a LIMIT to the query to identify the 'sweet spot' where disconnections don't happen. Kunal Khatua From: rahul c

RE: Quoting queries

2017-03-31 Thread Kunal Khatua
ies BTW I tried with the 1.10 version on both sides (server and client) and I still get the same error. -Original Message- From: Kunal Khatua [mailto:kkha...@mapr.com] Sent: Thursday, March 30, 2017 20:04 To: user <user@drill.apache.org> Subject: Re: Quoting queries Wh

RE: Struggling to run in distributed mode

2017-03-31 Thread Kunal Khatua
What is the error for the collocated Drillbit instance ? Did you apply the "DRILL_HOST_NAME" to the collocated Drillbit's drill-env.sh file as well? One quick test is to try if the ZK port is accessible locally, but not using the loopback IP (i.e. 127.0.0.1 ). Telnet is a good way to verify

RE: Struggling to run in distributed mode

2017-03-31 Thread Kunal Khatua
ctions did not mention it? Also, why cannot my machine telnet to itself? Michael Knapp On 3/31/17, 3:05 PM, "Kunal Khatua" <kkha...@mapr.com> wrote: What is the error for the collocated Drillbit instance ? Did you apply the "DRILL_HOST_NAME" to the collocated

Re: Display of query result using command line

2017-03-16 Thread Kunal Khatua
The SQLLine shell is a CLI utility to allow people to explore big data, but it is (intentionally) not designed to print 3 million rows! The utility tries to format the data, and has limited internal buffers. You probably want to use a simple JDBC program to achieve the purpose of printing out the

Re: Struggling to run in distributed mode

2017-03-31 Thread Kunal Khatua
.log file, it has: host.name=ip-10-XXX-YYY-ZZZ..com. meaning it was not localhost. On 3/31/17, 4:36 PM, "Kunal Khatua" <kkha...@mapr.com> wrote: "DrillbitStartupException: Drillbit is disallowed to bind to loopback address in distributed mode" I be

Re: Query on .gz.parquet files

2017-03-09 Thread Kunal Khatua
Is this a Parquet file? It looks more like a JSON document. What is the schema description published by the parquet-tools? From: PROJJWAL SAHA Sent: Thursday, March 9, 2017 4:36:06 AM To: user@drill.apache.org Subject: Query on .gz.parquet

Re: Discussion: Comments in Drill Views

2017-03-03 Thread Kunal Khatua
probably want something like a DESCRIBE VIEW ... to be enhanced to something like DESCRIBE VIEW WITH COMMENTARY ... A 1KB field is quite generous IMHO. That's more than 7 tweets to describe something ! [?] Kunal Khatua From: Ted Dunning <ted.dunn...@gmail.

Re: Configuring Drill Memory Usage under Windows

2017-03-07 Thread Kunal Khatua
I've not tried running Drill embedded on Windows, but you can try checking the parameters passed to the running Drill JVM to validate that the settings were picked up ? Kunal Khatua Engineering [MapR]<http://www.mapr.com/> www.mapr.com<http://www

Re: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-07 Thread Kunal Khatua
- FileSelection.getStatuses() took 0 ms, numFiles: 1 More than 30 secs is unaccounted for. Can you turn on the root logger to be at the debug level and retry the explain plan? Kunal Khatua From: rahul challapalli <challapallira...@gmail.com> Sent: Tuesday, March 7, 2017 5:24

Re: querying S3 bucket root dir

2017-04-17 Thread Kunal Khatua
You'll need to log into https://issues.apache.org/jira/browse/ and click on red 'Create' button on the top to create an issue. Mark the Project as "Apache Drill" and Issue type as "Bug" From: Alon Principal Sent: Monday, April 17, 2017

Re: Windows system Initiation Error

2017-04-17 Thread Kunal Khatua
I am able to log into a remote Linux-based Drillbit using SQLLine with no issues. C:\drill\drill-1.10.0-SNAPSHOT\bin>sqlline.bat -u "jdbc:drill:zk=10.10.120.160:2181" DRILL_ARGS - " -u jdbc:drill:zk=10.10.120.160:2181" HADOOP_HOME not detected... HBASE_HOME not detected... Calculating Drill

Re: [Drill 1.10.0] : Memory was leaked by query

2017-04-18 Thread Kunal Khatua
Could you also share the profiles for the failed queries as well? Thanks Kunal From: Padma Penumarthy Sent: Tuesday, April 18, 2017 7:18:08 AM To: user@drill.apache.org Cc: d...@drill.apache.org Subject: Re: [Drill 1.10.0] : Memory was

Re: Struggling to run in distributed mode

2017-04-18 Thread Kunal Khatua
set my own host. From my first email: I looked in the drillbit.log file, it has: host.name=ip-10-XXX-YYY-ZZZ..com. meaning it was not localhost. On 3/31/17, 4:36 PM, "Kunal Khatua" <kkha...@mapr.com> wrote: "DrillbitStartupException: Drillbit is

RE: Drill performance tuning parquet

2017-07-28 Thread Kunal Khatua
gt; c4.4xlarge - 4 20 20 20 20 > > > Dan Holmes | Revenue Analytics, Inc. > Direct: 770.859.1255 > www.revenueanalytics.com > > -Original Message- > From: Kunal Khatua [mailto:kkha...@mapr.com] > Sent: Friday, July 28, 2017 2:38 AM > To: user@dri

RE: delimiter in column values

2017-08-01 Thread Kunal Khatua
You could just try to have the headers in a single line too... emulating the structure that the rest of the data follows. -Original Message- From: Kunal Khatua [mailto:kkha...@mapr.com] Sent: Tuesday, August 01, 2017 9:38 PM To: user@drill.apache.org Subject: RE: delimiter in column

RE: delimiter in column values

2017-08-01 Thread Kunal Khatua
ta4" "coltwodata4" "-33.8724176" "151.2067579" "" "colonedata5" "coltwodata5" "" "" "" "This col6 data" "coltwodata6" "-33.869732" "151.203" "This col7 da

RE: pass parameters to Drill query in file

2017-08-02 Thread Kunal Khatua
If you just want to run the query with the parameters like "alter session", you can write those session-altering SQL as the initial set of queries before the actual query in the same file. Then run something like this: /opt/drill/apache-drill-1.11.0/bin/sqlline -u "jdbc:drill:zk=local"
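The file layout described in this message (session-altering statements first, then the actual query) can be generated with a small Python helper. This is a sketch under assumptions, not something from the thread: build_sql_script and format_value are hypothetical names, the option names and values in the usage note are purely illustrative, and how the resulting file is handed to sqlline is left as in the original (truncated) command.

```python
def format_value(value) -> str:
    """Render a Python value as a SQL literal for an ALTER SESSION statement."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("'", "''")  # double any single quotes
    return f"'{escaped}'"


def build_sql_script(session_options: dict, query: str) -> str:
    """Return SQL text: one ALTER SESSION SET per option, then the query."""
    lines = []
    for name, value in session_options.items():
        # Option names contain dots, so they are quoted with backticks.
        lines.append(f"ALTER SESSION SET `{name}` = {format_value(value)};")
    # Ensure the query ends with exactly one semicolon.
    lines.append(query.rstrip().rstrip(";") + ";")
    return "\n".join(lines) + "\n"
```

For example, build_sql_script({"store.format": "csv"}, "SELECT * FROM t") yields a two-statement script whose first line sets the session option and whose second line is the query. Because both run in the same session, the setting applies to the query that follows it.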

RE: delimiter in column values

2017-08-02 Thread Kunal Khatua
your numbers will start as > VarChar. You can convert to a numeric type in the query. > > Example using your data: > > Column1,Column2,Column3,Column4,Column5 > colonedata1,coltwodata1,-35.924476,138.5987123, > colonedata2,coltwodata2,-27.4372536,153.0304583,137 > > Note that

RE: set up Apache Drill on Windows server in distributed mode

2017-08-10 Thread Kunal Khatua
rill:zk=NameNode,DataNode1,DataNode2 (The filename, directory name, or vol ume label syntax is incorrect) apache drill 1.11.0 "got drill?" sqlline> Note : Removed the actual IP for security purpose. Appreciate the help . Thanks, Divya On 11 August 2017 at 08:20, Kunal Khat

RE: set up Apache Drill on Windows server in distributed mode

2017-08-10 Thread Kunal Khatua
Most people have used Apache Drill on Windows primarily in Embedded mode because no one appears to have tried for more than 1 Drillbit. That said, you should be able to run Apache Drill in a distributed mode as well, since it is Java-based and would not need to rely on anything more than a

RE: Rest API - Need to Improve

2017-07-12 Thread Kunal Khatua
Should we look at some well-established products that have a good set of such APIs for guidance? That will ensure that we at least identify the most relevant APIs. -Original Message- From: John Omernik [mailto:j...@omernik.com] Sent: Wednesday, July 12, 2017 6:12 AM To: user

RE: drill error connecting to Hbase

2017-07-23 Thread Kunal Khatua
This means that the connectivity with ZK appears to be working. What are the HBase, ZK and Hadoop versions that you are working with? I presume that the student table is otherwise accessible. -Original Message- From: Shai Shapira [mailto:shai.shap...@amdocs.com] Sent: Sunday, July 23,

RE: Drill performance tuning parquet

2017-07-27 Thread Kunal Khatua
You haven't specified what kind of query you are running. The Async Parquet Reader tuning should be more than sufficient in your use case, since you seem to be processing only 3 files. The feature introduces a small fixed pool of threads that are responsible for the actual fetching of bytes

RE: Drill performance tuning parquet

2017-07-28 Thread Kunal Khatua
set (which means you will end up trading off the various parameters) and then scale out. (Scale out is not cheap). Thanks, Saurabh On Thu, Jul 27, 2017 at 1:37 PM, Kunal Khatua <kkha...@mapr.com> wrote: > You haven't specified what kind of query are you running. > > The As

RE: drill error connecting to Hbase

2017-07-26 Thread Kunal Khatua
connecting to Hbase It is CDH 5.8.2 I believe it is reliable versions, isn't it? Thanks, Shai -Original Message- From: Kunal Khatua [mailto:kkha...@mapr.com] Sent: Monday, July 24, 2017 8:50 AM To: user@drill.apache.org Subject: RE: drill error connecting to Hbase This means

Re: JDBC SQL parse error on CTAS

2017-04-24 Thread Kunal Khatua
I'm not sure if you can do a select * from (CTAS ..) The output of CTAS really is only a metric of the rows that got written by the various fragments. Are you able to run the query in SQLLine? The parse error does a pretty good job of pointing out where the query appears to have broken the

RE: delimiter in column values

2017-08-01 Thread Kunal Khatua
I think you need quotes around the single word datasets as well, because the quotes act as String delimiters and help in indicating the start and end of a String. Is there a reason why the single word strings cannot be in quotes as well? -Original Message- From: Divya Gehlot

RE: Drill JDBC connection on windows in embedded node

2017-08-07 Thread Kunal Khatua
+1 to Andries' comment. When you start Drill in Embedded mode, you are bringing up a single standalone Drillbit without any Zookeeper that would otherwise allow discovery of other Drillbits in the cluster. Starting in embedded mode is done via SQLLine because in a practical usage, a single

RE: unable to run drill sql in shell script

2017-08-07 Thread Kunal Khatua
You seem to be missing the jars in the classpath. Notice the line before the error... "Calculating Drill classpath" What you probably need to do is to navigate to your installation's bin directory and launch from there. -Original Message- From: Divya Gehlot

RE: pass parameters to Drill query in file

2017-08-07 Thread Kunal Khatua
, Divya On 3 August 2017 at 12:30, Kunal Khatua <kkha...@mapr.com> wrote: > If you just want to run the query with the parameters like "alter > session", you can write those session-altering SQL as the initial set > of queries before the actual query in the same file. &g

RE: delimiter in column values

2017-08-07 Thread Kunal Khatua
. There are table functions in Drill that guide it with additional inputs on how to manage the preparation of the table. Can you please share the link ? Thanks, Divya On 3 August 2017 at 21:06, Kunal Khatua <kkha...@mapr.com> wrote: > A couple of things... > > 1. Your deli

RE: Unable to SELECT from parquet file with Hadoop 2.7.4

2017-08-18 Thread Kunal Khatua
Interesting. I'm presuming this works if the parquet file is in a directory, right? Was Drill built with Hadoop 2.7.4 dependencies or did you use the default 2.7.1 that is there in the POM.XML ? A workaround for now would be to query on an enclosing directory, until someone looks at the issue

RE: Unable to SELECT from parquet file with Hadoop 2.7.4

2017-08-18 Thread Kunal Khatua
Kunal -Original Message- From: Kunal Khatua [mailto:kkha...@mapr.com] Sent: Friday, August 18, 2017 1:07 PM To: user@drill.apache.org Subject: RE: Unable to SELECT from parquet file with Hadoop 2.7.4 Glad that worked ! Not sure why the APIs in 2.7.4+ are not backward compatible

RE: Unable to SELECT from parquet file with Hadoop 2.7.4

2017-08-18 Thread Kunal Khatua
with Drill 1.11 (with -Dhadoop.version=2.7.4) Hadoop 2.8.0 with Drill 1.11 (default pom.xml) Hadoop 3.0.0-alpha4 with Drill 1.11 (default pom.xml) Kind regards, Michele On Fri, Aug 18, 2017 at 8:06 PM, Kunal Khatua <kkha...@mapr.com> wrote: > Interesting. I'm presuming this works if th

RE: Merge and save parquet files in Drill

2017-08-18 Thread Kunal Khatua
If you are creating a lot of small files within each partition, that is because different writers write a file for each partition. One workaround would be to reduce the number of writer fragments. We can achieve that by setting the parameter to a lower value, so that you generate larger files.

Re: multiple users and passwords

2017-05-02 Thread Kunal Khatua
Have you had a look at this link? https://drill.apache.org/docs/configuring-user-authentication Configuring User Authentication - Apache Drill drill.apache.org Authentication is the process of establishing confidence of

Re: Drill query are stuck in ENQUEUED mode

2017-05-03 Thread Kunal Khatua
leted. So you should be able to find that easily. Kunal Khatua Engineering [1490734684477_mapr.png] www.mapr.com<http://www.mapr.com/> From: jasbir.s...@accenture.com <jasbir.s...@accenture.com> Sent: Wednesday, May 3, 2017 9:37:44 AM To: user@dr

Re: Quoting queries

2017-05-03 Thread Kunal Khatua
Original Message- From: Kunal Khatua [mailto:kkha...@mapr.com] Sent: Tuesday, May 2, 2017 20:02 To: user@drill.apache.org Subject: Re: Quoting queries Hi David I tried the queries on SQuirreL and everything following the asterisk in the 'table' name is truncated when the query is submitte

Re: In-memory cache in Drill

2017-05-10 Thread Kunal Khatua
Drill does not cache data in memory because it introduces the risk of dealing with stale data when working with data at a large scale. If you want to avoid hitting the actual storage repeatedly, one option is to use the 'create temp table ' feature

Re: In-memory cache in Drill

2017-05-10 Thread Kunal Khatua
e also used as a storage for the temporary tables. Sincerely, Michael Shtelma On Wed, May 10, 2017 at 6:30 PM, Kunal Khatua <kkha...@mapr.com> wrote: > Drill does not cache data in memory because it introduces the risk of dealing > with stale data when working with data at a large scale.

Re: In-memory cache in Drill

2017-05-10 Thread Kunal Khatua
sed as a storage for the temporary > tables. > Sincerely, > Michael Shtelma > > > On Wed, May 10, 2017 at 6:30 PM, Kunal Khatua <kkha...@mapr.com> wrote: > > Drill does not cache data in memory because it introduces the risk of > dealing with stale data when working w

Re: Not Able to suscribe

2017-05-15 Thread Kunal Khatua
Did this not work? https://drill.apache.org/mailinglists/ From: Arvind Sharma Sent: Monday, May 15, 2017 12:00:33 AM To: user@drill.apache.org Subject: Not Able to suscribe

Re: Drill embedded mode

2017-05-15 Thread Kunal Khatua
Is there a reason you want to use Drill in Embedded mode and not as a standalone server? You can always use a screen session and start up Drill embedded mode and then detach that session. From: Selvarajan Thangavel

Re: Run Multiple Queries as a Single query Apache Drill

2017-05-15 Thread Kunal Khatua
You could try to set the value through the OPTIONS tab on the top right. This will be a permanent 'system' level setting and not a temporary 'session'-level setting. You can have your API also do this using the 'alter system set ...' to achieve the same end goal.
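As a rough illustration of having an API issue the 'alter system set ...' statement, the Python sketch below builds the JSON body for Drill's REST query endpoint and posts it. Treat the details as assumptions: the function names are hypothetical, the base URL (host and web port) depends on your deployment, and the option name in the usage note is only an example, since the thread snippet does not name the option in question.

```python
import json
import urllib.request


def alter_system_payload(option: str, value: int) -> bytes:
    """Build the JSON request body for an ALTER SYSTEM SET with a numeric value."""
    sql = f"ALTER SYSTEM SET `{option}` = {value}"
    return json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")


def submit(base_url: str, payload: bytes) -> str:
    """POST the payload to <base_url>/query.json and return the raw response text."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

A call such as submit("http://localhost:8047", alter_system_payload("planner.width.max_per_query", 3)) would then change the setting for every session, so it is worth confirming that a system-wide change is really what you want before automating it.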

RE: Apache Drill C++ Client Binary Versioning Information

2017-06-23 Thread Kunal Khatua
I think you missed sharing the Apache JIRA :) From: Robert Wu [mailto:r...@magnitude.com] Sent: Thursday, June 22, 2017 5:19 PM To: user@drill.apache.org Subject: RE: Apache Drill C++ Client Binary Versioning Information Oh sorry, looks like attachment is not going through. I've captured the

Re: Parquet filter pushdown and string fields that use dictionary encoding

2017-05-31 Thread Kunal Khatua
lds that use dictionary encoding Thank you Kunal. Can you please explain to me why min/max values would be relevant for dictionary encoded fields? (I think I may be completely misunderstanding how they work) Regards, -Stefán On Wed, May 31, 2017 at 5:55 PM, Kunal Khatua <kkha...@mapr.com>

Re: [External] Re: UNORDERED_RECEIVER taking 70% of query time

2017-06-02 Thread Kunal Khatua
Hi Jasbir I don't think the Apache mailing list allows you to send attachments, except maybe text files. (The txt file made it through). In your Operator Profile, you'll see two columns... %Fragment Time and %QueryTime Taking your mouse over those table headers should show you a

Re: [External] Re: UNORDERED_RECEIVER taking 70% of query time

2017-06-04 Thread Kunal Khatua
is there any way by which we can try reducing it? Regards, Jasbir singh -Original Message- From: Kunal Khatua [mailto:kkha...@mapr.com] Sent: Saturday, June 03, 2017 12:00 AM To: user@drill.apache.org; d...@drill.apache.org Cc: Kothari, Maneesh <maneesh.koth...@accenture.com&

Re: Parquet filter pushdown and string fields that use dictionary encoding

2017-05-31 Thread Kunal Khatua
Even though filter pushdown is supported in Drill, it is limited to pushing down of numeric values including dates. We do not support pushdown of varchar because of this bug in the parquet library: https://issues.apache.org/jira/browse/PARQUET-686 The issue of

Re: Column alias are ignored when Storage Plugin is enabled

2017-06-08 Thread Kunal Khatua
It could be related to these as well: https://issues.apache.org/jira/browse/DRILL-5537 https://issues.apache.org/jira/browse/DRILL-5538 Please go ahead and file a bug. If it is related, they'll be linked and resolved together. ~ Kunal From: Rahul Raj

Re: Increasing store.parquet.block-size

2017-06-09 Thread Kunal Khatua
Shuporno There are some interesting problems when using Parquet files > 2GB on HDFS. If I'm not mistaken, the HDFS APIs that allow you to read offsets (oddly enough) return an int value. A large Parquet block size also means you'll end up having the file span across multiple HDFS blocks, and

RE: Apache Drill Question

2017-06-15 Thread Kunal Khatua
Not familiar with SSHFS or GlusterFS specs, but It should, in theory, work out of the box. You can start off Drill with having the underlying storage plugins talk to a localFS. I'm presuming SSHFS / GlusterFS can expose the files through a local NFS-like mount. However, if your three nodes

RE: SIGSEGV error - StubRoutines::jlong_disjoint_arraycopy

2017-06-15 Thread Kunal Khatua
Nope.. not seen this before. Can you share more details of the log messages, etc? The problem might have to do with the JSON files being very large... because the segmentation fault that triggered the JVM (Drillbit) crash hints at that during the write of the Parquet files. I take it you

Re: SIGSEGV error - StubRoutines::jlong_disjoint_arraycopy

2017-06-15 Thread Kunal Khatua
error - StubRoutines::jlong_disjoint_arraycopy Yeah. It only crashes on the larger JSON files. Reworking my python script to use hdfs.tmp instead of dfs.tmp now.. -Original Message- From: Kunal Khatua [mailto:kkha...@mapr.com] Sent: Thursday, June 15, 2017 10:52 AM To: user@drill.apache.org S

RE: Apache Drill C++ Client Binary Versioning Information

2017-06-16 Thread Kunal Khatua
I don't think the mailing list allows attachments to go through. Can you share via some free online utility like https://imgur.com/upload From: Robert Wu [mailto:r...@magnitude.com] Sent: Tuesday, June 13, 2017 12:07 PM To: user@drill.apache.org Subject: Apache Drill C++ Client Binary

RE: Using Apache Drill with AirBnB SuperSet

2017-06-14 Thread Kunal Khatua
The Superset project looks pretty neat. Didn't realize that there is also a Python Driver for Drill. I'd think that would be useful too. -Original Message- From: John Omernik [mailto:j...@omernik.com] Sent: Wednesday, June 14, 2017 3:45 AM To: user Subject:

RE: JDBC help

2017-06-14 Thread Kunal Khatua
You need to use the org.apache.drill.jdbc.Driver class. The drill-jdbc-all dependency should be enough. Are you trying to connect to an HDFS cluster's zookeeper? -Original Message- From: Aspen Hsu [mailto:a...@xactlycorp.com] Sent: Wednesday, June 14, 2017 12:07 PM To:

Re: Increasing store.parquet.block-size

2017-06-09 Thread Kunal Khatua
n HDFS. The size of the individual .csv source files can be quite huge (around 10GB). So, is there a way to overcome this and create one parquet file or do I have to go ahead with multiple parquet files? On 09-Jun-2017 11:04 PM, "Kunal Khatua" <kkha...@mapr.com> wrote: > Shuporno > &g

Re: Increasing store.parquet.block-size

2017-06-09 Thread Kunal Khatua
ptimal, if I have to read the file later? On 09-Jun-2017 11:28 PM, "Kunal Khatua" <kkha...@mapr.com> wrote: > > If you're storing this in S3... you might want to selectively read the > files as well. > > > I'm only speculating, but if you want to download the

Re: Drill query are stuck in ENQUEUED mode

2017-05-04 Thread Kunal Khatua
ux box Please do let me know how to resolve query stuck issue as it is hampering products performance. Regards, Jasbir Singh -Original Message- From: Kunal Khatua [mailto:kkha...@mapr.com] Sent: Wednesday, May 03, 2017 10:43 PM To: user@drill.apache.org Cc: Kothari, Maneesh <man

Re: error in mac OS X driver installer instructions

2017-05-19 Thread Kunal Khatua
Looks like you're right about the typo. We'll have it corrected. Thanks, Jonathan! ~ Kunal From: Jonathan Snyder Sent: Friday, May 19, 2017 1:30:17 PM To: user@drill.apache.org Subject: error in mac OS X

RE: Spark exception crashing application

2017-09-18 Thread Kunal Khatua
The exceptions/error (NoClassDefFoundError) appear to indicate that there is a possible mismatch between the library versions of Spark within Drill and the platform you are running on.. Can you start by verifying the versions of the Spark libraries you have and then try to build Drill (edit

RE: Apache DRILL v1.11.0 handling Postgres citext columns with Inconsistency

2017-09-18 Thread Kunal Khatua
It's odd that adding just a term_count column is causing an error but the other 2 columns (created, updated) don't seem to be... and gets resolved on removing the cast. Can you provide the stack trace and error message? Also, what are the data types for the other columns? -Original

Re: User client timeout with results > 2M rows

2017-09-20 Thread Kunal Khatua
Do you know how long it takes for this timeout to occur? There might be some tuning needed to increase a timeout. Also, I think this (S3 specifically) has been seen before... So you might find a solution within the mailing list archives. Did you try looking there? From: Alan Höng Sent:

RE: User client timeout with results > 2M rows

2017-09-20 Thread Kunal Khatua
> 2M rows Yes it takes about 2-3min for the timeout to appear the query itself should finish in that time. The files are not that big for debugging. I have, but I couldn't find anything relevant or helpful in my situation so far. On Wed, 20 Sep 2017 at 20:41 Kunal Khatua <kkha...@mapr.com&

RE: User client timeout with results > 2M rows

2017-09-21 Thread Kunal Khatua
utor.java:357) > [netty-common-4.0.27.Final.jar:4.0.27.Final] > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadE > ventExecutor.java:

RE: Querying Parquet files in 2.0 format

2017-09-21 Thread Kunal Khatua
Drill currently supports Parquet 1.8.1 https://github.com/apache/drill/blob/master/pom.xml#L37 AFAIK a lot of major projects are still on Parquet 1.x versions. Is there something specific in 2.0 that you need? Else, Parquet 1.8.1 should suffice. You needn’t downgrade all the way down to

RE: Exception while reading parquet data

2017-10-15 Thread Kunal Khatua
.0.jar:1.11.0] > > at org.apache.drill.exec.store.parquet.columnreaders. > > PageReader.readPage(PageReader.java:216) > > ~[drill-java-exec-1.11.0.jar:1.11.0] > > at org.apache.drill.exec.store.parquet.columnreaders. > > PageReader.nextInternal(PageReader

RE: Drill Profile page takes too much time to load

2017-08-29 Thread Kunal Khatua
This might be because that installation is possibly hosting a lot of Drill profiles. Do you know how many profiles you have residing in the underlying persistent store? When a query is executed, the Foreman Drillbit (i.e. the node from where the query is submitted) writes the final profile to

Re: need help to decrypt error code

2017-09-28 Thread Kunal Khatua
This appears to be the brief client side error. What does the Drillbit 's log show in the stacktrace? From: Divya Gehlot Sent: Thursday, September 28, 2017 12:43:38 AM To: user@drill.apache.org Subject: need help to decrypt error code

RE: Error Messages that are difficult to parse.

2017-09-25 Thread Kunal Khatua
Not sure what is going on, but my hunch is that the outermost wrapping SQL is probably using the final projections to eliminate some of the columns early on, which "helps" avoid the NumberFormat exception. Perhaps adding back the other columns, one by one, should narrow down the source of the

RE: Drill selects column with the same name of a different table

2017-08-21 Thread Kunal Khatua
Could you share the profile ( *.sys.drill file or the http://:8047/profiles/.json ) ? This might be a bug with the JDBC Storage plugin. A quick way to validate this would be to have the similar data as 2 text/parquet tables and have Drill read from that. If we don't see an issue, then it is

RE: Problems using Postgres datasource

2017-09-01 Thread Kunal Khatua
I'm not very familiar with the details of Postgres, but I do see people occasionally asking about it. Have you checked the mailing list archives? You might find your answers there. -Original Message- From: Gonzalo Ortiz Jaureguizar [mailto:golthir...@gmail.com] Sent: Friday,

RE: Access to Drill 1.9.0

2017-10-06 Thread Kunal Khatua
Just curious... any reason why you're looking to try Drill 1.9.0, considering that is nearly a year old ? -Original Message- From: Rob Wu [mailto:robw...@gmail.com] Sent: Friday, October 06, 2017 10:35 PM To: user@drill.apache.org Subject: Re: Access to Drill 1.9.0 Hi Chetan, You can

RE: Exception while reading parquet data

2017-10-11 Thread Kunal Khatua
If this resolves the issue, could you share some additional details, such as the metadata of the Parquet files, the OS, etc.? Details describing the setup is also very helpful in identifying what could be the cause of the error. We had observed some similar DATA_READ errors in the early

FW: Drill Push to Tableau, Error -

2017-12-17 Thread Kunal Khatua
Forwarded from d...@drill.apache.org From: Spinn, Brandi [mailto:brandi.sp...@siriusxm.com] Sent: Friday, December 15, 2017 1:46 PM To: d...@drill.apache.org Subject: Drill Push to Tableau, Error - Hello, We are currently running a project which is utilizing the Drill push to Tableau function

RE: Drill Push to Tableau, Error -

2017-12-17 Thread Kunal Khatua
. Hope that helps. Let us know how else you are using Drill. Thanks ~ Kunal -Original Message- From: Kunal Khatua [mailto:kkha...@mapr.com] Sent: Sunday, December 17, 2017 9:30 AM To: user@drill.apache.org Subject: FW: Drill Push to Tableau, Error - Forwarded from d

RE: Drill session and jdbc connections

2017-12-13 Thread Kunal Khatua
the store format. Thanks for your inputs, Regards, Rahul On Wed, Dec 13, 2017 at 10:59 PM, Kunal Khatua <kkha...@mapr.com> wrote: > A Drill session is isolated and bound to a connection. Your > 'getConnection()' method might be fetching connections from a pool, > where the settin

RE: Drill session and jdbc connections

2017-12-13 Thread Kunal Khatua
A Drill session is isolated and bound to a connection. Your 'getConnection()' method might be fetching connections from a pool, where the settings haven't been reset. If the connections are shared, you will continue to have this problem. If you are returning a connection back to the pool, run

RE: sqlline parquet to tsv filesize imabalance causing slow sqoop export to MS sql server

2017-11-16 Thread Kunal Khatua
It might be that your parallelization is causing it to generate 4 files, where only <= 3 files are sufficient. Try experimenting with the planner.width.max_per_query to a value of 3 ... that might help. https://drill.apache.org/docs/configuration-options-introduction/ -Original

RE: Drill Capacity

2017-11-02 Thread Kunal Khatua
Hi Yun Andries solution should address your problem. However, do understand that, unlike CSV files, a JSON file cannot be processed in parallel, because there is no clear record delimiter (CSV data usually has a new-line character to indicate the end of a record). So, the larger a file gets,

RE: Drill Capacity

2017-11-07 Thread Kunal Khatua
Hi Yun The new release might not address this issue as we don't have a repro for this. Any chance you can provide a sample anonymized data set. The JSON data doesn't have to be meaningful, but we need to be able to reproduce it to ensure that we are indeed addressing the issue you faced.

RE: Formatting results

2017-11-04 Thread Kunal Khatua
This should do it: https://drill.apache.org/docs/data-type-conversion/#to_char-syntax -Original Message- From: Brandon [mailto:etu...@gmail.com] Sent: Saturday, November 04, 2017 5:13 AM To: user@drill.apache.org Subject: Formatting results Is it possible to format query results, like

Re: [1.9.0] : UserException: SYSTEM ERROR: IllegalReferenceCountException: refCnt: 0 and then SYSTEM ERROR: IOException: Failed to shutdown streamer

2017-12-07 Thread Kunal Khatua
What is it that you were trying to do when you encountered this? This is a system error and the message appears to hint that Drill shut down prematurely and is unable to account for that. Kunal From: Anup Tiwari Sent: Wednesday, December 6, 7:46 PM Subject: Re: [1.9.0] : UserException: SYSTEM

RE: [1.9.0] : UserException: SYSTEM ERROR: IllegalReferenceCountException: refCnt: 0 and then SYSTEM ERROR: IOException: Failed to shutdown streamer

2017-12-11 Thread Kunal Khatua
in trail mail also as you have mentioned :- *This is a system error and the message appears to hint that Drill shutdown a prematurely , *I have checked on all nodes and drill-bit is running properly. Note :- We are using Drill 1.10.0. Regards, *Anup Tiwari* On Thu, Dec 7, 2017 at 10:33 PM, Kunal

RE: Regarding where 1=0 clause

2017-10-24 Thread Kunal Khatua
Could you file a JIRA for this? It seems trivial enough to fix. -Original Message- From: Amit Garg [mailto:fromami...@gmail.com] Sent: Monday, October 23, 2017 11:21 PM To: user@drill.apache.org Subject: Regarding where 1=0 clause I am using Apache Drill JDBC storage plugin. When I
