Understanding Drill's timestamp and timezone

2015-05-07 Thread Hao Zhu
Hi Team, I recently spent some time testing Drill's timestamp behavior, so I am sharing the article "Understanding Drill's timestamp and timezone". This article tests the behavior under different Drill timezones and source d

Re: Query planning cost

2015-05-07 Thread Adam Gilmore
Yep - it's a tad confusing. As Jacques said, it's definitely running the scans in parallel, but it does seem pretty much linear. On Fri, May 8, 2015 at 10:44 AM, Ted Dunning wrote: > On Fri, May 8, 2015 at 12:30 AM, Adam Gilmore > wrote: > > > We're getting about a 350ms delay for 70 files, ab

Re: Query planning cost

2015-05-07 Thread Ted Dunning
On Fri, May 8, 2015 at 12:30 AM, Adam Gilmore wrote:
> We're getting about a 350ms delay for 70 files, about 200ms for 35 files,
> about 20-30ms for 1 file.

That is impressively linear. 25ms + files * 4.7 with only 5-10ms error. R^2 = 0.997
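
For readers who want to check the arithmetic, the linear fit quoted above can be reproduced roughly with an ordinary least-squares sketch over the three reported data points. The 1-file delay is taken as 25 ms (the midpoint of the quoted 20-30ms range), which is an assumption; the function name `fit_line` is illustrative only and is not part of Drill.

```python
# Ordinary least-squares fit of planning delay (ms) against file count.
# Assumption: the 1-file delay is 25 ms, the midpoint of the quoted
# 20-30 ms range. The other two points are the figures from the thread.

def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R^2 equals the squared correlation coefficient.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

files = [1, 35, 70]
delay_ms = [25.0, 200.0, 350.0]

slope, intercept, r_squared = fit_line(files, delay_ms)
# Residual = observed delay minus the delay predicted by the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(files, delay_ms)]

print(f"delay_ms ~= {intercept:.1f} + {slope:.2f} * files, R^2 = {r_squared:.3f}")
print("residuals (ms):", [round(r, 1) for r in residuals])
```

Under these assumptions the fit comes out near 25 ms plus about 4.7 ms per file, with R^2 close to 0.997 and residuals within roughly 10 ms, which is consistent with the figures in the message. With only three points the fit says little about how the delay behaves at much larger file counts.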

Re: "Illegal instant due to time zone offset transition"

2015-05-07 Thread Hao Zhu
Per my test, setting drill-override.conf as shown below does not work. It adds another configuration, "drill.exec.user.timezone", which does not take effect. Adding -Duser.timezone=UTC to DRILL_JAVA_OPTS in drill-env.sh works. drill.exec: { cluster-id: "xyz", zk.connect: "abc:5181", *user.time

Re: Query planning cost

2015-05-07 Thread Adam Gilmore
I'll double check the debug logs. We're getting about a 350ms delay for 70 files, about 200ms for 35 files, about 20-30ms for 1 file. We're using HDFS. It doesn't appear that it's just saturating HDFS with reads, either. On Thu, May 7, 2015 at 8:30 PM, Jacques Nadeau wrote: > We log for Parque

Re: Query planning cost

2015-05-07 Thread Adam Gilmore
I'll double check the debug logs. We're getting about a 350ms delay for 70 files, about 200ms for 35 files, about 20-30ms for 1 file. We're using HDFS. It doesn't appear that it's just saturating HDFS with reads, either. Regards, *Adam Gilmore* Director of Technology a...@pharmadata.net.au

Re: Mongo query speed

2015-05-07 Thread AnilKumar B
Thanks for pointing out this issue. We agree that the BSON -> JSON String -> Drill Vector conversion could be a potential performance issue. When we started implementing the Mongo storage plugin, we thought of reusing the JSON Reader rather than implementing BSON parsing. We will soon start working on BSON Record

Re: Unable to detect files in hadoop file system from apache drill 0.9.0

2015-05-07 Thread Venki Korukanti
Hi Sharath, Sorry for the trouble. Could you try after adding the following to your core-site.xml of HDFS on all nodes and restart? Replace with userName of the user who is running the Drillbit process. hadoop.proxyuser..hosts * hadoop.proxyuser..groups * This seems to be a regr

Unable to detect files in hadoop file system from apache drill 0.9.0

2015-05-07 Thread Sharath Akinapally
Hi, I recently moved Drill from 0.8.0 to 0.9.0. Since then, I have been unable to query JSON files in the Hadoop file system (HDFS). I had registered a storage plugin named hadoop in Drill. I had all my files in /user/hadoop in HDFS. But from Drill when I query "show files in hadoop.`/user/hadoo

Re: Query planning cost

2015-05-07 Thread Jacques Nadeau
We log for Parquet footer reading and block Map building. What are the reported times for each in your scenario? Are you on HDFS or MFS? Thx On May 7, 2015 10:47 AM, "Adam Gilmore" wrote: > Hey sorry my mistake - you're right. Didn't see it executing those in > TimedRunnables. I wonder why t

Re: Query planning cost

2015-05-07 Thread Adam Gilmore
Hey sorry my mistake - you're right. Didn't see it executing those in TimedRunnables. I wonder why then it's such a significant impact for only 70 files. I can pretty easily replicate it by using the globbing to select just a subset, then select the whole lot (i.e. 35 files takes about 200ms "pl