Re: Check presence of field in json file

2019-09-16 Thread Boaz Ben-Zvi
Hi Sebastian, Setting the option "store.json.all_text_mode" to "true" should make the JSON reader interpret all null-valued or missing columns/values as "text" instead of "int" (it should return the text value 'null'). This may be another workaround to the type guessing problem that
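
A minimal sketch of how the option named above might be set for a single session, assuming a Drill SQL client such as sqlline. The file path and the field name `myfield` are hypothetical placeholders, not taken from the thread.

```sql
-- Ask the JSON reader to treat every value as text for this session only.
ALTER SESSION SET `store.json.all_text_mode` = true;

-- Hypothetical file and field names, for illustration only.
SELECT t.`myfield` FROM dfs.`/data/example.json` t;
```

Using ALTER SESSION rather than ALTER SYSTEM keeps the change local to the current connection, which is usually the safer choice when experimenting with a workaround.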

Re: [ANNOUNCE] New PMC Chair of Apache Drill

2019-08-22 Thread Boaz Ben-Zvi
Congratulations Charles! Please ask us for any help needed with your new role and duties ... Boaz On 8/22/19 10:30 AM, Charles Givre wrote: Thanks everyone! Arina, you've done a great job and you've left me with some very big shoes to fill. I'll do my best! -- C On Aug 22,

Apache Drill Hangout - June 25, 2019

2019-06-24 Thread Boaz Ben-Zvi
Hi Drillers, Our bi-weekly hangout is scheduled for tomorrow, Tuesday, June 25th, at 10 AM PST (link: https://meet.google.com/yki-iqdf-tai). Please suggest any topics you would like to discuss during the hangout by replying to this email. Thanks,

Re: [ANNOUNCE] New PMC member: Sorabh Hamirwasia

2019-04-05 Thread Boaz Ben-Zvi
Congratulations Sorabh - welcome to the Project Management Committee!! On Fri, Apr 5, 2019 at 10:58 AM Abhishek Ravi wrote: > Congratulations Sorabh! Well deserved! > > On Fri, Apr 5, 2019 at 10:49 AM hanu mapr wrote: > > > Congratulations, Sorabh! > > > > On Fri, Apr 5, 2019 at 10:30 AM

Re: [DISCUSS] Including Features that Need Regular Updating?

2019-03-22 Thread Boaz Ben-Zvi
Hi Charles, If these updates are only small simple tasks, it would not be a big issue to add them to the Drill Release Process (see [1]). BTW, most of the release work is automated via a script (see section 4 in [1]); so if these updates could be automated as well, it would be a

Re: RESOURCE ERROR: External Sort encountered an error while spilling to disk

2019-03-12 Thread Boaz Ben-Zvi
Hi Giovanni, The error given by the External-Sort indicates a problem while spilling the excess memory to disk. When you enlarged the memory (from the default 2GB) to 8GB, the LAG query may have succeeded without spilling, hence circumventing the issue. Yes, you can keep enlarging the
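
For illustration only: the snippet does not name the option that was changed. The sketch below assumes it is the per-node query memory limit, `planner.memory.max_query_memory_per_node`, whose default is 2GB, and raises it to 8GB for the current session.

```sql
-- Assumption: the "memory" discussed above is the per-node query memory limit.
-- 8 GB expressed in bytes (8 * 1024 * 1024 * 1024).
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 8589934592;
```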

Re: Big varchar are ok when extractHeader=false but not when extractHeader=true

2019-01-23 Thread Boaz Ben-Zvi
Hi Benj, Testing with the latest code, this error does not show: 0: jdbc:drill:zk=local> select columns[0], columns[1] from dfs.`/data/bar.csv`; +--+-+ | EXPR$0

Apache Drill Hangout - 22 Jan, 2019

2019-01-21 Thread Boaz Ben-Zvi
Hi Drillers, The bi-weekly Apache Drill hangout is scheduled for tomorrow, Tuesday, Jan 22nd, at 10 AM PST. The original plan was for Arina to talk about Schema Provisioning. If there are any other topics or questions, feel free to reply or raise them during the hangout. The hangout link: Meet

Re: Error: SYSTEM ERROR: NumberFormatException: empty String

2018-08-29 Thread Boaz Ben-Zvi
Hi Eduardo, Did you try to attach an image to your message? The Apache list does not support attachments, so please describe the error. And the Alter System seems to work fine (unless you don't have admin permissions, in which case you may try "ALTER SESSION" instead): 0: jdbc:drill:zk=local> ALTER

Re: CTAS memory leak

2018-08-29 Thread Boaz Ben-Zvi
Hi Scott, 1. "swaps and then crashes" - do you mean an Out-Of-Memory error? 2. Version 1.14 is available now, with several memory control improvements (e.g., Hash Join spilling, output batch sizing) 3. Direct memory is only 10G - why not go higher? This is where most of Drill's

[ANNOUNCE] Apache Drill Release 1.14.0

2018-08-05 Thread Boaz Ben-Zvi
On behalf of the Apache Drill community, I am happy to announce the release of Apache Drill 1.14.0. For information about Apache Drill, and to get involved, visit the project website [1]. This release of Drill provides the following new features and improvements:

Hangout Summary - July 24 (Re: Drill Hangout tomorrow at 10 am PST)

2018-07-25 Thread Boaz Ben-Zvi
are not getting tested as well as the main Drill code. A suggested solution: put these under either a new directory or under /contrib. Thanks, Boaz On 7/23/18 6:16 PM, Boaz Ben-Zvi wrote: The bi-weekly Drill Hangout shall take place tomorrow July 24th at 10 am PDT Any

Drill Hangout tomorrow at 10 am PST

2018-07-23 Thread Boaz Ben-Zvi
The bi-weekly Drill Hangout shall take place tomorrow July 24th at 10 am PDT. Any discussion topics are welcome (I'm currently busy with the 1.14 RC; I could say a word or two about that). The Hangout link: https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Re: Apache Drill.

2018-03-23 Thread Boaz Ben-Zvi
Hi Robert, Unfortunately the Apache users list does not support attachments, so you would need to find another way to make it available. Was there a stack trace along with the java.lang.NullPointerException error? Thanks, Boaz From: Robert Smith

Re: Accessing underlying scheme of input

2018-03-01 Thread Boaz Ben-Zvi
From the docs (https://drill.apache.org/docs/describe/): “Currently, DESCRIBE does not support tables created in a file system.” Seems that it only works well for Hive and HBase tables. The create view statement does not explore the actual schema of the query’s table(s); it only parses and

Memory Spilling for the Hash Join Operator

2017-11-28 Thread Boaz Ben-Zvi
Hi Drill developers, The following is a link to the first draft of the design for memory spilling for the Hash Join Operator: https://docs.google.com/document/d/1-c_oGQY4E5d58qJYv_zc7ka834hSaB3wDQwqKcMoSAI/edit?usp=sharing The document should allow commenting, so please add any comment

Re: Drill Capacity

2017-11-02 Thread Boaz Ben-Zvi
Hi Yun, Can you try using the “managed” version of the external sort – either change this option to false: 0: jdbc:drill:zk=local> select * from sys.options where name like '%man%';

[HANGOUT] minutes for 9/5/2017

2017-09-05 Thread Boaz Ben-Zvi
Apache Drill Hangout – Sept. 5th, 2017. (Note: There will be no Drill Hangout on Sept 19!) Attendees: Sorabh, Sindhu, Padma, Hanumath, Arina, Vitalii, Volodymyr, Vova, Pritesh, Aman, Vlad, Boaz The main discussion topic was the forthcoming Drill Developers Day on Sept. 18. (See Aman’s

[HANGOUT] Topics for 9/5/2017

2017-09-04 Thread Boaz Ben-Zvi
We shall have a Drill hangout tomorrow (Tuesday Sept 5) at 10 AM Pacific. Please suggest any topics by replying to this thread or bring them up during the hangout. Hangout link: https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc Thanks, Boaz

Re: how to use LIKE operator with a binary column

2017-04-28 Thread Boaz Ben-Zvi
Hi Franca, This issue is specific to the “bytes” type; for other Avro types the LIKE clause matches the printed representation, like: select * from dfs.`/data/avro/twitter.snappy.avro` where `timestamp` like '%66%'; +-+--+-+ |

Re: Inequality join error with date range and calendar table

2017-03-20 Thread Boaz Ben-Zvi
The hash join does not support inequality yet. Try nested loop join by setting: alter system set `planner.enable_nljoin_for_scalar_only` = false; And change the LEFT to a regular join followed by an inequality predicate: ``` SELECT m.startdate as monthdate, COUNT(distinct

Re: Error while applying interval on a postgresql query

2017-03-20 Thread Boaz Ben-Zvi
What do you mean by ‘postgresql table’? I just retried the query below (on a json table) and it worked OK: 0: jdbc:drill:zk=local> select * from test2 limit 2; +-+-+ | id | start_date | +-+-+ | 1 | 1997-10-27 | | 2 | 1997-10-27 |

Re: Spill Location, permissions and Authentication

2017-03-01 Thread Boaz Ben-Zvi
The permissions on the spill locations are part of the setup configuration; these base locations should have permissions matching the OS userid used for the DrillBit. The DrillBit creates those directories (with that long name 274a41…….) with its userid. So in the example below, the user

Re: debugging bad input

2017-03-01 Thread Boaz Ben-Zvi
This issue has been raised before; currently, when the execution detects the bad format, it has no knowledge of which file the data came from. See https://issues.apache.org/jira/browse/DRILL-3764 There is a link to a design that could tell the name of the file, by adding this information to the

Re: Best architecture

2017-02-21 Thread Boaz Ben-Zvi
Hi Users, Drill CAN use the disk when running out of memory (a.k.a. spill to disk). Currently only the Sort operation is supported, hence you’d need to enforce a Merge Join for joining, or a Streaming Aggregation for aggregating. But we are currently working on expanding this functionality
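
One way the advice above might be applied, sketched under the assumption that the planner options `planner.enable_hashjoin` and `planner.enable_hashagg` are available in the Drill version in use. With the hash-based operators disabled, the planner is left with the sort-based alternatives (Merge Join, Streaming Aggregation), which depend on the Sort operator that can spill to disk.

```sql
-- Steer the planner away from Hash Join, toward Merge Join.
ALTER SESSION SET `planner.enable_hashjoin` = false;

-- Steer the planner away from Hash Aggregate, toward Streaming Aggregation.
ALTER SESSION SET `planner.enable_hashagg` = false;
```

Both settings apply only to the current session; the defaults can be restored by setting the options back to true.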

Re: Drill: Memory Spilling for the Hash Aggregate Operator

2017-01-16 Thread Boaz Ben-Zvi
rite to memory, and the cost of discovering that is wrong is not too great. With Goetz Graefe's hybrid hash join (which can be easily adapted to hybrid hash aggregate) if the input ALMOST fits in memory you could process most of it in memory, then revisit the stuff you spilled to disk. > On Jan

Re: Drill: Memory Spilling for the Hash Aggregate Operator

2017-01-16 Thread Boaz Ben-Zvi
) if the input > ALMOST fits in memory you could process most of it in memory, then revisit > the stuff you spilled to disk. > >> On Jan 13, 2017, at 7:46 PM, Boaz Ben-Zvi <bben-...@mapr.com> wrote: >> >> Hi Drill developers, >> >> Attached is a