Hangout link?

2016-05-31 Thread John Omernik



Re: Hangout link?

2016-05-31 Thread Abdel Hakim Deneche
Sorry about the delay, there you go:

https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

On Tue, May 31, 2016 at 9:57 AM, John Omernik  wrote:

>
>


-- 

Abdelhakim Deneche

Software Engineer

  


Now Available - Free Hadoop On-Demand Training



Re: Clarification on Drill Options

2016-05-31 Thread John Omernik
I added a JIRA related to this:

https://issues.apache.org/jira/browse/DRILL-4699

On Sun, May 29, 2016 at 6:55 AM, John Omernik  wrote:

> Hey all, when looking at the Drill options, and specifically as I was
> trying to understand the Parquet options, I realized that the naming of the
> options was raising questions as I looked at them. What do I mean?
> Consider:
>
> ++
>
> |name|
>
> ++
>
> | store.parquet.block-size   |
>
> | store.parquet.compression  |
>
> | store.parquet.dictionary.page-size |
>
> | store.parquet.enable_dictionary_encoding   |
>
> | store.parquet.page-size|
>
> | store.parquet.use_new_reader   |
>
> | store.parquet.vector_fill_check_threshold  |
>
> | store.parquet.vector_fill_threshold|
>
> ++
>
>
>
> So I will remove "store.parquet" as I refer to them here:
>
>
> use_new_reader - This seems fairly obviously an "on read" option, and
> (maybe?) it also affects the Parquet writer, yet "enable_dictionary_encoding"
> is likely ONLY an on-write option, correct? I mean, if the Parquet file
> was written somewhere else, and written with dictionary encoding, Drill
> will still read it ok, regardless of this setting. Compression as well: if
> the Parquet file was created with gzip, and this setting is snappy, it will
> still read it, and the same goes for block size. Thus, those seem to be
> "writer" settings, rather than reader settings.
>
>
> So what about the vector settings? Write or read (or both)? For JSON there
> is the setting store.json.writer.uglify, which seems writer focused and is
> obviously a writer setting, but for other settings, knowing what the
> setting applies to (on write, on read, neither, or both) could be very
> useful for troubleshooting and knowing which settings to play with.
>
>
> Now, changing these settings is not generally recommended; even in my
> test clusters, I have scripts that alter them for specific ETLs, and I
> would hate to have things break. But how hard would it be to add a string
> column to sys.options, something like "applies_to", with write, read, both,
> neither, or n/a as values?  I think this could be valuable for users and
> administrators of Drill.
>
>
> One other note: in addition to applies_to, would it be horrifically
> difficult to add a "description" field for options?  Self-documenting
> settings sure would be handy  :)
>
>
> John
>
>
>
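For reference, the existing columns can be inspected directly from SQL; the proposed applies_to and description columns do not exist, and the column names below are per Drill 1.x (verify against your version):

```sql
-- List Parquet-related options with their kind and scope (Drill 1.x schema)
SELECT name, kind, type, status
FROM sys.options
WHERE name LIKE 'store.parquet.%';
```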


Minutes from 5/31/16 Drill Hangout

2016-05-31 Thread Zelaine Fong
Attendees: Arina, John O, Subhu, Vitalii, Hakim, Parth, Aman, Paul,
Jinfeng, Aditya, Zelaine

1) John noted that he's hitting a lot of problems with Drill scripts.  Paul
indicated that he's fixing a lot of these issues as part of his work to
integrate Drill with YARN.  John said he's writing up a document outlining
his findings that he will share with the community in a few days.

2) John suggested adding a new description field to sys.options as a way of
documenting the different system configuration options, as he's been
struggling to make sense of some of the options.  He's logged DRILL-4699
for this enhancement.  During the discussion, it was also noted that Drill
isn't being consistent in the naming convention of some of the current
options.  From a backward compatibility standpoint, it may not be possible
to rename existing options, but for the future, we should publish some
guidelines to ensure future consistency.

3) Aditya has a pull request for DRILL-4199 to upgrade to HBase 1.1.3.
Jacques has reviewed the change, but Aditya would like another pair of eyes
on the change.  Aditya will reach out to Steven to see if he can take a
look at the change.  He'll also reach out to QA to ensure this is on the QA
radar.

4) Arina is making updates to DRILL-4571, based on feedback from Krystal.
Since DRILL-4571 is already marked resolved, she asked whether she should
open a new Jira or reuse the existing one.  The suggestion was to open a
new one and link the new one back to DRILL-4571.

5) Vitalii is working on DRILL-3510 to make double quotes an alternative to
backtick.  Due to audio difficulties, we couldn't discern his specific
question.  So, it was suggested that he post his question on the dev list.

6) John is encountering a problem where garbage collection is putting his
cluster into a bad state.  He asked whether he should open a ticket with
MapR support or continue to seek out help from the community.  It was
suggested that he do both.


Profiles Gone in Web UI: The great profile heist

2016-05-31 Thread John Omernik
I am scratching my head at this one... I made some minor changes to my
drill-env.sh to enable gclogging, and was using the profiles in the webui
just fine.  Due to some previously mentioned issues, I've had to restart
drill bits due to GC issues etc.

Now, while my profiles directory still exists, and my drill-override.conf
has not been changed, no profiles now show up in the webui, even after
drillbit restarts and running more queries... The profiles are still being
created (I can see them being added to the same profiles directory), but
nothing shows up in the Web UI...

What could be happening here?

*scratching my head
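As an aside, the gclogging change mentioned above is typically made by appending JVM flags in drill-env.sh; a sketch for a Java 8 / G1 setup (the log path, and whether your Drill version picks up DRILL_JAVA_OPTS here, are assumptions):

```shell
# drill-env.sh: standard HotSpot (Java 8) GC-logging flags; log path is an example
export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -XX:+UseG1GC \
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/drill/drillbit-gc.log"
```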


Re: Oracle Query Problem

2016-05-31 Thread Sudheesh Katkam
Can you enable verbose logging and post the resulting error message? You can do 
this by executing the following statement, and then the failing query.

SET `exec.errors.verbose` = true;

Thank you,
Sudheesh

> On May 31, 2016, at 11:27 AM, SanjiV SwaraJ  wrote:
> 
> Hello, I have an Oracle query for selecting all the columns:
> 
> SELECT tc.column_name, tc.owner, tc.table_name, tc.column_id, tc.nullable,
> tc.data_type, c.constraint_type, c.r_owner AS reference_owner,
> rcc.table_name AS reference_table, rcc.column_name AS reference_column_name
> FROM SYS.ALL_TAB_COLUMNS tc LEFT OUTER JOIN SYS.ALL_CONS_COLUMNS cc ON (
> tc.owner = cc.owner AND tc.table_name = cc.table_name AND tc.column_name =
> cc.COLUMN_NAME ) LEFT OUTER JOIN SYS.ALL_CONSTRAINTS c ON ( tc.owner =
> c.owner AND tc.table_name = c.table_name AND c.constraint_name =
> cc.constraint_name ) LEFT OUTER JOIN ALL_CONS_COLUMNS rcc ON ( c.r_owner =
> rcc.owner AND c.r_constraint_name = rcc.constraint_name ) WHERE
> tc.table_name = 'REPORTSETTING' AND tc.OWNER = 'NVN' ORDER BY tc.column_id;
> 
> *This query works fine in Oracle DB, but when using the same query in
> Drill, it gives an error. The query for Drill is:*
> 
> SELECT tc.column_name, tc.owner, tc.table_name,
> tc.column_id,tc.nullable,tc.data_type,c.constraint_type,c.r_owner AS
> reference_owner, rcc.table_name AS reference_table, rcc.column_name AS
> reference_column_name FROM OracleDB.SYS.ALL_TAB_COLUMNS tc LEFT OUTER JOIN
> OracleDB.SYS.ALL_CONS_COLUMNS cc ON ( tc.owner = cc.owner AND tc.table_name
> = cc.table_name AND tc.column_name = cc.COLUMN_NAME ) LEFT OUTER JOIN
> OracleDB.SYS.ALL_CONSTRAINTS c ON ( tc.owner = c.owner AND tc.table_name =
> c.table_name AND c.constraint_name = cc.constraint_name ) LEFT OUTER JOIN
> OracleDB.SYS.ALL_CONS_COLUMNS rcc ON ( c.r_owner = rcc.owner AND
> c.r_constraint_name = rcc.constraint_name ) WHERE tc.table_name =
> 'REPORTSETTING' AND tc.OWNER = 'NVN' ORDER BY tc.column_id ASC;
> 
> The following error is shown:
> 
> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR:
> The JDBC storage plugin failed while trying setup the SQL query. sql SELECT
> * FROM (SELECT "t1"."OWNER", "t1"."TABLE_NAME", "t1"."COLUMN_NAME",
> "t1"."DATA_TYPE", "t1"."NULLABLE", "t1"."COLUMN_ID",
> "ALL_CONSTRAINTS"."CONSTRAINT_TYPE", "ALL_CONSTRAINTS"."R_OWNER",
> "ALL_CONSTRAINTS"."R_CONSTRAINT_NAME" FROM (SELECT "t0"."OWNER",
> "t0"."TABLE_NAME", "t0"."COLUMN_NAME", "t0"."DATA_TYPE",
> "t0"."DATA_TYPE_MOD", "t0"."DATA_TYPE_OWNER", "t0"."DATA_LENGTH",
> "t0"."DATA_PRECISION", "t0"."DATA_SCALE", "t0"."NULLABLE",
> "t0"."COLUMN_ID", "t0"."DEFAULT_LENGTH", "t0"."DATA_DEFAULT",
> "t0"."NUM_DISTINCT", "t0"."LOW_VALUE", "t0"."HIGH_VALUE", "t0"."DENSITY",
> "t0"."NUM_NULLS", "t0"."NUM_BUCKETS", "t0"."LAST_ANALYZED",
> "t0"."SAMPLE_SIZE", "t0"."CHARACTER_SET_NAME", "t0"."CHAR_COL_DECL_LENGTH",
> "t0"."GLOBAL_STATS", "t0"."USER_STATS", "t0"."AVG_COL_LEN",
> "t0"."CHAR_LENGTH", "t0"."CHAR_USED", "t0"."V80_FMT_IMAGE",
> "t0"."DATA_UPGRADED", "t0"."HISTOGRAM", "ALL_CONS_COLUMNS"."OWNER"
> "OWNER0", "ALL_CONS_COLUMNS"."CONSTRAINT_NAME",
> "ALL_CONS_COLUMNS"."TABLE_NAME" "TABLE_NAME0",
> "ALL_CONS_COLUMNS"."COLUMN_NAME" "COLUMN_NAME0",
> "ALL_CONS_COLUMNS"."POSITION", CAST("t0"."OWNER" AS VARCHAR(120) CHARACTER
> SET "ISO-8859-1") "$f36" FROM (SELECT "OWNER", "TABLE_NAME", "COLUMN_NAME",
> "DATA_TYPE", "DATA_TYPE_MOD", "DATA_TYPE_OWNER", "DATA_LENGTH",
> "DATA_PRECISION", "DATA_SCALE", "NULLABLE", "COLUMN_ID", "DEFAULT_LENGTH",
> "DATA_DEFAULT", "NUM_DISTINCT", "LOW_VALUE", "HIGH_VALUE", "DENSITY",
> "NUM_NULLS", "NUM_BUCKETS", "LAST_ANALYZED", "SAMPLE_SIZE",
> "CHARACTER_SET_NAME", "CHAR_COL_DECL_LENGTH", "GLOBAL_STATS", "USER_STATS",
> "AVG_COL_LEN", "CHAR_LENGTH", "CHAR_USED", "V80_FMT_IMAGE",
> "DATA_UPGRADED", "HISTOGRAM", CAST("COLUMN_NAME" AS VARCHAR(4000) CHARACTER
> SET "ISO-8859-1") "$f31" FROM "SYS"."ALL_TAB_COLUMNS" WHERE "TABLE_NAME" =
> 'REPORTSETTING' AND "OWNER" = 'NVN') "t0" LEFT JOIN
> "SYS"."ALL_CONS_COLUMNS" ON "t0"."OWNER" = "ALL_CONS_COLUMNS"."OWNER" AND
> "t0"."TABLE_NAME" = "ALL_CONS_COLUMNS"."TABLE_NAME" AND "t0"."$f31" =
> "ALL_CONS_COLUMNS"."COLUMN_NAME") "t1" LEFT JOIN "SYS"."ALL_CONSTRAINTS" ON
> "t1"."$f36" = "ALL_CONSTRAINTS"."OWNER" AND "t1"."TABLE_NAME" =
> "ALL_CONSTRAINTS"."TABLE_NAME" AND "t1"."CONSTRAINT_NAME" =
> "ALL_CONSTRAINTS"."CONSTRAINT_NAME") "t2" LEFT JOIN (SELECT
> "CONSTRAINT_NAME", "TABLE_NAME", "COLUMN_NAME", CAST("OWNER" AS
> VARCHAR(120) CHARACTER SET "ISO-8859-1") "$f5" FROM
> "SYS"."ALL_CONS_COLUMNS") "t3" ON "t2"."R_OWNER" = "t3"."$f5" AND
> "t2"."R_CONSTRAINT_NAME" = "t3"."CONSTRAINT_NAME" plugin OracleDB Fragment
> 0:0 [Error Id: 2a11fed2-ec79-4ef1-9d29-781af21274f6
> 
> *Please tell me what I am doing wrong in this query.*
> 
> -- 
> Thanks & Regards.
> Sanjiv
> ​Swaraj​



Do i need hadoop installed to use dfs storage?

2016-05-31 Thread Scott Kinney

I'm trying to test running Drill on gzipped JSON files in S3, and I keep getting:


VALIDATION ERROR: From line 1, column 15 to line 1, column 17: Table 
's3_file.gz' not found


I downloaded the file and unzipped it, and set up a new storage plugin to
point to the local file:


{
  "type": "file",
  "enabled": true,
  "connection": "file:///tmp/data/",
  "config": null,
  "workspaces": {
"root": {
  "location": "/",
  "writable": false,
  "defaultInputFormat": "json"
},
...

And I get the same thing.
I do not have Hadoop installed. Do I need it?

Thanks,




Scott Kinney | DevOps
stem    |   m  510.282.1299
100 Rollins Road, Millbrae, California 94030

This e-mail and/or any attachments contain Stem, Inc. confidential and 
proprietary information and material for the sole use of the intended 
recipient(s). Any review, use or distribution that has not been expressly 
authorized by Stem, Inc. is strictly prohibited. If you are not the intended 
recipient, please contact the sender and delete all copies. Thank you.
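With a workspace like the one above, a local-filesystem query (no Hadoop required) generally takes the following form; the plugin name (localfs) and file name are illustrative:

```sql
SELECT *
FROM localfs.root.`myfile.json`
LIMIT 10;
```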


Re: Profiles Gone in Web UI: The great profile heist

2016-05-31 Thread Abdel Hakim Deneche
are you storing the profiles in a local folder or in nfs ?

On Tue, May 31, 2016 at 12:49 PM, John Omernik  wrote:




-- 

Abdelhakim Deneche

Software Engineer

  


Now Available - Free Hadoop On-Demand Training



Re: Do i need hadoop installed to use dfs storage?

2016-05-31 Thread Nathan Griffith
Hi Scott,

You definitely don't need to have Hadoop to query your local file system.
Could you list the exact command to Drill that gave you this error?

Best,

Nathan Griffith
Technical Writer
Dremio

On Tue, May 31, 2016 at 1:26 PM, Scott Kinney  wrote:



Re: Profiles Gone in Web UI: The great profile heist

2016-05-31 Thread Jacques Nadeau
Odds are one of your profiles is corrupt. Last I checked, there was a bug
where a corrupt profile would cause no profiles to show up in the list. Check
if any of your profiles are zero bytes and delete them.

The UI should really handle this.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Tue, May 31, 2016 at 12:49 PM, John Omernik  wrote:



Re: Partition reading problem (like operator) while using hive partition table in drill

2016-05-31 Thread Shankar Mane
I didn't get any response or updates on this JIRA ticket (DRILL-4665).

Is anyone looking into this?
On 11 May 2016 03:31, "Aman Sinha"  wrote:

> The Drill test team was able to repro this and is now filed as:
> https://issues.apache.org/jira/browse/DRILL-4665
>
> On Tue, May 10, 2016 at 8:16 AM, Aman Sinha  wrote:
>
> > This is supposed to work, especially since the LIKE predicate is not even on
> > the partitioning column (it should work either way).  I did a quick test
> > with file system tables and it works for LIKE conditions.  Not sure yet
> > about Hive tables.  Could you pls file a JIRA and we'll follow up.
> > Thanks.
> >
> > -Aman
> >
> > On Tue, May 10, 2016 at 1:09 AM, Shankar Mane <
> shankar.m...@games24x7.com>
> > wrote:
> >
> >> Problem:
> >>
> >> 1. In Drill, we are using a Hive partitioned table. But the explain plan
> >> (for the same query) differs between the LIKE and = operators, and all
> >> partitions are used in the case of the LIKE operator.
> >> 2. If you look at the Drill explain plans below: the LIKE operator uses
> >> *all* partitions, whereas the = operator uses *only* the partition
> >> filtered by the log_date condition.
> >>
> >> FYI: we are storing our logs in a Hive partitioned table (Parquet,
> >> gz-compressed). Each partition has ~15 GB of data. Below is the describe
> >> statement output from Hive:
> >>
> >>
> >> /* Hive */
> >> hive> desc hive_kafkalogs_daily ;
> >> OK
> >> col_name data_type comment
> >> sessionid   string
> >> ajaxurl string
> >>
> >> log_date string
> >>
> >> # Partition Information
> >> # col_name data_type   comment
> >>
> >> log_date string
> >>
> >>
> >>
> >>
> >> /* Drill Plan (query with LIKE) */
> >>
> >> explain plan for select sessionid, servertime, ajaxUrl from
> >> hive.hive_kafkalogs_daily where log_date = '2016-05-09' and ajaxUrl like
> >> '%utm_source%' limit 1 ;
> >>
> >> +--+--+
> >> | text | json |
> >> +--+--+
> >> | 00-00Screen
> >> 00-01  Project(sessionid=[$0], servertime=[$1], ajaxUrl=[$2])
> >> 00-02SelectionVectorRemover
> >> 00-03  Limit(fetch=[1])
> >> 00-04UnionExchange
> >> 01-01  SelectionVectorRemover
> >> 01-02Limit(fetch=[1])
> >> 01-03  Project(sessionid=[$0], servertime=[$1],
> >> ajaxUrl=[$2])
> >> 01-04SelectionVectorRemover
> >> 01-05  Filter(condition=[AND(=($3, '2016-05-09'),
> >> LIKE($2, '%utm_source%'))])
> >> 01-06Scan(groupscan=[HiveScan
> >> [table=Table(dbName:default, tableName:hive_kafkalogs_daily),
> >> columns=[`sessionid`, `servertime`, `ajaxurl`, `log_date`],
> >> numPartitions=29, partitions= [Partition(values:[2016-04-11]),
> >> Partition(values:[2016-04-12]), Partition(values:[2016-04-13]),
> >> Partition(values:[2016-04-14]), Partition(values:[2016-04-15]),
> >> Partition(values:[2016-04-16]), Partition(values:[2016-04-17]),
> >> Partition(values:[2016-04-18]), Partition(values:[2016-04-19]),
> >> Partition(values:[2016-04-20]), Partition(values:[2016-04-21]),
> >> Partition(values:[2016-04-22]), Partition(values:[2016-04-23]),
> >> Partition(values:[2016-04-24]), Partition(values:[2016-04-25]),
> >> Partition(values:[2016-04-26]), Partition(values:[2016-04-27]),
> >> Partition(values:[2016-04-28]), Partition(values:[2016-04-29]),
> >> Partition(values:[2016-04-30]), Partition(values:[2016-05-01]),
> >> Partition(values:[2016-05-02]), Partition(values:[2016-05-03]),
> >> Partition(values:[2016-05-04]), Partition(values:[2016-05-05]),
> >> Partition(values:[2016-05-06]), Partition(values:[2016-05-07]),
> >> Partition(values:[2016-05-08]), Partition(values:[2016-05-09])],
> >>
> >>
> inputDirectories=[hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160411,
> >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160412,
> >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160413,
> >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160414,
> >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160415,
> >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160416,
> >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160417,
> >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160418,
> >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160419,
> >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160420,
> >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160421,
> >> 

HiveMetastore HA with Drill

2016-05-31 Thread Veera Naranammalpuram
Does anyone have insights into how the Hive storage plug-in can handle Hive
MetaStore HA? The Hive storage plug-in has only one property,
hive.metastore.uris, and it takes only one IP:port. When I add a second one,
the update of the storage plug-in fails.

  "configProps": {
"hive.metastore.uris": "thrift://:9083"
  }

How can we give two IPs to Drill so it knows to try the second IP if it's not
able to talk to the first one?
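Hive itself accepts a comma-separated list for hive.metastore.uris, so the two-URI plugin config one would naturally try presumably looks like this (hostnames are placeholders; as reported above, Drill rejects this update):

  "configProps": {
    "hive.metastore.uris": "thrift://metastore1:9083,thrift://metastore2:9083"
  }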

Thanks in advance.

-- 
Veera Naranammalpuram
Product Specialist - SQL on Hadoop
*MapR Technologies (www.mapr.com )*
*(Email) vnaranammalpu...@maprtech.com *
*(Mobile) 917 683 8116 - can text *
*Timezone: ET (UTC -5:00 / -4:00)*


Reading GC Logs

2016-05-31 Thread John Omernik
I am looking at the GC logs for some big queries (now that I know how to
enable them, thanks Paul!) and found the item below. It worries me (it says
"Failure", and it took 8 seconds). Should I be worried about that? I know my
heap is set fairly high (24GB); how should I interpret this?

John


1924.138: [GC concurrent-root-region-scan-end, 0.0019536 secs]

1924.138: [GC concurrent-mark-start]

1924.505: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0314046
secs]

1924.853: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0466492
secs]

1925.110: [GC concurrent-mark-end, 0.9724279 secs]

1925.126: [GC remark, 0.0403063 secs]

1925.182: [GC cleanup 22G->22G(24G), 0.0117316 secs]

1925.306: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0434484
secs]

1925.623: [GC pause (G1 Evacuation Pause) (mixed)-- 22G->23G(24G),
0.2346237 secs]

1926.001: [GC pause (G1 Evacuation Pause) (mixed)-- 23G->23G(24G),
0.6835194 secs]

1926.702: [GC pause (G1 Evacuation Pause) (young) 23G->23G(24G), 0.0388452
secs]

1926.756: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
23G->23G(24G), 0.0265543 secs]

1926.783: [GC concurrent-root-region-scan-start]

1926.783: [GC concurrent-root-region-scan-end, 0.162 secs]

1926.783: [GC concurrent-mark-start]

1926.798: [GC pause (G1 Evacuation Pause) (young) 23G->23G(24G), 0.0397901
secs]

1926.850: [GC pause (G1 Evacuation Pause) (young) 23G->23G(24G), 0.0370946
secs]

1926.902: [Full GC (Allocation Failure)  23G->18G(24G), 8.3348025 secs]

1935.243: [GC concurrent-mark-abort]

1935.967: [GC pause (G1 Evacuation Pause) (young) 20G->18G(24G), 0.0479378
secs]

1936.733: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
20G->18G(24G), 0.0556650 secs]

1936.789: [GC concurrent-root-region-scan-start]

1936.794: [GC concurrent-root-region-scan-end, 0.0049816 secs]

1936.794: [GC concurrent-mark-start]

1937.526: [GC pause (G1 Evacuation Pause) (young) 20G->18G(24G), 0.0528627
secs]

1937.793: [GC concurrent-mark-end, 0.9993197 secs]

1937.811: [GC remark, 0.0503934 secs]

1937.878: [GC cleanup 19G->19G(24G), 0.0124116 secs]

1938.225: [GC pause (G1 Evacuation Pause) (young) 20G->18G(24G), 0.0500627
secs]

1938.827: [GC pause (G1 Evacuation Pause) (young) 20G->18G(24G), 0.0434986
secs]

1939.352: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
20G->18G(24G), 0.0595028 secs]

1939.412: [GC concurrent-root-region-scan-start]

1939.415: [GC concurrent-root-region-scan-end, 0.0038086 secs]

1939.415: [GC concurrent-mark-start]

1939.911: [GC pause (G1 Evacuation Pause) (young) 20G->19G(24G), 0.0394493
secs]
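As a quick triage aid, the stop-the-world pauses in a log like the one above can be pulled out with a small script; this is a sketch whose regex only targets the two G1 line shapes shown here (GC pause and Full GC):

```python
import re

# Matches lines like:
#   1924.505: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0314046 secs]
#   1926.902: [Full GC (Allocation Failure)  23G->18G(24G), 8.3348025 secs]
LINE = re.compile(
    r"^(?P<ts>\d+\.\d+): \[(?P<event>GC pause|Full GC)[^,]*, (?P<secs>\d+\.\d+) secs\]"
)

def long_pauses(log_lines, threshold_secs=1.0):
    """Return (timestamp, event, seconds) for pauses longer than the threshold."""
    hits = []
    for line in log_lines:
        m = LINE.match(line.strip())
        if m and float(m.group("secs")) > threshold_secs:
            hits.append((float(m.group("ts")), m.group("event"), float(m.group("secs"))))
    return hits

sample = [
    "1926.850: [GC pause (G1 Evacuation Pause) (young) 23G->23G(24G), 0.0370946 secs]",
    "1926.902: [Full GC (Allocation Failure)  23G->18G(24G), 8.3348025 secs]",
]
print(long_pauses(sample))  # only the 8.3 s Full GC exceeds the 1 s threshold
```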


Elasticsearch

2016-05-31 Thread Villagra Laso, Jesus Carlos
Hi


I am getting data from Apache Drill with some simple queries, and I would
like to use it in Elasticsearch. Right now I don't see how to do that with
Apache Drill.

Have you considered a version that supports this?


Thanks


Jesús C. Villagrá Laso
Gerente de Innovación
Sector Industria
Miguel Yuste 45- Madrid 28037
Tel. Fijo: (+34) 91 325 33 00
Tel. Móvil / Fax: (+34) 620 6712 63
email: jesus.villa...@tecnocom.es
http://www.tecnocom.es

Please make sure it is necessary before printing this message. Let's help
care for the environment.
This message may contain confidential or privileged information. If it has been 
sent to you in error, please do not use it, notify the sender of the error and 
delete it. See legal 
notice



Re: Elasticsearch

2016-05-31 Thread Vince Gonzalez
The way I read your question, you might be wanting to use Drill to query 
Elasticsearch, or you might be trying to index Drill query results with 
Elasticsearch.

If the first, this project looks cool: https://github.com/Anchormen/sql4es 


You could use Drill + JDBC plugin + sql4es JDBC driver to query ES. I have not 
tried this myself.

If the second, you could CTAS to JSON, then ingest the JSON into ES using 
your favorite method, e.g., Logstash or something like that.

HTH

--vince
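For the second route, the CTAS-to-JSON step might be sketched like this (the table name and source path are illustrative; store.format is a standard Drill session option):

```sql
ALTER SESSION SET `store.format` = 'json';
CREATE TABLE dfs.tmp.`for_es` AS
SELECT * FROM dfs.`/data/source.parquet`;
```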


> On May 31, 2016, at 6:48 AM, Villagra Laso, Jesus Carlos 
>  wrote:


Re: Reading GC Logs

2016-05-31 Thread John Omernik
Also: doing a CTAS using the new reader and dictionary encoding is
producing this; everything is hung at this point. The query in sqlline is
not returning, and the web UI is running extremely slowly; when it does
return, it shows the running query, but when I click on it, the profile
page shows an error saying the profile was not found. The Full GCs are
happening quite a bit and take a long time (>10 seconds). And (this is my
tailed GC log) it's actually writing part of the "Allocation Failure"
message and then waits a while before anything else happens. This is "the
scary" state my cluster can get into, and I am trying to avoid it :) Any
tips on what may be happening here would be appreciated.

(24 GB of Heap, 5 nodes at this point)





912.895: [Full GC (Allocation Failure)  23G->20G(24G), 11.7923015 secs]

2924.692: [GC concurrent-mark-abort]

2925.099: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0540177
secs]

2925.401: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
22G->21G(24G), 0.0638409 secs]

2925.465: [GC concurrent-root-region-scan-start]

2925.475: [GC concurrent-root-region-scan-end, 0.0097528 secs]

2925.475: [GC concurrent-mark-start]

2925.846: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0454322
secs]

2926.252: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0543209
secs]

2926.604: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0525408
secs]

2926.986: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0534530
secs]

2927.389: [GC concurrent-mark-end, 1.9133249 secs]

2927.405: [GC remark, 0.0446448 secs]

2927.462: [GC cleanup 22G->22G(24G), 0.0290235 secs]

2927.494: [GC concurrent-cleanup-start]

2927.494: [GC concurrent-cleanup-end, 0.190 secs]

2927.530: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0500267
secs]

2927.828: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0462845
secs]

2928.184: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
22G->21G(24G), 0.0749704 secs]

2928.259: [GC concurrent-root-region-scan-start]

2928.268: [GC concurrent-root-region-scan-end, 0.0093531 secs]

2928.268: [GC concurrent-mark-start]

2928.568: [GC pause (G1 Evacuation Pause) (young) 22G->22G(24G), 0.0555025
secs]

2928.952: [GC pause (G1 Evacuation Pause) (young) 23G->22G(24G), 0.0489993
secs]

2929.333: [GC pause (G1 Evacuation Pause) (young)-- 23G->22G(24G),
0.0676159 secs]

2929.693: [GC pause (G1 Evacuation Pause) (young)-- 23G->23G(24G),
0.2088768 secs]

2929.914: [Full GC (Allocation Failure)  23G->20G(24G), 11.6264600 secs]

2941.544: [GC concurrent-mark-abort]

2941.836: [GC pause (G1 Evacuation Pause) (young) 22G->20G(24G), 0.0416962
secs]

2942.127: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
22G->21G(24G), 0.0627406 secs]

2942.190: [GC concurrent-root-region-scan-start]

2942.193: [GC concurrent-root-region-scan-end, 0.0029795 secs]

2942.193: [GC concurrent-mark-start]

2942.548: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0591030
secs]

2942.934: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0589163
secs]

2943.304: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0459117
secs]

2943.743: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0461640
secs]

2943.941: [GC concurrent-mark-end, 1.7476855 secs]

2943.953: [GC remark, 0.0356995 secs]

2944.000: [GC cleanup 22G->22G(24G), 0.0307393 secs]

2944.034: [GC concurrent-cleanup-start]

2944.034: [GC concurrent-cleanup-end, 0.281 secs]

2944.162: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0558067
secs]

2944.510: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0497960
secs]

2944.837: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
22G->21G(24G), 0.0719856 secs]

2944.909: [GC concurrent-root-region-scan-start]

2944.917: [GC concurrent-root-region-scan-end, 0.0076375 secs]

2944.917: [GC concurrent-mark-start]

2945.204: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0476954
secs]

2945.604: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0438138
secs]

2945.940: [GC pause (G1 Evacuation Pause) (young) 23G->22G(24G), 0.0554554
secs]

2946.358: [GC pause (G1 Evacuation Pause) (young) 23G->22G(24G), 0.0502923
secs]

2946.708: [GC pause (G1 Evacuation Pause) (young)-- 23G->22G(24G),
0.0728342 secs]

2947.021: [GC pause (G1 Evacuation Pause) (young)-- 23G->23G(24G),
0.1938188 secs]

2947.227: [Full GC (Allocation Failure)

On Tue, May 31, 2016 at 10:20 AM, John Omernik  wrote:


Re: Reading GC Logs

2016-05-31 Thread Abdel Hakim Deneche
My understanding (which is incomplete) is that both the "new reader" and
"dictionary encoding" are not stable yet and can cause failures or worse,
incorrect data. That's why they are disabled by default.

The "Allocation Failure" means that the JVM had to run a Full GC because it
couldn't allocate more heap for Drill. It looks like Drill is using more than
24GB of heap, which is most likely a bug.

What happens if you run the SELECT part of the CTAS? Does it also use too
much heap?


On Tue, May 31, 2016 at 8:54 AM, John Omernik  wrote:

> Oh, the query just stopped showing up in the profiles web UI, completely
> gone, like it never happened. It seems to be responding a bit better;
> sqlline is still hung, though.
>
> (Yes this is all related to my CTAS of the parquet data, at this point I am
> just looking for ways to handle the data and not make drill really unhappy.
> )
>
> On Tue, May 31, 2016 at 10:51 AM, John Omernik  wrote:
>
> > Also: Doing a CTAS using the new reader and dictionary encoding is
> > producing this; everything is hung at this point. The query in sqlline is
> > not returning, and the web UI is running extremely slowly. When it does
> > return, it shows the running query; however, when I click on it, the
> > profile shows an error saying the profile was not found. The Full GCs are
> > happening quite a bit and take a long time (>10 seconds). Watching my
> > tailed GC log, it actually writes part of the "Allocation Failure"
> > message and then waits a while before anything else happens. This is "the
> > scary" state my cluster can get into, and I am trying to avoid it :) Any
> > tips on what may be happening here would be appreciated.
> >
> > (24 GB of Heap, 5 nodes at this point)
> >
> >
> >
> >
> >
> > 912.895: [Full GC (Allocation Failure)  23G->20G(24G), 11.7923015 secs]
> >
> > 2924.692: [GC concurrent-mark-abort]
> >
> > 2925.099: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G),
> 0.0540177
> > secs]
> >
> > 2925.401: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
> > 22G->21G(24G), 0.0638409 secs]
> >
> > 2925.465: [GC concurrent-root-region-scan-start]
> >
> > 2925.475: [GC concurrent-root-region-scan-end, 0.0097528 secs]
> >
> > 2925.475: [GC concurrent-mark-start]
> >
> > 2925.846: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G),
> 0.0454322
> > secs]
> >
> > 2926.252: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G),
> 0.0543209
> > secs]
> >
> > 2926.604: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G),
> 0.0525408
> > secs]
> >
> > 2926.986: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G),
> 0.0534530
> > secs]
> >
> > 2927.389: [GC concurrent-mark-end, 1.9133249 secs]
> >
> > 2927.405: [GC remark, 0.0446448 secs]
> >
> > 2927.462: [GC cleanup 22G->22G(24G), 0.0290235 secs]
> >
> > 2927.494: [GC concurrent-cleanup-start]
> >
> > 2927.494: [GC concurrent-cleanup-end, 0.190 secs]
> >
> > 2927.530: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G),
> 0.0500267
> > secs]
> >
> > 2927.828: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G),
> 0.0462845
> > secs]
> >
> > 2928.184: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
> > 22G->21G(24G), 0.0749704 secs]
> >
> > 2928.259: [GC concurrent-root-region-scan-start]
> >
> > 2928.268: [GC concurrent-root-region-scan-end, 0.0093531 secs]
> >
> > 2928.268: [GC concurrent-mark-start]
> >
> > 2928.568: [GC pause (G1 Evacuation Pause) (young) 22G->22G(24G),
> 0.0555025
> > secs]
> >
> > 2928.952: [GC pause (G1 Evacuation Pause) (young) 23G->22G(24G),
> 0.0489993
> > secs]
> >
> > 2929.333: [GC pause (G1 Evacuation Pause) (young)-- 23G->22G(24G),
> > 0.0676159 secs]
> >
> > 2929.693: [GC pause (G1 Evacuation Pause) (young)-- 23G->23G(24G),
> > 0.2088768 secs]
> >
> > 2929.914: [Full GC (Allocation Failure)  23G->20G(24G), 11.6264600 secs]
> >
> > 2941.544: [GC concurrent-mark-abort]
> >
> > 2941.836: [GC pause (G1 Evacuation Pause) (young) 22G->20G(24G),
> 0.0416962
> > secs]
> >
> > 2942.127: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
> > 22G->21G(24G), 0.0627406 secs]
> >
> > 2942.190: [GC concurrent-root-region-scan-start]
> >
> > 2942.193: [GC concurrent-root-region-scan-end, 0.0029795 secs]
> >
> > 2942.193: [GC concurrent-mark-start]
> >
> > 2942.548: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G),
> 0.0591030
> > secs]
> >
> > 2942.934: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G),
> 0.0589163
> > secs]
> >
> > 2943.304: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G),
> 0.0459117
> > secs]
> >
> > 2943.743: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G),
> 0.0461640
> > secs]
> >
> > 2943.941: [GC concurrent-mark-end, 1.7476855 secs]
> >
> > 2943.953: [GC remark, 0.0356995 secs]
> >
> > 2944.000: [GC cleanup 22G->22G(24G), 0.0307393 secs]
> >
> > 2944.034: [GC concurrent-cleanup-start]
> >
> > 2944.034: [GC concurrent-cleanup-end, 0.281 secs]
> >
> > 2944.162: [GC pause (G1 Evacuation Pause) 

Re: Reading GC Logs

2016-05-31 Thread John Omernik
Oh, the query just stopped showing up in the profiles web UI, completely
gone, like it never happened. The cluster seems to be responding a bit
better, but sqlline is still hung.

(Yes, this is all related to my CTAS of the parquet data; at this point I am
just looking for ways to handle the data and not make Drill really unhappy.)

On Tue, May 31, 2016 at 10:51 AM, John Omernik  wrote:

> Also: Doing a CTAS using the new reader and dictionary encoding is
> producing this; everything is hung at this point. The query in sqlline is
> not returning, and the web UI is running extremely slowly. When it does
> return, it shows the running query; however, when I click on it, the
> profile shows an error saying the profile was not found. The Full GCs are
> happening quite a bit and take a long time (>10 seconds). Watching my
> tailed GC log, it actually writes part of the "Allocation Failure" message
> and then waits a while before anything else happens. This is "the scary"
> state my cluster can get into, and I am trying to avoid it :) Any tips on
> what may be happening here would be appreciated.
>
> (24 GB of Heap, 5 nodes at this point)
>
>
>
>
>
> [...GC log entries identical to those quoted earlier in the thread snipped...]
>
> 2944.162: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0558067
> secs]
>
> 2944.510: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0497960
> secs]
>
> 2944.837: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
> 22G->21G(24G), 0.0719856 secs]
>
> 2944.909: [GC concurrent-root-region-scan-start]
>
> 2944.917: [GC concurrent-root-region-scan-end, 0.0076375 secs]
>
> 2944.917: [GC concurrent-mark-start]
>
> 2945.204: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0476954
> secs]
>
> 2945.604: [GC pause (G1 Evacuation Pause) (young) 22G->21G(24G), 0.0438138
> secs]
>
> 2945.940: [GC pause (G1 Evacuation Pause) (young) 23G->22G(24G), 0.0554554
> secs]
>
> 2946.358: [GC pause (G1 Evacuation Pause) (young) 23G->22G(24G), 0.0502923
> secs]
>
> 2946.708: [GC pause (G1 Evacuation Pause) (young)-- 23G->22G(24G),
> 0.0728342 secs]
>
> 2947.021: [GC pause (G1 

Re: Reading GC Logs

2016-05-31 Thread John Omernik
It's just a flat select (via a view): basically, select field1, field2,
field100 from view_mytable where dir0 = '2016-05-01'. There is no
aggregation or anything else happening.

As to the dictionary encoding and the new reader, some thoughts:

1. Based on what I've read, the new reader is faster for flat data. In my
case, it's the only thing that allows me to read the data created in a CDH
cluster with a MapReduce job; the "old" reader gives me the array index out
of bounds error (see other thread). So in order to clean up my data, I'd
like to use the new reader here; however, now you have me worried about
incorrect data.

2. The files are already dictionary encoded: when I do the CTAS without the
encoding, the resulting files are quite a bit bigger than the originals.
Not a huge issue, but substantial (10-20 GB per day). That's why I tried to
combine the two.

3. I am now worried about both the encoding and the reader producing
incorrect data... Are there any JIRAs, etc., with status on these features
and warnings about their use?
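The size gap in point 2 is easy to see in miniature: for a low-cardinality column, dictionary encoding stores each distinct value once plus a small per-row index. A toy back-of-the-envelope sketch (this ignores Parquet's actual page layout, RLE/bit-packing, and compression; the numbers are only directional):

```python
# Rough comparison: plain vs. dictionary encoding for a low-cardinality
# string column, e.g. a date partition column repeated across many rows.
values = ["2016-05-01"] * 900 + ["2016-05-02"] * 100  # 1,000 rows, 2 distinct

# Plain encoding: every value is written out in full.
plain_bytes = sum(len(v.encode("utf-8")) for v in values)

# Dictionary encoding: one copy of each distinct value, plus a small index
# per row (1 byte is plenty for a 2-entry dictionary).
distinct = sorted(set(values))
dict_bytes = sum(len(v.encode("utf-8")) for v in distinct)
index_bytes = len(values) * 1

print(plain_bytes)                # 10,000 bytes written plain
print(dict_bytes + index_bytes)   # ~1,020 bytes with a dictionary
```

The same effect at the 10s-of-GB-per-day scale mentioned above is why dropping dictionary encoding in the CTAS inflates the output so noticeably.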

Thanks!

John


On Tue, May 31, 2016 at 11:02 AM, Abdel Hakim Deneche  wrote:

> My understanding (which is incomplete) is that both the "new reader" and
> "dictionary encoding" are not stable yet and can cause failures or worse,
> incorrect data. That's why they are disabled by default.
>
> The "Allocation Failure" means that the JVM had to run a Full GC because it
> couldn't allocate more heap for Drill. It looks like Drill is using more
> than 24GB of heap, which is most likely a bug.
>
> What happens if you run the select part of the CTAS? Does it also use too
> much heap?
>
>
> On Tue, May 31, 2016 at 8:54 AM, John Omernik  wrote:
>
> > Oh, the query just stopped showing up in the profiles webui, completely
> > gone like it never happened. Seems to be responding a bit better, the
> > sqlline is still hung though.
> >
> > (Yes this is all related to my CTAS of the parquet data, at this point I
> am
> > just looking for ways to handle the data and not make drill really
> unhappy.
> > )
> >
> > On Tue, May 31, 2016 at 10:51 AM, John Omernik  wrote:
> >
> > > Also: Doing a CTAS using the new reader and dictionary encoding is
> > > producing this; everything is hung at this point. The query in sqlline
> > > is not returning, and the web UI is running extremely slowly. When it
> > > does return, it shows the running query; however, when I click on it,
> > > the profile shows an error saying the profile was not found. The Full
> > > GCs are happening quite a bit and take a long time (>10 seconds).
> > > Watching my tailed GC log, it actually writes part of the "Allocation
> > > Failure" message and then waits a while before anything else happens.
> > > This is "the scary" state my cluster can get into, and I am trying to
> > > avoid it :) Any tips on what may be happening here would be appreciated.
> > >
> > > (24 GB of Heap, 5 nodes at this point)
> > >
> > >
> > >
> > >
> > >
> > > [...GC log entries identical to those quoted earlier in the thread snipped...]