[jira] [Created] (DRILL-7275) CTAS + CTE query fails with IllegalStateException: Read batch count [%d] should be greater than zero [0]

2019-05-21 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-7275:
-

 Summary: CTAS + CTE query fails with IllegalStateException: Read 
batch count [%d] should be greater than zero [0]
 Key: DRILL-7275
 URL: https://issues.apache.org/jira/browse/DRILL-7275
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.15.0
Reporter: Khurram Faraaz


CTAS + CTE query fails with IllegalStateException: Read batch count [%d] should 
be greater than zero [0]

Precondition check fails on line 47 in VarLenFixedEntryReader.java

{noformat}
44 final int expectedDataLen = columnPrecInfo.precision;
45 final int entrySz = 4 + columnPrecInfo.precision;
46 final int readBatch = getFixedLengthMaxRecordsToRead(valuesToRead, entrySz);
47 Preconditions.checkState(readBatch > 0, "Read batch count [%d] should be 
greater than zero", readBatch);
{noformat}
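The precondition can only fire if `getFixedLengthMaxRecordsToRead` returns 0. A minimal sketch of how that can happen (hypothetical method and parameter names; the real Drill logic differs): if the remaining batch memory budget is divided by an entry size inflated by a bad precision value, integer division floors the record count to zero.

```java
public class ReadBatchSketch {

    // Hypothetical stand-in for getFixedLengthMaxRecordsToRead: caps the batch
    // at however many fixed-length entries fit in the remaining memory budget.
    static int maxRecordsToRead(int valuesToRead, int entrySz, int remainingBytes) {
        int fitting = remainingBytes / entrySz; // floors to 0 when entrySz > remainingBytes
        return Math.min(valuesToRead, fitting);
    }

    public static void main(String[] args) {
        // A corrupt or mis-detected column precision can inflate entrySz
        // (4 + precision) past the whole budget, yielding a zero batch.
        int readBatch = maxRecordsToRead(3243, 4 + 65536, 32768);
        if (readBatch != 0) throw new AssertionError();
        System.out.println("readBatch = " + readBatch); // the value the precondition rejects
    }
}
```

This is a sketch of the failure arithmetic only, not of the actual fix.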


Stack trace from drillbit.log, which also includes the failing query.

{noformat}
2019-05-13 14:40:14,090 [23268c40-ef3a-6349-5901-5762f6888971:foreman] INFO 
o.a.drill.exec.work.foreman.Foreman - Query text for query with id 
23268c40-ef3a-6349-5901-5762f6888971 issued by scoop_stc: CREATE TABLE 
TEST_TEMPLATE_SCHEMA_creid.tbl_c_EquityProxyDailyReturn AS
WITH
che AS (
 SELECT * FROM 
TEST_TEMPLATE_SCHEMA_creid.tbl_c_CompositeHierarchyEntry_TimeVarying
 WHERE CompositeHierarchyName = 'AxiomaRegion/AxiomaSector/VectorUniverse'
 AND state = 'DupesRemoved'
 AND CompositeLevel = 'AxiomaRegion_1/AxiomaSector_1/VectorUniverse_0'
),
ef AS (SELECT * FROM 
TEST_TEMPLATE_SCHEMA_creid.tbl_c_EquityDailyReturn_FXAdjusted WHERE Status = 
'PresentInRawData'),
d AS (SELECT * FROM TEST_TEMPLATE_SCHEMA_creid.tbl_r_BusinessDate WHERE 
IsWeekday),
x AS
(
 SELECT
 che.CompositeHierarchyName,
 che.State,
 che.CompositeNodeName,
 d.`Date` AS RecordDate,
 COUNT(che.CompositeNodeName) AS countDistinctConstituents,
 COUNT(ef.VectorListingId) AS countDataPoints,
 AVG(ef.DailyReturn) AS AvgReturn, 
 AVG(ef.DailyReturnUSD) AS AvgReturnUSD,
 AVG(ef.NotionalReturnUSD) AS AvgNotionalReturnUSD
 FROM d
 INNER JOIN che ON d.`Date` BETWEEN che.CompositeUltimateChildStartDate AND 
che.CompositeUltimateChildEndDate
 LEFT OUTER JOIN ef ON d.`Date` = ef.RecordDate AND 'VectorListingId_' || 
CAST(ef.VectorListingId AS VARCHAR(100)) = che.UltimateChild
 GROUP BY che.CompositeHierarchyName, che.State, che.CompositeNodeName, 
d.`Date`, d.IsWeekday, d.IsHoliday
)
SELECT * FROM x
2019-05-13 14:40:16,971 [23268c40-ef3a-6349-5901-5762f6888971:foreman] INFO 
o.a.d.e.p.s.h.CreateTableHandler - Creating persistent table 
[tbl_c_EquityProxyDailyReturn].
...
...
2019-05-13 14:40:20,036 [23268c40-ef3a-6349-5901-5762f6888971:frag:6:10] INFO 
o.a.d.exec.physical.impl.ScanBatch - User Error Occurred: Error in parquet 
record reader.
Message:
Hadoop path: /DEV/tbl_c_EquityDailyReturn_FXAdjusted/1_32_32.parquet
Total records read: 0
Row group index: 0
Records in row group: 3243
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
 optional int64 VectorListingId;
 optional int32 RecordDate (DATE);
 required binary Status (UTF8);
 required binary CurrencyISO (UTF8);
 optional double DailyReturn;
 optional double DailyReturnUSD;
 optional double NotionalReturnUSD;
}
, metadata: {drill-writer.version=2, drill.version=1.15.0.0-mapr}}, blocks: 
[BlockMetaData{3243, 204762 [ColumnMetaData{UNCOMPRESSED [VectorListingId] 
optional int64 VectorListingId [RLE, BIT_PACKED, PLAIN], 4}, 
ColumnMetaData{UNCOMPRESSED [RecordDate] optional int32 RecordDate (DATE) 
[RLE, BIT_PACKED, PLAIN], 26021}, ColumnMetaData{UNCOMPRESSED [Status] 
required binary Status (UTF8) [BIT_PACKED, PLAIN], 39050}, 
ColumnMetaData{UNCOMPRESSED [CurrencyISO] required binary CurrencyISO (UTF8) 
[BIT_PACKED, PLAIN], 103968}, ColumnMetaData{UNCOMPRESSED [DailyReturn] 
optional double DailyReturn [RLE, BIT_PACKED, PLAIN], 126715}, 
ColumnMetaData{UNCOMPRESSED [DailyReturnUSD] optional double DailyReturnUSD 
[RLE, BIT_PACKED, PLAIN], 152732}, ColumnMetaData{UNCOMPRESSED 
[NotionalReturnUSD] optional double NotionalReturnUSD [RLE, BIT_PACKED, PLAIN], 
178749}]}]} (Error in parquet record reader.
...
...
Hadoop path: /DEV/tbl_c_EquityDailyReturn_FXAdjusted/1_32_32.parquet
Total records read: 0
Row group index: 0
Records in row group: 3243
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
 optional int64 VectorListingId;
 optional int32 RecordDate (DATE);
 required binary Status (UTF8);
 required binary CurrencyISO (UTF8);
 optional double DailyReturn;
 optional double DailyReturnUSD;
 optional double NotionalReturnUSD;
}
, metadata: {drill-writer.version=2, drill.version=1.15.0.0-mapr}}, blocks: 
[BlockMetaData{3243, 204762 [ColumnMetaData{UNCOMPRESSED [VectorListingId] 
optional int64 VectorListingId [RLE, BIT_PACKED, PLAIN], 4}, 
ColumnMetaData{UNCOMPRESSED [RecordDate] optional int32 RecordDate (DATE) 
[RLE, 

[jira] [Assigned] (DRILL-7275) CTAS + CTE query fails with IllegalStateException: Read batch count [%d] should be greater than zero [0]

2019-05-21 Thread Khurram Faraaz (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz reassigned DRILL-7275:
-

Assignee: salim achouche

> CTAS + CTE query fails with IllegalStateException: Read batch count [%d] 
> should be greater than zero [0]
> 
>
> Key: DRILL-7275
> URL: https://issues.apache.org/jira/browse/DRILL-7275
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.15.0
>Reporter: Khurram Faraaz
>Assignee: salim achouche
>Priority: Major
>
> CTAS + CTE query fails with IllegalStateException: Read batch count [%d] 
> should be greater than zero [0]
> Precondition check fails on line 47 in VarLenFixedEntryReader.java
> {noformat}
> 44 final int expectedDataLen = columnPrecInfo.precision;
> 45 final int entrySz = 4 + columnPrecInfo.precision;
> 46 final int readBatch = getFixedLengthMaxRecordsToRead(valuesToRead, 
> entrySz);
> 47 Preconditions.checkState(readBatch > 0, "Read batch count [%d] should be 
> greater than zero", readBatch);
> {noformat}
> Stack trace from drillbit.log, which also includes the failing query.
> {noformat}
> 2019-05-13 14:40:14,090 [23268c40-ef3a-6349-5901-5762f6888971:foreman] INFO 
> o.a.drill.exec.work.foreman.Foreman - Query text for query with id 
> 23268c40-ef3a-6349-5901-5762f6888971 issued by scoop_stc: CREATE TABLE 
> TEST_TEMPLATE_SCHEMA_creid.tbl_c_EquityProxyDailyReturn AS
> WITH
> che AS (
>  SELECT * FROM 
> TEST_TEMPLATE_SCHEMA_creid.tbl_c_CompositeHierarchyEntry_TimeVarying
>  WHERE CompositeHierarchyName = 'AxiomaRegion/AxiomaSector/VectorUniverse'
>  AND state = 'DupesRemoved'
>  AND CompositeLevel = 'AxiomaRegion_1/AxiomaSector_1/VectorUniverse_0'
> ),
> ef AS (SELECT * FROM 
> TEST_TEMPLATE_SCHEMA_creid.tbl_c_EquityDailyReturn_FXAdjusted WHERE Status = 
> 'PresentInRawData'),
> d AS (SELECT * FROM TEST_TEMPLATE_SCHEMA_creid.tbl_r_BusinessDate WHERE 
> IsWeekday),
> x AS
> (
>  SELECT
>  che.CompositeHierarchyName,
>  che.State,
>  che.CompositeNodeName,
>  d.`Date` AS RecordDate,
>  COUNT(che.CompositeNodeName) AS countDistinctConstituents,
>  COUNT(ef.VectorListingId) AS countDataPoints,
>  AVG(ef.DailyReturn) AS AvgReturn, 
>  AVG(ef.DailyReturnUSD) AS AvgReturnUSD,
>  AVG(ef.NotionalReturnUSD) AS AvgNotionalReturnUSD
>  FROM d
>  INNER JOIN che ON d.`Date` BETWEEN che.CompositeUltimateChildStartDate AND 
> che.CompositeUltimateChildEndDate
>  LEFT OUTER JOIN ef ON d.`Date` = ef.RecordDate AND 'VectorListingId_' || 
> CAST(ef.VectorListingId AS VARCHAR(100)) = che.UltimateChild
>  GROUP BY che.CompositeHierarchyName, che.State, che.CompositeNodeName, 
> d.`Date`, d.IsWeekday, d.IsHoliday
> )
> SELECT * FROM x
> 2019-05-13 14:40:16,971 [23268c40-ef3a-6349-5901-5762f6888971:foreman] INFO 
> o.a.d.e.p.s.h.CreateTableHandler - Creating persistent table 
> [tbl_c_EquityProxyDailyReturn].
> ...
> ...
> 2019-05-13 14:40:20,036 [23268c40-ef3a-6349-5901-5762f6888971:frag:6:10] INFO 
> o.a.d.exec.physical.impl.ScanBatch - User Error Occurred: Error in parquet 
> record reader.
> Message:
> Hadoop path: /DEV/tbl_c_EquityDailyReturn_FXAdjusted/1_32_32.parquet
> Total records read: 0
> Row group index: 0
> Records in row group: 3243
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
>  optional int64 VectorListingId;
>  optional int32 RecordDate (DATE);
>  required binary Status (UTF8);
>  required binary CurrencyISO (UTF8);
>  optional double DailyReturn;
>  optional double DailyReturnUSD;
>  optional double NotionalReturnUSD;
> }
> , metadata: {drill-writer.version=2, drill.version=1.15.0.0-mapr}}, blocks: 
> [BlockMetaData{3243, 204762 [ColumnMetaData{UNCOMPRESSED [VectorListingId] 
> optional int64 VectorListingId [RLE, BIT_PACKED, PLAIN], 4}, 
> ColumnMetaData{UNCOMPRESSED [RecordDate] optional int32 RecordDate (DATE) 
> [RLE, BIT_PACKED, PLAIN], 26021}, ColumnMetaData{UNCOMPRESSED [Status] 
> required binary Status (UTF8) [BIT_PACKED, PLAIN], 39050}, 
> ColumnMetaData{UNCOMPRESSED [CurrencyISO] required binary CurrencyISO (UTF8) 
> [BIT_PACKED, PLAIN], 103968}, ColumnMetaData{UNCOMPRESSED [DailyReturn] 
> optional double DailyReturn [RLE, BIT_PACKED, PLAIN], 126715}, 
> ColumnMetaData{UNCOMPRESSED [DailyReturnUSD] optional double DailyReturnUSD 
> [RLE, BIT_PACKED, PLAIN], 152732}, ColumnMetaData{UNCOMPRESSED 
> [NotionalReturnUSD] optional double NotionalReturnUSD [RLE, BIT_PACKED, 
> PLAIN], 178749}]}]} (Error in parquet record reader.
> ...
> ...
> Hadoop path: /DEV/tbl_c_EquityDailyReturn_FXAdjusted/1_32_32.parquet
> Total records read: 0
> Row group index: 0
> Records in row group: 3243
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
>  optional int64 VectorListingId;
> 

[jira] [Commented] (DRILL-7203) Back button for failed query does not return on Query page

2019-05-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845075#comment-16845075
 ] 

ASF GitHub Bot commented on DRILL-7203:
---

sohami commented on issue #1787: DRILL-7203: Back button not working 
URL: https://github.com/apache/drill/pull/1787#issuecomment-494487972
 
 
   @kkhatua : I am not sure this is the right way to fix the issue. As you 
mentioned, the issue is that we load the response from the server in the same 
page. I guess the reason is that any error in a query on the server side is 
also sent as an HTTP success response, so we chose to update the /query page 
with either the actual result or the exception from the server side. Please 
consider the two options below:
   
   1) Create a single `queryResult.ftl` page, or repurpose the `result.ftl` 
page, and on the server side populate it with either the result.ftl or 
errorMessage.ftl template depending on success or failure. This way the form 
on the /query page can always be directed to load the /queryResult.ftl page 
(I guess using the `action` property).
   2) Somehow, in the success handler of the ajax POST request, get the 
response page URL and load that URL with the response data. Or maybe use a 
redirection mechanism to load the result page, something like: 
https://stackoverflow.com/questions/199099/how-to-manage-a-redirect-request-after-a-jquery-ajax-call
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


>  Back button for failed query does not return on Query page
> ---
>
> Key: DRILL-7203
> URL: https://issues.apache.org/jira/browse/DRILL-7203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: back_button.JPG
>
>
> Back button for a failed query returns to the page visited before the Query 
> page, but not to the Query page itself.
> Steps: 
> 1. go to Logs page
> 2. go to Query page
> 3. execute query with incorrect syntax (ex: x)
> 4. error message will be displayed, Back button will be in left corner 
> (screenshot attached)
> 5. press Back button
> 6. user is redirected to Logs page



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7271:

Description: 
1. Merge info from metadataStatistics + statisticsKinds into one holder: 
Map.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move  
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel 
to metadata module, rename to MetadataType and add new value: DIRECTORY.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";

  MetadataType type (enum);
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
--
storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set partitionKeys; -> Map
{noformat}

org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
partitionValues (List)
location (String) (for directory level metadata) - directory location

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set location; -> locations
{noformat}

org.apache.drill.metastore.FileMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to file 

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
org.apache.drill.metastore.RowGroupMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to file 

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
8. Remove org.apache.drill.exec package from metastore module.
9. Rename ColumnStatisticsImpl class.
10. Separate existing classes in org.apache.drill.metastore package into 
sub-packages.
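Point 1 in the list above (merging metadataStatistics and statisticsKinds into a single holder) can be sketched as below. The mail archive stripped the generic type parameters from the description, so the key and value types here are illustrative assumptions, not Drill's actual signatures:

```java
import java.util.HashMap;
import java.util.Map;

public class StatisticsHolderSketch {

    // Illustrative kind enum; the real StatisticsKind abstraction in Drill is richer.
    enum StatisticsKind { ROW_COUNT, MIN_VALUE, MAX_VALUE }

    // One holder carries the statistic value together with its kind metadata
    // (here reduced to an "exact" flag), replacing the two parallel maps
    // (tableStatistics + statisticsKinds) that previously had to stay in sync.
    static final class StatisticsHolder {
        final Object value;
        final boolean exact;
        StatisticsHolder(Object value, boolean exact) {
            this.value = value;
            this.exact = exact;
        }
    }

    static Map<StatisticsKind, StatisticsHolder> mergedStats() {
        Map<StatisticsKind, StatisticsHolder> stats = new HashMap<>();
        stats.put(StatisticsKind.ROW_COUNT, new StatisticsHolder(3243L, true));
        stats.put(StatisticsKind.MIN_VALUE, new StatisticsHolder(0L, false));
        return stats;
    }

    public static void main(String[] args) {
        Map<StatisticsKind, StatisticsHolder> stats = mergedStats();
        if (!stats.get(StatisticsKind.ROW_COUNT).exact) throw new AssertionError();
        System.out.println("row count = " + stats.get(StatisticsKind.ROW_COUNT).value);
    }
}
```

The design point is that a single map keyed by kind cannot drift out of sync the way two parallel maps can.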

  was:
1. Merge info from metadataStatistics + statisticsKinds into one holder: 
Map.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move  
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel 
to metadata module, rename to MetadataType and add new value: DIRECTORY, 
SUB_PARTITION.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";

  MetadataType type (enum);
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
--
storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set partitionKeys; -> Map
{noformat}

org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
partitionValues (List)
location (String) (for directory level metadata) - directory location

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set location; -> locations
{noformat}

org.apache.drill.metastore.FileMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo 

[jira] [Updated] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7271:

Description: 
1. Merge info from metadataStatistics + statisticsKinds into one holder: 
Map.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move  
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel 
to metadata module, rename to MetadataType and add new value: DIRECTORY, 
SUB_PARTITION.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";

  MetadataType type (enum);
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
--
storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set partitionKeys; -> Map
{noformat}

org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
partitionValues (List)
location (String) (for directory level metadata) - directory location

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set location; -> locations
{noformat}

org.apache.drill.metastore.FileMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to file 

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
org.apache.drill.metastore.RowGroupMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to file 

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
8. Remove org.apache.drill.exec package from metastore module.
9. Rename ColumnStatisticsImpl class.
10. Separate existing classes in org.apache.drill.metastore package into 
sub-packages.

  was:
1. Merge info from metadataStatistics + statisticsKinds into one holder: 
Map.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move  
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel 
to metadata module, rename to MetadataType and add new value: DIRECTORY.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";

  MetadataType type (enum);
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
--
storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set partitionKeys; -> Map
{noformat}

org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
partitionValues (List)
location (String) (for directory level metadata) - directory location

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set location; -> locations
{noformat}

org.apache.drill.metastore.FileMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo 

[jira] [Closed] (DRILL-4843) Trailing spaces in CSV column headers cause IndexOutOfBoundsException

2019-05-21 Thread Anton Gozhiy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Gozhiy closed DRILL-4843.
---
Resolution: Fixed

The issue is not reproducible with Drill version 1.17.0-SNAPSHOT (commit id 
0195d1f34be7fd385ba76d2fd3e14a9fa13bd375)

> Trailing spaces in CSV column headers cause IndexOutOfBoundsException
> -
>
> Key: DRILL-4843
> URL: https://issues.apache.org/jira/browse/DRILL-4843
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.6.0, 1.7.0
> Environment: MapR Community cluster on CentOS 7.2
>Reporter: Matt Keranen
>Assignee: Paul Rogers
>Priority: Major
>
> When a CSV file with a header row has spaces after commas, an 
> IndexOutOfBoundsException (IOBE) is thrown when trying to reference column 
> names. For example, this will cause the exception:
> {{col1, col2, col3}}
> Where this will not:
> {{col1,col2,col3}}
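A hedged sketch of the usual fix for this class of bug (not the actual Drill patch): trim whitespace around each header name when splitting the header row, so `"col1, col2, col3"` yields `"col2"` rather than `" col2"` as a column name.

```java
import java.util.Arrays;

public class HeaderTrimSketch {

    // Split a CSV header row and normalize each column name by trimming
    // the whitespace that follows each comma.
    static String[] parseHeader(String headerLine) {
        String[] names = headerLine.split(",");
        for (int i = 0; i < names.length; i++) {
            names[i] = names[i].trim();
        }
        return names;
    }

    public static void main(String[] args) {
        String[] names = parseHeader("col1, col2, col3");
        if (!Arrays.equals(names, new String[]{"col1", "col2", "col3"})) {
            throw new AssertionError();
        }
        System.out.println(Arrays.toString(names));
    }
}
```

Without the `trim()`, a lookup for `col2` misses the stored name `" col2"`, which is the kind of mismatch that can surface downstream as an index error.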



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7271:

Description: 
1. Merge info from metadataStatistics + statisticsKinds into one holder: 
Map.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move  
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel 
to metadata module, rename to MetadataType and add new value: DIRECTORY.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";

  MetadataType type (enum);
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
--
storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set partitionKeys; -> Map
{noformat}

org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
partitionValues (List)
location (String) (for directory level metadata) - directory location

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set location; -> locations
{noformat}

org.apache.drill.metastore.FileMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to file 

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
org.apache.drill.metastore.RowGroupMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to file 

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
8. Remove org.apache.drill.exec package from metastore module.
9. Rename ColumnStatisticsImpl class.
10. Separate existing classes in org.apache.drill.metastore package into 
sub-packages.

  was:
1. Merge info from metadataStatistics + statisticsKinds into one holder: 
Map.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move  
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel 
to metadata module, rename to MetadataType and add new value: DIRECTORY.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";

  MetadataType type (enum);
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
--
storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set partitionKeys; -> Map
{noformat}

org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
partitionValues (List)
location (String) (for directory level metadata) - directory location

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set location; -> locations
{noformat}

org.apache.drill.metastore.FileMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to 

[jira] [Closed] (DRILL-6978) typeOf drillTypeOf sqlTypeOf not work with generated tables

2019-05-21 Thread benj (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

benj closed DRILL-6978.
---

Fixed in 1.16

> typeOf drillTypeOf sqlTypeOf not work with generated tables
> ---
>
> Key: DRILL-6978
> URL: https://issues.apache.org/jira/browse/DRILL-6978
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.15.0
>Reporter: benj
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
>  
> *TypeOf* functions work when querying files but don't work on "generated" 
> data.
> This works :
> {code:java}
> SELECT typeof(md5), drillTypeOf(md5), sqlTypeOf(md5) FROM 
> dfs.tmp.`mytable.csv` LIMIT 2;
> => (OK)
> +--+--++
> |  EXPR$0  |  EXPR$1  |   EXPR$2   |
> +--+--++
> | VARCHAR  | VARCHAR  | CHARACTER VARYING  |
> | VARCHAR  | VARCHAR  | CHARACTER VARYING  |
> +--+--++{code}
> But this does not:
> {code:java}
> SELECT typeOf(a) FROM (SELECT CAST (5 as int) AS a) x;
> => (NOK)
> Error: SYSTEM ERROR: IllegalArgumentException: Can not set 
> org.apache.drill.exec.vector.complex.reader.FieldReader field 
> org.apache.drill.exec.expr.fn.impl.UnionFunctions$GetType.input to 
> org.apache.drill.exec.expr.holders.IntHolder
> {code}
> And, surprisingly, the next query works:
> {code:java}
> SELECT md5, typeof(t), drillTypeOf(t), sqlTypeOf(t) FROM ((SELECT 'foo' AS t 
> ) union (SELECT 'bar' AS t)) x;
> => (OK)
> +---+--+--++
> |  md5  |  EXPR$1  |  EXPR$2  |   EXPR$3   |
> +---+--+--++
> | foo   | VARCHAR  | VARCHAR  | CHARACTER VARYING  |
> | bar   | VARCHAR  | VARCHAR  | CHARACTER VARYING  |
> +---+--+--++{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5555) CSV file without headers: "SELECT a" fails, "SELECT columns, a" succeeds

2019-05-21 Thread Anton Gozhiy (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844966#comment-16844966
 ] 

Anton Gozhiy commented on DRILL-5555:
-

The issue is fixed in V3 reader for the case with "columns", but the case with 
star is still reproducible.

> CSV file without headers: "SELECT a" fails, "SELECT columns, a" succeeds
> 
>
> Key: DRILL-5555
> URL: https://issues.apache.org/jira/browse/DRILL-5555
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Consider the case discussed in DRILL-5554. Do exactly the same setup, but 
> with a slightly different query. The results are much different.
> Create a CSV file without headers:
> {code}
> 10,foo,bar
> {code}
> Use a CSV storage plugin configured to not skip the first line and not read 
> headers.
> Then, issue the following query:
> {code}
> SELECT columns, a FROM `dfs.data.example.csv`
> {code}
> Result:
> {code}
> columns,a
> ["10","foo","bar"],null
> {code}
> Schema:
> {code}
> columns(VARCHAR:REPEATED), 
> a(INT:OPTIONAL)
> {code}
> Since the query in DRILL-5554 fails:
> {code}
> SELECT a FROM ...
> {code}
> Expected the query described here to also fail, for a similar reason.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-6954) Move commons libs used in UDFs module to the dependency management

2019-05-21 Thread benj (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

benj closed DRILL-6954.
---

OK in 1.16.

> Move commons libs used in UDFs module to the dependency management
> --
>
> Key: DRILL-6954
> URL: https://issues.apache.org/jira/browse/DRILL-6954
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.15.0
>Reporter: benj
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> *USE CASE*
> The networking function ADDRESS_COUNT doesn't work anymore in DRILL *1.15.0*
> In *1.14.0* it works:
> {code:java}
> SELECT address_count('192.168.100.1/24');
> +-+
> | EXPR$0  |
> +-+
> | 254 |
> +-+{code}
> But in *1.15.0* it fails:
> {code:java}
> SELECT address_count('192.168.100.1/24');
> Exception in thread "drill-executor-1" java.lang.NoSuchMethodError: 
> org.apache.commons.net.util.SubnetUtils$SubnetInfo.getAddressCountLong()J
>     at 
> org.apache.drill.exec.udfs.NetworkFunctions$AddressCountFunction.eval(NetworkFunctions.java:87)
>     at 
> org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator.evaluateFunction(InterpreterEvaluator.java:129)
>     at 
> org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator$EvalVisitor.visitFunctionHolderExpression(InterpreterEvaluator.java:334)
>     at 
> org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator$EvalVisitor.visitFunctionHolderExpression(InterpreterEvaluator.java:194)
>     at 
> org.apache.drill.common.expression.FunctionHolderExpression.accept(FunctionHolderExpression.java:53)
>     at 
> org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator.evaluateConstantExpr(InterpreterEvaluator.java:69)
>     at 
> org.apache.drill.exec.planner.logical.DrillConstExecutor.reduce(DrillConstExecutor.java:151)
>     at 
> org.apache.calcite.rel.rules.ReduceExpressionsRule.reduceExpressionsInternal(ReduceExpressionsRule.java:620)
>     at 
> org.apache.calcite.rel.rules.ReduceExpressionsRule.reduceExpressions(ReduceExpressionsRule.java:540)
>     at 
> org.apache.calcite.rel.rules.ReduceExpressionsRule$ProjectReduceExpressionsRule.onMatch(ReduceExpressionsRule.java:288)
>     at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:212)
>     at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:648)
>     at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:339)
> ...
> {code}
>  
> Note that the other networking functions seem to work well.
>  
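The stack trace above points to a dependency version conflict: `SubnetUtils$SubnetInfo.getAddressCountLong()` exists only in newer commons-net releases, so an older commons-net jar on the classpath produces the `NoSuchMethodError` at planning time. A minimal sketch of the kind of fix the issue title describes, pinning the library once in the parent POM's `dependencyManagement` (the version number here is an assumption for illustration, not the one actually chosen):

```xml
<!-- Sketch only: pin commons-net once in the parent POM so every module,
     including the UDFs module, resolves the same version at runtime.
     The version below is illustrative. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>commons-net</groupId>
      <artifactId>commons-net</artifactId>
      <version>3.6</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```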





[jira] [Commented] (DRILL-4224) Support MINUS set operator

2019-05-21 Thread benj (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844937#comment-16844937
 ] 

benj commented on DRILL-4224:
-

See DRILL-4232

> Support MINUS set operator
> --
>
> Key: DRILL-4224
> URL: https://issues.apache.org/jira/browse/DRILL-4224
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen, SQL Parser
>Reporter: Ashwin Aravind
>Priority: Major
>
> Support for Set operator - MINUS





[jira] [Closed] (DRILL-5554) Wrong error type for "SELECT a" from a CSV file without headers

2019-05-21 Thread Anton Gozhiy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Gozhiy closed DRILL-5554.
---
Resolution: Fixed

Verified with Drill version 1.17.0-SNAPSHOT (commit id 
0195d1f34be7fd385ba76d2fd3e14a9fa13bd375)

The issue is fixed in V3 Text Reader.
It is now a validation error.

> Wrong error type for "SELECT a" from a CSV file without headers
> ---
>
> Key: DRILL-5554
> URL: https://issues.apache.org/jira/browse/DRILL-5554
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Priority: Trivial
>
> Create a CSV file without headers:
> {code}
> 10,foo,bar
> {code}
> Use a CSV storage plugin configured to not skip the first line and not read 
> headers.
> Then, issue the following query:
> {code}
> SELECT a FROM `dfs.data.example.csv`
> {code}
> The result is correct: an error:
> {code}
> org.apache.drill.common.exceptions.UserRemoteException: 
> DATA_READ ERROR: Selected column 'a' must have name 'columns' or must be 
> plain '*'
> {code}
> But the type of error is wrong. This is not a data read error: the file read 
> just fine. The problem is a semantic error: a query form that is not 
> compatible with the storage plugin.
> Suggest using {{UserException.unsupportedError()}} instead since the user is 
> asking the plugin to do something that the plugin does not support.





[jira] [Closed] (DRILL-5487) Vector corruption in CSV with headers and truncated last row

2019-05-21 Thread Anton Gozhiy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Gozhiy closed DRILL-5487.
---
Resolution: Fixed

Verified with Drill version 1.17.0-SNAPSHOT (commit id 
0195d1f34be7fd385ba76d2fd3e14a9fa13bd375)

The issue is fixed in V3 Text Reader.

> Vector corruption in CSV with headers and truncated last row
> 
>
> Key: DRILL-5487
> URL: https://issues.apache.org/jira/browse/DRILL-5487
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Priority: Major
> Fix For: Future
>
>
> The CSV format plugin allows two ways of reading data:
> * As named columns
> * As a single array, called {{columns}}, that holds all columns for a row
> The named columns feature will corrupt the offset vectors if the last row of 
> the file is truncated, i.e., leaves off one or more columns.
> To illustrate the CSV data corruption, I created a CSV file, test4.csv, of 
> the following form:
> {code}
> h,u
> abc,def
> ghi
> {code}
> Note that the file is truncated: the comma and second field are missing on 
> the last line.
> Then, I created a simple test using the "cluster fixture" framework:
> {code}
>   @Test
>   public void readerTest() throws Exception {
> FixtureBuilder builder = ClusterFixture.builder()
> .maxParallelization(1);
> try (ClusterFixture cluster = builder.build();
>  ClientFixture client = cluster.clientFixture()) {
>   TextFormatConfig csvFormat = new TextFormatConfig();
>   csvFormat.fieldDelimiter = ',';
>   csvFormat.skipFirstLine = false;
>   csvFormat.extractHeader = true;
>   cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat);
>   String sql = "SELECT * FROM `dfs.data`.`csv/test4.csv` LIMIT 10";
>   client.queryBuilder().sql(sql).printCsv();
> }
>   }
> {code}
> The results show we've got a problem:
> {code}
> Exception (no rows returned): 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> IllegalArgumentException: length: -3 (expected: >= 0)
> {code}
> If the last line were:
> {code}
> ghi,
> {code}
> Then the offset vector should look like this:
> {code}
> [0, 3, 3]
> {code}
> Very likely we have an offset vector that looks like this instead:
> {code}
> [0, 3, 0]
> {code}
> When we compute the second column of the second row, we should compute:
> {code}
> length = offset[2] - offset[1] = 3 - 3 = 0
> {code}
> Instead we get:
> {code}
> length = offset[2] - offset[1] = 0 - 3 = -3
> {code}
> The summary is that a premature EOF appears to cause the "missing" columns to 
> be skipped; they are not filled with a blank value to "bump" the offset 
> vectors to fill in the last row. Instead, they are left at 0, causing havoc 
> downstream in the query.
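The offset-vector arithmetic described above can be sketched directly. A minimal, self-contained illustration (not Drill code) of how a value's length is derived from adjacent offset entries, and how an offset left at zero yields the negative length seen in the error:

```java
// Minimal sketch (not Drill code): variable-width value lengths are
// computed from adjacent entries in an offset vector.
public class OffsetVectorDemo {
    static int valueLength(int[] offsets, int index) {
        // length of value i = offsets[i + 1] - offsets[i]
        return offsets[index + 1] - offsets[index];
    }

    public static void main(String[] args) {
        // Properly filled vector for rows ("abc", ""): the missing column
        // still bumps the last offset to 3.
        int[] good = {0, 3, 3};
        // Corrupted vector: the truncated row never bumped the last offset.
        int[] bad = {0, 3, 0};

        System.out.println(valueLength(good, 1)); // 3 - 3 = 0 (empty value)
        System.out.println(valueLength(bad, 1));  // 0 - 3 = -3 (the reported error)
    }
}
```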





[jira] [Updated] (DRILL-7274) Introduce ANALYZE TABLE statements

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7274:

Fix Version/s: 1.17.0

> Introduce ANALYZE TABLE statements
> --
>
> Key: DRILL-7274
> URL: https://issues.apache.org/jira/browse/DRILL-7274
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>






[jira] [Created] (DRILL-7274) Introduce ANALYZE TABLE statements

2019-05-21 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-7274:
---

 Summary: Introduce ANALYZE TABLE statements
 Key: DRILL-7274
 URL: https://issues.apache.org/jira/browse/DRILL-7274
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Arina Ielchiieva
Assignee: Volodymyr Vysotskyi








[jira] [Updated] (DRILL-7272) Implement Drill Iceberg Metastore plugin

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7272:

Fix Version/s: 1.17.0

> Implement Drill Iceberg Metastore plugin
> 
>
> Key: DRILL-7272
> URL: https://issues.apache.org/jira/browse/DRILL-7272
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>






[jira] [Updated] (DRILL-7273) Create operator for handling metadata

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7273:

Fix Version/s: 1.17.0

> Create operator for handling metadata
> -
>
> Key: DRILL-7273
> URL: https://issues.apache.org/jira/browse/DRILL-7273
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>






[jira] [Created] (DRILL-7273) Create operator for handling metadata

2019-05-21 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-7273:
---

 Summary: Create operator for handling metadata
 Key: DRILL-7273
 URL: https://issues.apache.org/jira/browse/DRILL-7273
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Arina Ielchiieva
Assignee: Volodymyr Vysotskyi








[jira] [Created] (DRILL-7272) Implement Drill Iceberg Metastore plugin

2019-05-21 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-7272:
---

 Summary: Implement Drill Iceberg Metastore plugin
 Key: DRILL-7272
 URL: https://issues.apache.org/jira/browse/DRILL-7272
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva








[jira] [Created] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-05-21 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-7271:
---

 Summary: Refactor Metadata interfaces and classes to contain all 
needed information for the File based Metastore
 Key: DRILL-7271
 URL: https://issues.apache.org/jira/browse/DRILL-7271
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Arina Ielchiieva
Assignee: Volodymyr Vysotskyi


1. Merge info from metadataStatistics + statisticsKinds into one holder: 
Map.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move  
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel 
to metadata module, rename to MetadataType and add new value: DIRECTORY.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";

  MetadataType type (enum);
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
--
storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set partitionKeys; -> Map
{noformat}

org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
partitionValues (List)
location (String) (for directory level metadata) - directory location

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set location; -> locations
{noformat}

org.apache.drill.metastore.FileMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to file 

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
org.apache.drill.metastore.RowGroupMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
{noformat}
8. Remove org.apache.drill.exec package from metastore module.
9. Rename ColumnStatisticsImpl class.
10. Separate existing classes in org.apache.drill.metastore package into 
sub-packages.
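Item 1 above merges two parallel maps into a single holder keyed once. A hedged sketch of what that could look like (class and field names are assumptions based on this message; the generic type parameters were stripped from the archived text, so the signatures below are illustrative only):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: one holder replacing the parallel
// tableStatistics / statisticsKinds maps described above.
public class StatisticsSketch {

    // Pairs a statistic value with the kind that produced it.
    static final class StatisticsHolder<T> {
        final T value;
        final String kind; // e.g. "rowCount", "nullsCount"

        StatisticsHolder(T value, String kind) {
            this.value = value;
            this.kind = kind;
        }
    }

    public static void main(String[] args) {
        // A single map keyed by statistic name instead of two parallel maps.
        Map<String, StatisticsHolder<?>> stats = new HashMap<>();
        stats.put("rowCount", new StatisticsHolder<>(100L, "rowCount"));

        System.out.println(stats.get("rowCount").value); // 100
    }
}
```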





[jira] [Updated] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7271:

Fix Version/s: 1.17.0

> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: DIRECTORY.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.





[jira] [Updated] (DRILL-6742) Research and investigate a way for collecting and storing table statistics in the scope of metastore integration

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6742:

Fix Version/s: 1.16.0

> Research and investigate a way for collecting and storing table statistics in 
> the scope of metastore integration
> 
>
> Key: DRILL-6742
> URL: https://issues.apache.org/jira/browse/DRILL-6742
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> In the scope of DRILL-1328, significant work was done on collecting and 
> processing table statistics. The main showstopper for that Jira is a way of 
> storing the collected statistics.
> The aim of this Jira is to investigate how these statistics may be stored in 
> the metastore for different table types, and how they may be additionally 
> collected and retrieved from the metastore.





[jira] [Resolved] (DRILL-6742) Research and investigate a way for collecting and storing table statistics in the scope of metastore integration

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-6742.
-
Resolution: Fixed

> Research and investigate a way for collecting and storing table statistics in 
> the scope of metastore integration
> 
>
> Key: DRILL-6742
> URL: https://issues.apache.org/jira/browse/DRILL-6742
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>
> In the scope of DRILL-1328, significant work was done on collecting and 
> processing table statistics. The main showstopper for that Jira is a way of 
> storing the collected statistics.
> The aim of this Jira is to investigate how these statistics may be stored in 
> the metastore for different table types, and how they may be additionally 
> collected and retrieved from the metastore.





[jira] [Assigned] (DRILL-6552) Drill Metadata management "Drill MetaStore"

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6552:
---

Assignee: Volodymyr Vysotskyi  (was: Vitalii Diravka)

> Drill Metadata management "Drill MetaStore"
> ---
>
> Key: DRILL-6552
> URL: https://issues.apache.org/jira/browse/DRILL-6552
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 2.0.0
>
>
> It would be useful for Drill to have some sort of metastore which would 
> enable Drill to remember previously defined schemata so Drill doesn’t have to 
> do the same work over and over again.
> It would allow storing schemas and statistics, which would speed up query 
> validation, planning, and execution. It would also increase the stability of 
> Drill and help avoid different kinds of issues: "schema change" exceptions, 
> "limit 0" optimization, and so on.
> One of the main candidates is Hive Metastore.
> Starting from version 3.0, Hive Metastore can run as a service separate from 
> the Hive server:
> [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration]
> Optional enhancement is storing Drill's profiles, UDFs, plugins configs in 
> some kind of metastore as well.





[jira] [Updated] (DRILL-6552) Drill Metadata management "Drill MetaStore"

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6552:

Fix Version/s: (was: 2.0.0)
   1.17.0

> Drill Metadata management "Drill MetaStore"
> ---
>
> Key: DRILL-6552
> URL: https://issues.apache.org/jira/browse/DRILL-6552
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> It would be useful for Drill to have some sort of metastore which would 
> enable Drill to remember previously defined schemata so Drill doesn’t have to 
> do the same work over and over again.
> It would allow storing schemas and statistics, which would speed up query 
> validation, planning, and execution. It would also increase the stability of 
> Drill and help avoid different kinds of issues: "schema change" exceptions, 
> "limit 0" optimization, and so on.
> One of the main candidates is Hive Metastore.
> Starting from version 3.0, Hive Metastore can run as a service separate from 
> the Hive server:
> [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration]
> Optional enhancement is storing Drill's profiles, UDFs, plugins configs in 
> some kind of metastore as well.





[jira] [Resolved] (DRILL-5903) Regression: Query encounters "Waited for 15000ms, but tasks for 'Fetch parquet metadata' are not complete."

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-5903.
-
Resolution: Invalid

> Regression: Query encounters "Waited for 15000ms, but tasks for 'Fetch 
> parquet metadata' are not complete."
> ---
>
> Key: DRILL-5903
> URL: https://issues.apache.org/jira/browse/DRILL-5903
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata, Storage - Parquet
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Critical
> Attachments: 26122f83-6956-5aa8-d8de-d4808f572160.sys.drill, 
> drillbit.log
>
>
> This is a query from the Functional-Baseline-100.171 run.  The test is 
> /root/drillAutomation/mapr/framework/resources/Functional/parquet_storage/parquet_date/mc_parquet_date/generic/mixed1_partitioned5.q.
> Query is:
> {noformat}
> select a.int_col, b.date_col from 
> dfs.`/drill/testdata/parquet_date/metadata_cache/mixed/fewtypes_null_large` a 
> inner join ( select date_col, int_col from 
> dfs.`/drill/testdata/parquet_date/metadata_cache/mixed/fewtypes_null_large` 
> where dir0 = '1.2' and date_col > '1996-03-07' ) b on cast(a.date_col as 
> date)= date_add(b.date_col, 5) where a.int_col = 7 and a.dir0='1.9' group by 
> a.int_col, b.date_col
> {noformat}
> From drillbit.log:
> {noformat}
> fc65-d430-ac1103638113: SELECT SUM(col_int) OVER() sum_int FROM 
> vwOnParq_wCst_35
> 2017-10-23 11:20:50,122 [26122f83-6956-5aa8-d8de-d4808f572160:foreman] ERROR 
> o.a.d.exec.store.parquet.Metadata - Waited for 15000ms, but tasks for 'Fetch 
> parquet metadata' are not complete. Total runnable size 3, parallelism 3.
> 2017-10-23 11:20:50,127 [26122f83-6956-5aa8-d8de-d4808f572160:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - User Error Occurred: Waited for 15000ms, 
> but tasks for 'Fetch parquet metadata' are not complete. Total runnable size 
> 3, parallelism 3.
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: Waited for 
> 15000ms, but tasks for 'Fetch parquet metadata' are not complete. Total 
> runnable size 3, parallelism 3.
> [Error Id: 7484e127-ea41-4797-83c0-6619ea9b2bcd ]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
>  ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:151) 
> [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:341)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:318)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:142)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.init(ParquetGroupScan.java:934)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.(ParquetGroupScan.java:227)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.(ParquetGroupScan.java:190)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:170)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:66)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:144)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(AbstractStoragePlugin.java:100)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:85)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.logical.DrillPushProjIntoScan.onMatch(DrillPushProjIntoScan.java:62)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  [calcite-core-1.4.0-drill-r22.jar:1.4.0-drill-r22]
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:811)
>  [calcite-core-1.4.0-drill-r22.jar:1.4.0-drill-r22]
>   at 
> 

[jira] [Commented] (DRILL-7091) Query with EXISTS and correlated subquery fails with NPE in HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl

2019-05-21 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844811#comment-16844811
 ] 

Arina Ielchiieva commented on DRILL-7091:
-

We should definitely fix this in 1.17.

> Query with EXISTS and correlated subquery fails with NPE in 
> HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl
> --
>
> Key: DRILL-7091
> URL: https://issues.apache.org/jira/browse/DRILL-7091
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.17.0
>
>
> Steps to reproduce:
> 1. Create view:
> {code:sql}
> create view dfs.tmp.nation_view as select * from cp.`tpch/nation.parquet`;
> {code}
> Run the following query:
> {code:sql}
> SELECT n_nationkey, n_name
> FROM  dfs.tmp.nation_view a
> WHERE EXISTS (SELECT 1
> FROM cp.`tpch/region.parquet` b
> WHERE b.r_regionkey =  a.n_regionkey)
> {code}
> This query fails with NPE:
> {noformat}
> [Error Id: 9a592635-f792-4403-965c-bd2eece7e8fc on cv1:31010]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:364)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:219)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:330)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.NullPointerException: null
>   at 
> org.apache.drill.exec.physical.impl.join.HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl.initialize(HashJoinMemoryCalculatorImpl.java:267)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:959)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:525)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.test.generated.HashAggregatorGen2.doWork(HashAggTemplate.java:642)
>  ~[na:na]
>   at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:295)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
>  

[jira] [Updated] (DRILL-7091) Query with EXISTS and correlated subquery fails with NPE in HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7091:

Fix Version/s: 1.17.0

> Query with EXISTS and correlated subquery fails with NPE in 
> HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl
> --
>
> Key: DRILL-7091
> URL: https://issues.apache.org/jira/browse/DRILL-7091
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.17.0
>
>
> Steps to reproduce:
> 1. Create view:
> {code:sql}
> create view dfs.tmp.nation_view as select * from cp.`tpch/nation.parquet`;
> {code}
> Run the following query:
> {code:sql}
> SELECT n_nationkey, n_name
> FROM  dfs.tmp.nation_view a
> WHERE EXISTS (SELECT 1
> FROM cp.`tpch/region.parquet` b
> WHERE b.r_regionkey =  a.n_regionkey)
> {code}
> This query fails with NPE:
> {noformat}
> [Error Id: 9a592635-f792-4403-965c-bd2eece7e8fc on cv1:31010]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:364)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:219)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:330)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.NullPointerException: null
>   at 
> org.apache.drill.exec.physical.impl.join.HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl.initialize(HashJoinMemoryCalculatorImpl.java:267)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:959)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:525)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.test.generated.HashAggregatorGen2.doWork(HashAggTemplate.java:642)
>  ~[na:na]
>   at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:295)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> 
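The failure above surfaces as a bare NullPointerException deep inside `BuildSidePartitioningImpl.initialize`. As an illustration only (the class and field names below are hypothetical, not Drill's actual internals or its actual fix), a common hardening pattern is to validate inputs at construction with `java.util.Objects.requireNonNull`, so a missing input fails fast with an actionable message instead of an anonymous NPE in a later computation:

```java
import java.util.Objects;

// Illustrative defensive-initialization sketch; names are hypothetical.
public class BuildSideInit {
    private final int[] batchSizes;

    public BuildSideInit(int[] batchSizes) {
        // Fail fast with a message naming the missing input, rather than
        // letting a later loop throw a bare NullPointerException.
        this.batchSizes = Objects.requireNonNull(
            batchSizes, "batchSizes must be provided before initialize()");
    }

    public long totalSize() {
        long total = 0;
        for (int size : batchSizes) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(new BuildSideInit(new int[]{10, 20, 30}).totalSize());
        try {
            new BuildSideInit(null);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage()); // names the offending input
        }
    }
}
```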

[jira] [Updated] (DRILL-7270) Fix non-https dependency urls and add checksum checks

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7270:

Component/s: Security

> Fix non-https dependency urls and add checksum checks
> -
>
> Key: DRILL-7270
> URL: https://issues.apache.org/jira/browse/DRILL-7270
> Project: Apache Drill
>  Issue Type: Task
>  Components: Security
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Dmytriy Grinchenko
>Priority: Major
> Fix For: 1.17.0
>
>
> Review any build scripts and configurations for insecure urls and make 
> appropriate fixes to use secure urls.
> Projects like Lucene maintain checksum whitelists of all their build 
> dependencies, and you may wish to consider that as protection against 
> threats beyond just MITM.
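The checksum idea mentioned above amounts to pinning a known digest for each downloaded dependency and refusing to use an artifact whose bytes do not match. A minimal sketch using only the JDK's `MessageDigest` (the digest value below is just SHA-256 of the string "hello", for demonstration; this is not Drill's build tooling):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: verify downloaded bytes against a pinned SHA-256 digest.
public class ChecksumCheck {
    public static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b)); // unsigned hex per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    public static void verify(byte[] artifact, String expectedHex) {
        String actual = sha256Hex(artifact);
        if (!actual.equalsIgnoreCase(expectedHex)) {
            throw new IllegalStateException(
                "Checksum mismatch: expected " + expectedHex + " but was " + actual);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        verify(payload,
            "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824");
        System.out.println("checksum ok");
    }
}
```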



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7270) Fix non-https dependency urls

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7270:

Description: 
Review any build scripts and configurations for insecure urls and make 
appropriate fixes to use secure urls.

Projects like Lucene maintain checksum whitelists of all their build 
dependencies, and you may wish to consider that as protection against threats 
beyond just MITM.

  was:Review any build scripts and configurations for insecure urls and make 
appropriate fixes to use secure urls.


> Fix non-https dependency urls
> -
>
> Key: DRILL-7270
> URL: https://issues.apache.org/jira/browse/DRILL-7270
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Dmytriy Grinchenko
>Priority: Major
> Fix For: 1.17.0
>
>
> Review any build scripts and configurations for insecure urls and make 
> appropriate fixes to use secure urls.
> Projects like Lucene maintain checksum whitelists of all their build 
> dependencies, and you may wish to consider that as protection against 
> threats beyond just MITM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7270) Fix non-https dependency urls and add checksum checks

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7270:

Summary: Fix non-https dependency urls and add checksum checks  (was: Fix 
non-https dependency urls)

> Fix non-https dependency urls and add checksum checks
> -
>
> Key: DRILL-7270
> URL: https://issues.apache.org/jira/browse/DRILL-7270
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Dmytriy Grinchenko
>Priority: Major
> Fix For: 1.17.0
>
>
> Review any build scripts and configurations for insecure urls and make 
> appropriate fixes to use secure urls.
> Projects like Lucene maintain checksum whitelists of all their build 
> dependencies, and you may wish to consider that as protection against 
> threats beyond just MITM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7270) Fix non-https dependency urls

2019-05-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7270:

Reviewer: Volodymyr Vysotskyi

> Fix non-https dependency urls
> -
>
> Key: DRILL-7270
> URL: https://issues.apache.org/jira/browse/DRILL-7270
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Dmytriy Grinchenko
>Priority: Major
> Fix For: 1.17.0
>
>
> Review any build scripts and configurations for insecure urls and make 
> appropriate fixes to use secure urls.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7270) Fix non-https dependency urls

2019-05-21 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-7270:
---

 Summary: Fix non-https dependency urls
 Key: DRILL-7270
 URL: https://issues.apache.org/jira/browse/DRILL-7270
 Project: Apache Drill
  Issue Type: Task
Affects Versions: 1.16.0
Reporter: Arina Ielchiieva
Assignee: Dmytriy Grinchenko
 Fix For: 1.17.0


Review any build scripts and configurations for insecure urls and make 
appropriate fixes to use secure urls.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode

2019-05-21 Thread Dmytriy Grinchenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytriy Grinchenko updated DRILL-7269:
--
Priority: Minor  (was: Major)

> Mongo Unit-tests not able to import properly the test data when running in 
> sharded mode
> ---
>
> Key: DRILL-7269
> URL: https://issues.apache.org/jira/browse/DRILL-7269
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Dmytriy Grinchenko
>Assignee: Dmytriy Grinchenko
>Priority: Minor
>
> The problem hides in {{MongoTestSuit}} and the way the data are 
> bootstrapped on distributed cluster start-up. It looks like not all shards 
> come online before we start uploading the test data sets to the DB.
> Below is a comparison of data between the sharded cluster and single mode:
> {code:title=sharded}
> #: full_name
> 0: "Mary Pierson"
> 1: "John Reed"
> 2: "Lynn Kwiatkowski"
> 3: "Donald Vann"
> 4: "Judy Owens"
> 5: "Lori Lightfoot"
> {code}
> {code:title=single}
> #: full_name
> 0: "Steve Eurich"
> 1: "Mary Pierson"
> 2: "John Reed"
> 3: "Lynn Kwiatkowski"
> 4: "Donald Vann"
> 5: "Judy Owens"
> 6: "Lori Lightfoot"
> 7: "Kumar"
> 8: "Kamesh"
> {code}
>  
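The race described above (importing test data before all shards are online) is typically fixed by polling for readiness with a timeout before starting the import. A minimal, self-contained sketch of that pattern; the names and the simulated readiness check are hypothetical, not `MongoTestSuit`'s actual code:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Sketch: wait (with a timeout) for a readiness condition before proceeding.
public class AwaitShards {
    public static boolean await(BooleanSupplier ready, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() < deadline) {
            if (ready.getAsBoolean()) {
                return true;   // safe to start the data import
            }
            Thread.sleep(pollMs);
        }
        return false;          // shards never came online; fail the suite early
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated cluster that reports ready on the third poll.
        AtomicInteger polls = new AtomicInteger();
        boolean ok = await(() -> polls.incrementAndGet() >= 3, 5_000, 10);
        System.out.println(ok ? "shards ready, importing data" : "timed out");
    }
}
```

Failing the suite early on timeout gives a clear diagnostic instead of the silently incomplete data set shown in the sharded listing above.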



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode

2019-05-21 Thread Dmytriy Grinchenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytriy Grinchenko updated DRILL-7269:
--
Attachment: (was: sharded_mongo.log)

> Mongo Unit-tests not able to import properly the test data when running in 
> sharded mode
> ---
>
> Key: DRILL-7269
> URL: https://issues.apache.org/jira/browse/DRILL-7269
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Dmytriy Grinchenko
>Assignee: Dmytriy Grinchenko
>Priority: Major
>
> The problem hides in {{MongoTestSuit}} and the way the data are 
> bootstrapped on distributed cluster start-up. It looks like not all shards 
> come online before we start uploading the test data sets to the DB.
> Below is a comparison of data between the sharded cluster and single mode:
> {code:title=sharded}
> #: full_name
> 0: "Mary Pierson"
> 1: "John Reed"
> 2: "Lynn Kwiatkowski"
> 3: "Donald Vann"
> 4: "Judy Owens"
> 5: "Lori Lightfoot"
> {code}
> {code:title=single}
> #: full_name
> 0: "Steve Eurich"
> 1: "Mary Pierson"
> 2: "John Reed"
> 3: "Lynn Kwiatkowski"
> 4: "Donald Vann"
> 5: "Judy Owens"
> 6: "Lori Lightfoot"
> 7: "Kumar"
> 8: "Kamesh"
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode

2019-05-21 Thread Dmytriy Grinchenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytriy Grinchenko updated DRILL-7269:
--
Attachment: (was: single_mongo.log)

> Mongo Unit-tests not able to import properly the test data when running in 
> sharded mode
> ---
>
> Key: DRILL-7269
> URL: https://issues.apache.org/jira/browse/DRILL-7269
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Dmytriy Grinchenko
>Assignee: Dmytriy Grinchenko
>Priority: Major
>
> The problem hides in {{MongoTestSuit}} and the way the data are 
> bootstrapped on distributed cluster start-up. It looks like not all shards 
> come online before we start uploading the test data sets to the DB.
> Below is a comparison of data between the sharded cluster and single mode:
> {code:title=sharded}
> #: full_name
> 0: "Mary Pierson"
> 1: "John Reed"
> 2: "Lynn Kwiatkowski"
> 3: "Donald Vann"
> 4: "Judy Owens"
> 5: "Lori Lightfoot"
> {code}
> {code:title=single}
> #: full_name
> 0: "Steve Eurich"
> 1: "Mary Pierson"
> 2: "John Reed"
> 3: "Lynn Kwiatkowski"
> 4: "Donald Vann"
> 5: "Judy Owens"
> 6: "Lori Lightfoot"
> 7: "Kumar"
> 8: "Kamesh"
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode

2019-05-21 Thread Dmytriy Grinchenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytriy Grinchenko updated DRILL-7269:
--
Description: 
The problem hides in {{MongoTestSuit}} and the way the data are bootstrapped 
on distributed cluster start-up. It looks like not all shards come online 
before we start uploading the test data sets to the DB.

Below is a comparison of data between the sharded cluster and single mode:
{code:title=sharded}
#: full_name
0: "Mary Pierson"
1: "John Reed"
2: "Lynn Kwiatkowski"
3: "Donald Vann"
4: "Judy Owens"
5: "Lori Lightfoot"
{code}

{code:title=single}
#: full_name
0: "Steve Eurich"
1: "Mary Pierson"
2: "John Reed"
3: "Lynn Kwiatkowski"
4: "Donald Vann"
5: "Judy Owens"
6: "Lori Lightfoot"
7: "Kumar"
8: "Kamesh"
{code}

 

  was:
The problem hides in {{MongoTestSuit}} and the way the data are bootstrapped 
on distributed cluster start-up. It looks like not all shards come online 
before we start uploading the test data sets to the DB.
The bug started appearing after the fixes done in DRILL-7196, where we 
deployed a sharded cluster but used it as a single node.


Below is a comparison of data between the sharded cluster and single mode:
{code:title=sharded}
#: full_name
0: "Mary Pierson"
1: "John Reed"
2: "Lynn Kwiatkowski"
3: "Donald Vann"
4: "Judy Owens"
5: "Lori Lightfoot"
{code}

{code:title=single}
#: full_name
0: "Steve Eurich"
1: "Mary Pierson"
2: "John Reed"
3: "Lynn Kwiatkowski"
4: "Donald Vann"
5: "Judy Owens"
6: "Lori Lightfoot"
7: "Kumar"
8: "Kamesh"
{code}

The Mongo server startup logs attached respectively. 


> Mongo Unit-tests not able to import properly the test data when running in 
> sharded mode
> ---
>
> Key: DRILL-7269
> URL: https://issues.apache.org/jira/browse/DRILL-7269
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Dmytriy Grinchenko
>Assignee: Dmytriy Grinchenko
>Priority: Major
>
> The problem hides in {{MongoTestSuit}} and the way the data are 
> bootstrapped on distributed cluster start-up. It looks like not all shards 
> come online before we start uploading the test data sets to the DB.
> Below is a comparison of data between the sharded cluster and single mode:
> {code:title=sharded}
> #: full_name
> 0: "Mary Pierson"
> 1: "John Reed"
> 2: "Lynn Kwiatkowski"
> 3: "Donald Vann"
> 4: "Judy Owens"
> 5: "Lori Lightfoot"
> {code}
> {code:title=single}
> #: full_name
> 0: "Steve Eurich"
> 1: "Mary Pierson"
> 2: "John Reed"
> 3: "Lynn Kwiatkowski"
> 4: "Donald Vann"
> 5: "Judy Owens"
> 6: "Lori Lightfoot"
> 7: "Kumar"
> 8: "Kamesh"
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode

2019-05-21 Thread Dmytriy Grinchenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytriy Grinchenko updated DRILL-7269:
--
Component/s: (was: Tools, Build & Test)

> Mongo Unit-tests not able to import properly the test data when running in 
> sharded mode
> ---
>
> Key: DRILL-7269
> URL: https://issues.apache.org/jira/browse/DRILL-7269
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Dmytriy Grinchenko
>Assignee: Dmytriy Grinchenko
>Priority: Major
> Attachments: sharded_mongo.log, single_mongo.log
>
>
> The problem hides in {{MongoTestSuit}} and the way the data are 
> bootstrapped on distributed cluster start-up. It looks like not all shards 
> come online before we start uploading the test data sets to the DB.
> The bug started appearing after the fixes done in DRILL-7196, where we 
> deployed a sharded cluster but used it as a single node.
> Below is a comparison of data between the sharded cluster and single mode:
> {code:title=sharded}
> #: full_name
> 0: "Mary Pierson"
> 1: "John Reed"
> 2: "Lynn Kwiatkowski"
> 3: "Donald Vann"
> 4: "Judy Owens"
> 5: "Lori Lightfoot"
> {code}
> {code:title=single}
> #: full_name
> 0: "Steve Eurich"
> 1: "Mary Pierson"
> 2: "John Reed"
> 3: "Lynn Kwiatkowski"
> 4: "Donald Vann"
> 5: "Judy Owens"
> 6: "Lori Lightfoot"
> 7: "Kumar"
> 8: "Kamesh"
> {code}
> The Mongo server startup logs attached respectively. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode

2019-05-21 Thread Dmytriy Grinchenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytriy Grinchenko updated DRILL-7269:
--
Fix Version/s: (was: 1.17.0)

> Mongo Unit-tests not able to import properly the test data when running in 
> sharded mode
> ---
>
> Key: DRILL-7269
> URL: https://issues.apache.org/jira/browse/DRILL-7269
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.17.0
>Reporter: Dmytriy Grinchenko
>Assignee: Dmytriy Grinchenko
>Priority: Major
> Attachments: sharded_mongo.log, single_mongo.log
>
>
> The problem hides in {{MongoTestSuit}} and the way the data are 
> bootstrapped on distributed cluster start-up. It looks like not all shards 
> come online before we start uploading the test data sets to the DB.
> The bug started appearing after the fixes done in DRILL-7196, where we 
> deployed a sharded cluster but used it as a single node.
> Below is a comparison of data between the sharded cluster and single mode:
> {code:title=sharded}
> #: full_name
> 0: "Mary Pierson"
> 1: "John Reed"
> 2: "Lynn Kwiatkowski"
> 3: "Donald Vann"
> 4: "Judy Owens"
> 5: "Lori Lightfoot"
> {code}
> {code:title=single}
> #: full_name
> 0: "Steve Eurich"
> 1: "Mary Pierson"
> 2: "John Reed"
> 3: "Lynn Kwiatkowski"
> 4: "Donald Vann"
> 5: "Judy Owens"
> 6: "Lori Lightfoot"
> 7: "Kumar"
> 8: "Kamesh"
> {code}
> The Mongo server startup logs attached respectively. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5983) Unsupported nullable converted type INT_8 for primitive type INT32 error

2019-05-21 Thread Volodymyr Vysotskyi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844669#comment-16844669
 ] 

Volodymyr Vysotskyi commented on DRILL-5983:


[~vezir], [~davlee1...@yahoo.com], could you please share the parquet files 
for which Drill fails? I tried the files from DRILL-4764, and on Drill 1.16.0 
everything works fine.

> Unsupported nullable converted type INT_8 for primitive type INT32 error
> 
>
> Key: DRILL-5983
> URL: https://issues.apache.org/jira/browse/DRILL-5983
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.10.0, 1.11.0
> Environment: NAME="Ubuntu"
> VERSION="16.04.2 LTS (Xenial Xerus)"
>Reporter: Hakan Sarıbıyık
>Priority: Major
>  Labels: parquet, read, types
>
> When I query a table with a byte column in it, it gives an error:
> _Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> ExecutionSetupException: Unsupported nullable converted type INT_8 for 
> primitive type INT32 Fragment 1:6 [Error Id: 
> 46636b05-cff5-455b-ba25-527217346b3e on bigdata7:31010]_
> Actually, this was supposed to have been solved by
> [DRILL-4764] - Parquet file with INT_16, etc. logical types not supported by 
> simple SELECT
> according to https://drill.apache.org/docs/apache-drill-1-10-0-release-notes/
> But I tried it even with 1.11.0 and it didn't work.
> I am querying a parquet-formatted file with pySpark:
> tablo1
> sourceid: byte (nullable = true)
> select sourceid from tablo1
> works as expected with pySpark, but not with Drill v1.11.0.
> Thanks.
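For context on the error above: the INT_8 converted type annotates a value that is physically stored as a 32-bit Parquet INT32 but is logically a signed byte, so a reader that supports the annotation just narrows the int with a range check. A minimal illustrative sketch of that narrowing (this is not Drill's actual Parquet reader code):

```java
// Sketch of what "converted type INT_8 for primitive type INT32" implies:
// the physical value is a 32-bit int; a supporting reader narrows it to a
// signed byte after a sanity range check.
public class Int8Narrow {
    public static byte readInt8(int physicalValue) {
        if (physicalValue < Byte.MIN_VALUE || physicalValue > Byte.MAX_VALUE) {
            throw new IllegalArgumentException(
                "Value out of INT_8 range: " + physicalValue);
        }
        return (byte) physicalValue;
    }

    public static void main(String[] args) {
        System.out.println(readInt8(-5));
        System.out.println(readInt8(127));
    }
}
```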



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode

2019-05-21 Thread Dmytriy Grinchenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytriy Grinchenko updated DRILL-7269:
--
Attachment: single_mongo.log
sharded_mongo.log

> Mongo Unit-tests not able to import properly the test data when running in 
> sharded mode
> ---
>
> Key: DRILL-7269
> URL: https://issues.apache.org/jira/browse/DRILL-7269
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.17.0
>Reporter: Dmytriy Grinchenko
>Assignee: Dmytriy Grinchenko
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: sharded_mongo.log, single_mongo.log
>
>
> The problem hides in {{MongoTestSuit}} and the way the data are 
> bootstrapped on distributed cluster start-up. It looks like not all shards 
> come online before we start uploading the test data sets to the DB.
> The bug started appearing after the fixes done in DRILL-7196, where we 
> deployed a sharded cluster but used it as a single node.
> Below is a comparison of data between the sharded cluster and single mode:
> {code:title=sharded}
> #: full_name
> 0: "Mary Pierson"
> 1: "John Reed"
> 2: "Lynn Kwiatkowski"
> 3: "Donald Vann"
> 4: "Judy Owens"
> 5: "Lori Lightfoot"
> {code}
> {code:title=single}
> #: full_name
> 0: "Steve Eurich"
> 1: "Mary Pierson"
> 2: "John Reed"
> 3: "Lynn Kwiatkowski"
> 4: "Donald Vann"
> 5: "Judy Owens"
> 6: "Lori Lightfoot"
> 7: "Kumar"
> 8: "Kamesh"
> {code}
> The Mongo server startup logs attached respectively. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode

2019-05-21 Thread Dmytriy Grinchenko (JIRA)
Dmytriy Grinchenko created DRILL-7269:
-

 Summary: Mongo Unit-tests not able to import properly the test 
data when running in sharded mode
 Key: DRILL-7269
 URL: https://issues.apache.org/jira/browse/DRILL-7269
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Affects Versions: 1.17.0
Reporter: Dmytriy Grinchenko
Assignee: Dmytriy Grinchenko
 Fix For: 1.17.0


The problem hides in {{MongoTestSuit}} and the way the data are bootstrapped 
on distributed cluster start-up. It looks like not all shards come online 
before we start uploading the test data sets to the DB.
The bug started appearing after the fixes done in DRILL-7196, where we 
deployed a sharded cluster but used it as a single node.


Below is a comparison of data between the sharded cluster and single mode:
{code:title=sharded}
#: full_name
0: "Mary Pierson"
1: "John Reed"
2: "Lynn Kwiatkowski"
3: "Donald Vann"
4: "Judy Owens"
5: "Lori Lightfoot"
{code}

{code:title=single}
#: full_name
0: "Steve Eurich"
1: "Mary Pierson"
2: "John Reed"
3: "Lynn Kwiatkowski"
4: "Donald Vann"
5: "Judy Owens"
6: "Lori Lightfoot"
7: "Kumar"
8: "Kamesh"
{code}

The Mongo server startup logs attached respectively. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)