[jira] [Created] (DRILL-7275) CTAS + CTE query fails with IllegalStateException: Read batch count [%d] should be greater than zero [0]
Khurram Faraaz created DRILL-7275:
-------------------------------------
             Summary: CTAS + CTE query fails with IllegalStateException: Read batch count [%d] should be greater than zero [0]
                 Key: DRILL-7275
                 URL: https://issues.apache.org/jira/browse/DRILL-7275
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.15.0
            Reporter: Khurram Faraaz

CTAS + CTE query fails with IllegalStateException: Read batch count [%d] should be greater than zero [0].

The precondition check fails on line 47 of VarLenFixedEntryReader.java:
{noformat}
44  final int expectedDataLen = columnPrecInfo.precision;
45  final int entrySz = 4 + columnPrecInfo.precision;
46  final int readBatch = getFixedLengthMaxRecordsToRead(valuesToRead, entrySz);
47  Preconditions.checkState(readBatch > 0, "Read batch count [%d] should be greater than zero", readBatch);
{noformat}

Stack trace from drillbit.log, which also contains the failing query:
{noformat}
2019-05-13 14:40:14,090 [23268c40-ef3a-6349-5901-5762f6888971:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query with id 23268c40-ef3a-6349-5901-5762f6888971 issued by scoop_stc:
CREATE TABLE TEST_TEMPLATE_SCHEMA_creid.tbl_c_EquityProxyDailyReturn AS
WITH
che AS (
    SELECT * FROM TEST_TEMPLATE_SCHEMA_creid.tbl_c_CompositeHierarchyEntry_TimeVarying
    WHERE CompositeHierarchyName = 'AxiomaRegion/AxiomaSector/VectorUniverse'
    AND state = 'DupesRemoved'
    AND CompositeLevel = 'AxiomaRegion_1/AxiomaSector_1/VectorUniverse_0'
),
ef AS (SELECT * FROM TEST_TEMPLATE_SCHEMA_creid.tbl_c_EquityDailyReturn_FXAdjusted WHERE Status = 'PresentInRawData'),
d AS (SELECT * FROM TEST_TEMPLATE_SCHEMA_creid.tbl_r_BusinessDate WHERE IsWeekday),
x AS
(
    SELECT
        che.CompositeHierarchyName,
        che.State,
        che.CompositeNodeName,
        d.`Date` AS RecordDate,
        COUNT(che.CompositeNodeName) AS countDistinctConstituents,
        COUNT(ef.VectorListingId) AS countDataPoints,
        AVG(ef.DailyReturn) AS AvgReturn,
        AVG(ef.DailyReturnUSD) AS AvgReturnUSD,
        AVG(ef.NotionalReturnUSD) AS AvgNotionalReturnUSD
    FROM d
    INNER JOIN che ON d.`Date` BETWEEN che.CompositeUltimateChildStartDate AND che.CompositeUltimateChildEndDate
    LEFT OUTER JOIN ef ON d.`Date` = ef.RecordDate AND 'VectorListingId_' || CAST(ef.VectorListingId AS VARCHAR(100)) = che.UltimateChild
    GROUP BY che.CompositeHierarchyName, che.State, che.CompositeNodeName, d.`Date`, d.IsWeekday, d.IsHoliday
)
SELECT * FROM x
2019-05-13 14:40:16,971 [23268c40-ef3a-6349-5901-5762f6888971:foreman] INFO o.a.d.e.p.s.h.CreateTableHandler - Creating persistent table [tbl_c_EquityProxyDailyReturn].
...
2019-05-13 14:40:20,036 [23268c40-ef3a-6349-5901-5762f6888971:frag:6:10] INFO o.a.d.exec.physical.impl.ScanBatch - User Error Occurred: Error in parquet record reader.
Message:
Hadoop path: /DEV/tbl_c_EquityDailyReturn_FXAdjusted/1_32_32.parquet
Total records read: 0
Row group index: 0
Records in row group: 3243
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
  optional int64 VectorListingId;
  optional int32 RecordDate (DATE);
  required binary Status (UTF8);
  required binary CurrencyISO (UTF8);
  optional double DailyReturn;
  optional double DailyReturnUSD;
  optional double NotionalReturnUSD;
}
, metadata: {drill-writer.version=2, drill.version=1.15.0.0-mapr}},
blocks: [BlockMetaData{3243, 204762 [
  ColumnMetaData{UNCOMPRESSED [VectorListingId] optional int64 VectorListingId [RLE, BIT_PACKED, PLAIN], 4},
  ColumnMetaData{UNCOMPRESSED [RecordDate] optional int32 RecordDate (DATE) [RLE, BIT_PACKED, PLAIN], 26021},
  ColumnMetaData{UNCOMPRESSED [Status] required binary Status (UTF8) [BIT_PACKED, PLAIN], 39050},
  ColumnMetaData{UNCOMPRESSED [CurrencyISO] required binary CurrencyISO (UTF8) [BIT_PACKED, PLAIN], 103968},
  ColumnMetaData{UNCOMPRESSED [DailyReturn] optional double DailyReturn [RLE, BIT_PACKED, PLAIN], 126715},
  ColumnMetaData{UNCOMPRESSED [DailyReturnUSD] optional double DailyReturnUSD [RLE, BIT_PACKED, PLAIN], 152732},
  ColumnMetaData{UNCOMPRESSED [NotionalReturnUSD] optional double NotionalReturnUSD [RLE, BIT_PACKED, PLAIN], 178749}]}]}
{noformat}
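The zero read-batch count reported above can arise when the computed fixed entry size exceeds the memory the reader budgets per batch, so the integer division in the records-to-read calculation truncates to zero. The snippet below is a hypothetical, simplified reconstruction of that arithmetic; the budget constant and method body are illustrative assumptions, not Drill's actual implementation of getFixedLengthMaxRecordsToRead().

```java
public class ReadBatchSketch {
    // Illustrative per-batch memory budget (bytes). Drill derives its real
    // budget from configuration; this constant is an assumption for the demo.
    static final int BATCH_MEM_BUDGET = 4096;

    // Simplified stand-in for getFixedLengthMaxRecordsToRead(): how many
    // fixed-size entries fit in the batch budget, capped by valuesToRead.
    static int maxRecordsToRead(int valuesToRead, int entrySz) {
        return Math.min(valuesToRead, BATCH_MEM_BUDGET / entrySz);
    }

    public static void main(String[] args) {
        // Normal case: small entries, a partial batch of records fits.
        System.out.println(maxRecordsToRead(3243, 4 + 36));   // 102

        // Pathological case: a declared precision larger than the budget
        // makes the integer division truncate to zero, which would trip
        // Preconditions.checkState(readBatch > 0, ...) on line 47 above.
        System.out.println(maxRecordsToRead(3243, 4 + 8192)); // 0
    }
}
```

This suggests the report is less about the data itself than about a column precision estimate that makes a single entry appear larger than one batch's budget.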
[jira] [Assigned] (DRILL-7275) CTAS + CTE query fails with IllegalStateException: Read batch count [%d] should be greater than zero [0]
[ https://issues.apache.org/jira/browse/DRILL-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khurram Faraaz reassigned DRILL-7275:
-------------------------------------
    Assignee: salim achouche

> CTAS + CTE query fails with IllegalStateException: Read batch count [%d]
> should be greater than zero [0]
>
>                 Key: DRILL-7275
>                 URL: https://issues.apache.org/jira/browse/DRILL-7275
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.15.0
>            Reporter: Khurram Faraaz
>            Assignee: salim achouche
>            Priority: Major
[jira] [Commented] (DRILL-7203) Back button for failed query does not return on Query page
[ https://issues.apache.org/jira/browse/DRILL-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845075#comment-16845075 ]

ASF GitHub Bot commented on DRILL-7203:
---------------------------------------
sohami commented on issue #1787: DRILL-7203: Back button not working
URL: https://github.com/apache/drill/pull/1787#issuecomment-494487972

@kkhatua: I am not sure this is the right way to fix this issue. The issue, as you mentioned, is that we load the response from the server into the same page. I guess the reason for that is that any query error on the server side is also sent as an HTTP success response, so we chose to update the /query page with either the actual result or the server-side exception. Please consider the two options below:

1) Create a single `queryResult.ftl` page, or repurpose the `result.ftl` page, and on the server side populate it with either the result.ftl or errorMessage.ftl template depending on success or failure. That way the form on the /query page can always be directed to load the /queryResult.ftl page (I guess using the `action` property).

2) Somehow, in the success handler of the AJAX POST request, get the response page URL and load that URL with the response data. Or maybe use a redirection mechanism to load the result page, something like this: https://stackoverflow.com/questions/199099/how-to-manage-a-redirect-request-after-a-jquery-ajax-call

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Back button for failed query does not return on Query page
> ----------------------------------------------------------
>
>                 Key: DRILL-7203
>                 URL: https://issues.apache.org/jira/browse/DRILL-7203
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Arina Ielchiieva
>            Assignee: Kunal Khatua
>            Priority: Major
>             Fix For: 1.17.0
>
>         Attachments: back_button.JPG
>
>
> The Back button for a failed query returns to the page visited before the Query page, but not to the Query page itself.
> Steps:
> 1. Go to the Logs page.
> 2. Go to the Query page.
> 3. Execute a query with incorrect syntax (ex: x).
> 4. An error message is displayed; the Back button is in the left corner (screenshot attached).
> 5. Press the Back button.
> 6. The user is redirected to the Logs page.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
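Option 1 in the comment above reduces to a single server-side decision: the /query form always targets one result page, and the server fills it with the success or error template, so the browser performs a real navigation and Back lands on /query. A minimal sketch of that selection step, assuming hypothetical wiring (the template file names come from the comment, but this selector method is illustrative, not Drill's actual web code):

```java
public class ResultTemplatePicker {
    // Hypothetical selector: a single result page body is chosen from the
    // query outcome, so /query never rewrites itself in place.
    static String pickTemplate(boolean querySucceeded) {
        return querySucceeded ? "result.ftl" : "errorMessage.ftl";
    }

    public static void main(String[] args) {
        System.out.println(pickTemplate(true));   // result.ftl
        System.out.println(pickTemplate(false));  // errorMessage.ftl
    }
}
```

The point of the design is that both outcomes produce a navigation to the same result URL, which is what makes the browser history behave.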
[jira] [Updated] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-7271:
------------------------------------
Description:
1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
2. Rename hasStatistics to hasDescriptiveStatistics.
3. Remove drill-file-metastore-plugin.
4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to the metadata module, rename it to MetadataType and add a new value: DIRECTORY.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {
  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";

  MetadataType type (enum);
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:

org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
-- storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify
private final Map tableStatistics;
private final Map statisticsKinds;
private final Set partitionKeys; -> Map
{noformat}

org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
-- storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
partitionValues (List)
location (String) (for directory level metadata) - directory location

fields to modify
private final Map tableStatistics;
private final Map statisticsKinds;
private final Set location; -> locations
{noformat}

org.apache.drill.metastore.FileMetadata
{noformat}
missing fields
-- storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
path - path to file

fields to modify
private final Map tableStatistics;
private final Map statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}

org.apache.drill.metastore.RowGroupMetadata
{noformat}
missing fields
-- storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
path - path to file

fields to modify
private final Map tableStatistics;
private final Map statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
8. Remove the org.apache.drill.exec package from the metastore module.
9. Rename the ColumnStatisticsImpl class.
10. Separate existing classes in the org.apache.drill.metastore package into sub-packages.
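Item 1 of the description (merging metadataStatistics and statisticsKinds into one holder) amounts to replacing two parallel maps with a single map of value-plus-kind holders. A minimal sketch of that merge, with hypothetical field names and type parameters since the ticket elides the generics; only the class name StatisticsHolder comes from the description:

```java
import java.util.HashMap;
import java.util.Map;

public class StatisticsMerge {
    // Hypothetical holder pairing a statistic's value with its kind; the
    // real StatisticsHolder in the metastore module may differ.
    static class StatisticsHolder {
        final Object value;
        final String kind;
        StatisticsHolder(Object value, String kind) {
            this.value = value;
            this.kind = kind;
        }
    }

    // Collapse the parallel tableStatistics / statisticsKinds maps into a
    // single map keyed by statistic name, as item 1 proposes.
    static Map<String, StatisticsHolder> merge(Map<String, Object> statistics,
                                               Map<String, String> kinds) {
        Map<String, StatisticsHolder> merged = new HashMap<>();
        statistics.forEach((name, value) ->
                merged.put(name, new StatisticsHolder(value, kinds.get(name))));
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stats = new HashMap<>();
        stats.put("rowCount", 3243L);
        Map<String, String> kinds = new HashMap<>();
        kinds.put("rowCount", "EXACT");
        StatisticsHolder h = merge(stats, kinds).get("rowCount");
        System.out.println(h.value + " " + h.kind); // 3243 EXACT
    }
}
```

The benefit of the merged shape is that a statistic can never be present in one map but missing from the other, which is the consistency hazard of the parallel-map design.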
[jira] [Closed] (DRILL-4843) Trailing spaces in CSV column headers cause IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Gozhiy closed DRILL-4843.
-------------------------------
    Resolution: Fixed

The issue is not reproducible with Drill version 1.17.0-SNAPSHOT (commit id 0195d1f34be7fd385ba76d2fd3e14a9fa13bd375).

> Trailing spaces in CSV column headers cause IndexOutOfBoundsException
> ---------------------------------------------------------------------
>
>                 Key: DRILL-4843
>                 URL: https://issues.apache.org/jira/browse/DRILL-4843
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text CSV
>    Affects Versions: 1.6.0, 1.7.0
>         Environment: MapR Community cluster on CentOS 7.2
>            Reporter: Matt Keranen
>            Assignee: Paul Rogers
>            Priority: Major
>
> When a CSV file with a header row has spaces after commas, an IOBE is thrown when trying to reference column names. For example, this will cause the exception:
> {{col1, col2, col3}}
> where this will not:
> {{col1,col2,col3}}
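The failure mode above is the header parser treating "col1, col2, col3" differently from "col1,col2,col3": the whitespace leaks into the stored column names, so later name lookups miss. A defensive fix is to trim each header token before it is used as a column name. The sketch below is an illustrative stand-alone version of that normalization, not Drill's actual text-reader code:

```java
import java.util.Arrays;

public class CsvHeaderTrim {
    // Split a CSV header line and strip the surrounding whitespace that
    // otherwise becomes part of each column name (and, per this report,
    // triggered an IndexOutOfBoundsException downstream).
    static String[] parseHeader(String headerLine) {
        return Arrays.stream(headerLine.split(","))
                     .map(String::trim)
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseHeader("col1, col2, col3")));
        // [col1, col2, col3]
    }
}
```

With trimmed headers, `SELECT col2` resolves the same way regardless of how the header row was spaced.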
[jira] [Closed] (DRILL-6978) typeOf drillTypeOf sqlTypeOf not work with generated tables
[ https://issues.apache.org/jira/browse/DRILL-6978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

benj closed DRILL-6978.
-----------------------
Fixed in 1.16.

> typeOf drillTypeOf sqlTypeOf not work with generated tables
> -----------------------------------------------------------
>
>                 Key: DRILL-6978
>                 URL: https://issues.apache.org/jira/browse/DRILL-6978
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.15.0
>            Reporter: benj
>            Assignee: Volodymyr Vysotskyi
>            Priority: Major
>             Fix For: 1.16.0
>
>
> The typeOf functions work when querying files but do not work on "generated" data.
> This works:
> {code:java}
> SELECT typeof(md5), drillTypeOf(md5), sqlTypeOf(md5) FROM dfs.tmp.`mytable.csv` LIMIT 2;
> => (OK)
> +----------+----------+--------------------+
> |  EXPR$0  |  EXPR$1  |       EXPR$2       |
> +----------+----------+--------------------+
> | VARCHAR  | VARCHAR  | CHARACTER VARYING  |
> | VARCHAR  | VARCHAR  | CHARACTER VARYING  |
> +----------+----------+--------------------+
> {code}
> But not:
> {code:java}
> SELECT typeOf(a) FROM (SELECT CAST (5 as int) AS a) x;
> => (NOK)
> Error: SYSTEM ERROR: IllegalArgumentException: Can not set org.apache.drill.exec.vector.complex.reader.FieldReader field org.apache.drill.exec.expr.fn.impl.UnionFunctions$GetType.input to org.apache.drill.exec.expr.holders.IntHolder
> {code}
> And, surprisingly, the next query works:
> {code:java}
> SELECT md5, typeof(t), drillTypeOf(t), sqlTypeOf(t) FROM ((SELECT 'foo' AS t) union (SELECT 'far' AS t)) x;
> => (OK)
> +------+----------+----------+--------------------+
> | md5  |  EXPR$1  |  EXPR$2  |       EXPR$3       |
> +------+----------+----------+--------------------+
> | foo  | VARCHAR  | VARCHAR  | CHARACTER VARYING  |
> | bar  | VARCHAR  | VARCHAR  | CHARACTER VARYING  |
> +------+----------+----------+--------------------+
> {code}
[jira] [Commented] (DRILL-5555) CSV file without headers: "SELECT a" fails, "SELECT columns, a" succeeds
[ https://issues.apache.org/jira/browse/DRILL-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844966#comment-16844966 ]

Anton Gozhiy commented on DRILL-5555:
-------------------------------------
The issue is fixed in the V3 reader for the case with "columns", but the case with star is still reproducible.

> CSV file without headers: "SELECT a" fails, "SELECT columns, a" succeeds
> ------------------------------------------------------------------------
>
>                 Key: DRILL-5555
>                 URL: https://issues.apache.org/jira/browse/DRILL-5555
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text CSV
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Consider the case discussed in DRILL-5554. Do exactly the same setup, but with a slightly different query. The results are much different.
> Create a CSV file without headers:
> {code}
> 10,foo,bar
> {code}
> Use a CSV storage plugin configured to not skip the first line and not read headers.
> Then issue the following query:
> {code}
> SELECT columns, a FROM `dfs.data.example.csv`
> {code}
> Result:
> {code}
> columns,a
> ["10","foo","bar"],null
> {code}
> Schema:
> {code}
> columns(VARCHAR:REPEATED),
> a(INT:OPTIONAL)
> {code}
> Since the query in DRILL-5554 fails:
> {code}
> SELECT a FROM ...
> {code}
> the expectation was that the query described here would also fail, for a similar reason.
[jira] [Closed] (DRILL-6954) Move commons libs used in UDFs module to the dependency management
[ https://issues.apache.org/jira/browse/DRILL-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] benj closed DRILL-6954. --- OK In 1.16 > Move commons libs used in UDFs module to the dependency management > -- > > Key: DRILL-6954 > URL: https://issues.apache.org/jira/browse/DRILL-6954 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.15.0 >Reporter: benj >Assignee: Arina Ielchiieva >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > > *USE CASE* > The networking function ADDRESS_COUNT doesn't work anymore in DRILL *1.15.0* > In *1.14.0* it's OK : > {code:java} > SELECT address_count('192.168.100.1/24'); > +-+ > | EXPR$0 | > +-+ > | 254 | > +-+{code} > But in *1.15.0* it's NOK > {code:java} > SELECT address_count('192.168.100.1/24'); > Exception in thread "drill-executor-1" java.lang.NoSuchMethodError: > org.apache.commons.net.util.SubnetUtils$SubnetInfo.getAddressCountLong()J > at > org.apache.drill.exec.udfs.NetworkFunctions$AddressCountFunction.eval(NetworkFunctions.java:87) > at > org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator.evaluateFunction(InterpreterEvaluator.java:129) > at > org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator$EvalVisitor.visitFunctionHolderExpression(InterpreterEvaluator.java:334) > at > org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator$EvalVisitor.visitFunctionHolderExpression(InterpreterEvaluator.java:194) > at > org.apache.drill.common.expression.FunctionHolderExpression.accept(FunctionHolderExpression.java:53) > at > org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator.evaluateConstantExpr(InterpreterEvaluator.java:69) > at > org.apache.drill.exec.planner.logical.DrillConstExecutor.reduce(DrillConstExecutor.java:151) > at > org.apache.calcite.rel.rules.ReduceExpressionsRule.reduceExpressionsInternal(ReduceExpressionsRule.java:620) > at > 
org.apache.calcite.rel.rules.ReduceExpressionsRule.reduceExpressions(ReduceExpressionsRule.java:540) > at > org.apache.calcite.rel.rules.ReduceExpressionsRule$ProjectReduceExpressionsRule.onMatch(ReduceExpressionsRule.java:288) > at > org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:212) > at > org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:648) > at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:339) > ... > {code} > > Note that the other networking functions seem to work well. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
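The fix named in the summary (moving commons libs used in the UDFs module to dependency management) amounts to pinning a single commons-net version in the parent pom, so the runtime jar matches the one the UDF was compiled against and SubnetInfo.getAddressCountLong() resolves. A hedged sketch of such a fragment; the version number is illustrative, not necessarily the one the actual commit uses:

```xml
<!-- Sketch: declare commons-net once in the parent pom's dependencyManagement
     so every module (including the UDFs module) resolves the same version,
     rather than an older transitive one lacking getAddressCountLong(). -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>commons-net</groupId>
      <artifactId>commons-net</artifactId>
      <version>3.6</version> <!-- illustrative version -->
    </dependency>
  </dependencies>
</dependencyManagement>
```

Modules then declare the dependency without a version and inherit the managed one, which prevents exactly this class of NoSuchMethodError.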
[jira] [Commented] (DRILL-4224) Support MINUS set operator
[ https://issues.apache.org/jira/browse/DRILL-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844937#comment-16844937 ] benj commented on DRILL-4224: - See Drill-4232 > Support MINUS set operator > -- > > Key: DRILL-4224 > URL: https://issues.apache.org/jira/browse/DRILL-4224 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Codegen, SQL Parser >Reporter: Ashwin Aravind >Priority: Major > > Support for Set operator - MINUS -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (DRILL-5554) Wrong error type for "SELECT a" from a CSV file without headers
[ https://issues.apache.org/jira/browse/DRILL-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Gozhiy closed DRILL-5554. --- Resolution: Fixed Verified with Drill version 1.17.0-SNAPSHOT (commit id 0195d1f34be7fd385ba76d2fd3e14a9fa13bd375) The issue is fixed in the V3 Text Reader. It is a validation error now. > Wrong error type for "SELECT a" from a CSV file without headers > --- > > Key: DRILL-5554 > URL: https://issues.apache.org/jira/browse/DRILL-5554 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Paul Rogers >Priority: Trivial > > Create a CSV file without headers: > {code} > 10,foo,bar > {code} > Use a CSV storage plugin configured to not skip the first line and not read > headers. > Then, issue the following query: > {code} > SELECT a FROM `dfs.data.example.csv` > {code} > The result is, correctly, an error: > {code} > org.apache.drill.common.exceptions.UserRemoteException: > DATA_READ ERROR: Selected column 'a' must have name 'columns' or must be > plain '*' > {code} > But, the type of error is wrong. This is not a data read error: the file read > just fine. The problem is a semantic error: a query form that is not > compatible with the storage plugin. > Suggest using {{UserException.unsupportedError()}} instead since the user is > asking the plugin to do something that the plugin does not support. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (DRILL-5487) Vector corruption in CSV with headers and truncated last row
[ https://issues.apache.org/jira/browse/DRILL-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Gozhiy closed DRILL-5487. --- Resolution: Fixed Verified with Drill version 1.17.0-SNAPSHOT (commit id 0195d1f34be7fd385ba76d2fd3e14a9fa13bd375) The issue is fixed in the V3 Text Reader. > Vector corruption in CSV with headers and truncated last row > > > Key: DRILL-5487 > URL: https://issues.apache.org/jira/browse/DRILL-5487 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Text CSV >Affects Versions: 1.10.0 >Reporter: Paul Rogers >Priority: Major > Fix For: Future > > > The CSV format plugin allows two ways of reading data: > * As named columns > * As a single array, called {{columns}}, that holds all columns for a row > The named columns feature will corrupt the offset vectors if the last row of > the file is truncated: it leaves off one or more columns. > To illustrate the CSV data corruption, I created a CSV file, test4.csv, of > the following form: > {code} > h,u > abc,def > ghi > {code} > Note that the file is truncated: the comma and second field are missing on > the last line. 
> Then, I created a simple test using the "cluster fixture" framework: > {code} > @Test > public void readerTest() throws Exception { > FixtureBuilder builder = ClusterFixture.builder() > .maxParallelization(1); > try (ClusterFixture cluster = builder.build(); > ClientFixture client = cluster.clientFixture()) { > TextFormatConfig csvFormat = new TextFormatConfig(); > csvFormat.fieldDelimiter = ','; > csvFormat.skipFirstLine = false; > csvFormat.extractHeader = true; > cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat); > String sql = "SELECT * FROM `dfs.data`.`csv/test4.csv` LIMIT 10"; > client.queryBuilder().sql(sql).printCsv(); > } > } > {code} > The results show we've got a problem: > {code} > Exception (no rows returned): > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > IllegalArgumentException: length: -3 (expected: >= 0) > {code} > If the last line were: > {code} > efg, > {code} > Then the offset vector should look like this: > {code} > [0, 3, 3] > {code} > Very likely we have an offset vector that looks like this instead: > {code} > [0, 3, 0] > {code} > When we compute the second column of the second row, we should compute: > {code} > length = offset[2] - offset[1] = 3 - 3 = 0 > {code} > Instead we get: > {code} > length = offset[2] - offset[1] = 0 - 3 = -3 > {code} > The summary is that a premature EOF appears to cause the "missing" columns to > be skipped; they are not filled with a blank value to "bump" the offset > vectors to fill in the last row. Instead, they are left at 0, causing havoc > downstream in the query. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
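The offset arithmetic described above can be shown in a few lines. This is a self-contained sketch, not Drill's actual vector code: each variable-length value i spans bytes offsets[i]..offsets[i+1], so a skipped trailing column whose offset is left at 0 yields a negative length.

```java
// Sketch of the offset-vector length computation from the report above.
// A truncated last row that is skipped (instead of written as an empty
// value) leaves its offset entry at 0, producing a negative length.
public class OffsetVectorDemo {

    // Length of value at `index` is the difference of consecutive offsets.
    static int valueLength(int[] offsets, int index) {
        return offsets[index + 1] - offsets[index];
    }

    public static void main(String[] args) {
        int[] expected = {0, 3, 3}; // truncated column written as an empty value
        int[] corrupt  = {0, 3, 0}; // truncated column skipped: offset stays 0

        System.out.println(valueLength(expected, 1)); // 3 - 3 = 0  (empty string)
        System.out.println(valueLength(corrupt, 1));  // 0 - 3 = -3 (invalid length)
    }
}
```

The -3 here is exactly the "length: -3 (expected: >= 0)" seen in the IllegalArgumentException above.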
[jira] [Updated] (DRILL-7274) Introduce ANALYZE TABLE statements
[ https://issues.apache.org/jira/browse/DRILL-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7274: Fix Version/s: 1.17.0 > Introduce ANALYZE TABLE statements > -- > > Key: DRILL-7274 > URL: https://issues.apache.org/jira/browse/DRILL-7274 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7274) Introduce ANALYZE TABLE statements
Arina Ielchiieva created DRILL-7274: --- Summary: Introduce ANALYZE TABLE statements Key: DRILL-7274 URL: https://issues.apache.org/jira/browse/DRILL-7274 Project: Apache Drill Issue Type: Sub-task Reporter: Arina Ielchiieva Assignee: Volodymyr Vysotskyi -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7272) Implement Drill Iceberg Metastore plugin
[ https://issues.apache.org/jira/browse/DRILL-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7272: Fix Version/s: 1.17.0 > Implement Drill Iceberg Metastore plugin > > > Key: DRILL-7272 > URL: https://issues.apache.org/jira/browse/DRILL-7272 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.17.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7273) Create operator for handling metadata
[ https://issues.apache.org/jira/browse/DRILL-7273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7273: Fix Version/s: 1.17.0 > Create operator for handling metadata > - > > Key: DRILL-7273 > URL: https://issues.apache.org/jira/browse/DRILL-7273 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7273) Create operator for handling metadata
Arina Ielchiieva created DRILL-7273: --- Summary: Create operator for handling metadata Key: DRILL-7273 URL: https://issues.apache.org/jira/browse/DRILL-7273 Project: Apache Drill Issue Type: Sub-task Reporter: Arina Ielchiieva Assignee: Volodymyr Vysotskyi -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7272) Implement Drill Iceberg Metastore plugin
Arina Ielchiieva created DRILL-7272: --- Summary: Implement Drill Iceberg Metastore plugin Key: DRILL-7272 URL: https://issues.apache.org/jira/browse/DRILL-7272 Project: Apache Drill Issue Type: Sub-task Reporter: Arina Ielchiieva Assignee: Arina Ielchiieva -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
Arina Ielchiieva created DRILL-7271: --- Summary: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore Key: DRILL-7271 URL: https://issues.apache.org/jira/browse/DRILL-7271 Project: Apache Drill Issue Type: Sub-task Reporter: Arina Ielchiieva Assignee: Volodymyr Vysotskyi 1. Merge info from metadataStatistics + statisticsKinds into one holder: Map. 2. Rename hasStatistics to hasDescriptiveStatistics 3. Remove drill-file-metastore-plugin 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to metadata module, rename to MetadataType and add new value: DIRECTORY. 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. 6. Add new info classes: {noformat} class TableInfo { String storagePlugin; String workspace; String name; String type; String owner; } class MetadataInfo { public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION"; MetadataType type (enum); String key; String identifier; } {noformat} 7. 
Modify existing metadata classes: org.apache.drill.metastore.FileTableMetadata {noformat} missing fields -- storagePlugin, workspace, tableType -> will be covered by TableInfo class metadataType, metadataKey -> will be covered by MetadataInfo class interestingColumns fields to modify private final Map tableStatistics; private final Map statisticsKinds; private final Set partitionKeys; -> Map {noformat} org.apache.drill.metastore.PartitionMetadata {noformat} missing fields -- storagePlugin, workspace -> will be covered by TableInfo class metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class partitionValues (List) location (String) (for directory level metadata) - directory location fields to modify private final Map tableStatistics; private final Map statisticsKinds; private final Set location; -> locations {noformat} org.apache.drill.metastore.FileMetadata {noformat} missing fields -- storagePlugin, workspace -> will be covered by TableInfo class metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class path - path to file fields to modify private final Map tableStatistics; private final Map statisticsKinds; private final Path location; - should contain directory to which file belongs {noformat} org.apache.drill.metastore.RowGroupMetadata {noformat} missing fields -- storagePlugin, workspace -> will be covered by TableInfo class metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class fields to modify private final Map tableStatistics; private final Map statisticsKinds; {noformat} 8. Remove org.apache.drill.exec package from metastore module. 9. Rename ColumnStatisticsImpl class. 10. Separate existing classes in org.apache.drill.metastore package into sub-packages. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7271: Fix Version/s: 1.17.0 > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: DIRECTORY. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. 
Modify existing metadata classes: > org.apache.drill.metastore.FileTableMetadata > {noformat} > missing fields > -- > storagePlugin, workspace, tableType -> will be covered by TableInfo class > metadataType, metadataKey -> will be covered by MetadataInfo class > interestingColumns > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set partitionKeys; -> Map > {noformat} > org.apache.drill.metastore.PartitionMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > partitionValues (List) > location (String) (for directory level metadata) - directory location > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set location; -> locations > {noformat} > org.apache.drill.metastore.FileMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > org.apache.drill.metastore.RowGroupMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > {noformat} > 8. Remove org.apache.drill.exec package from metastore module. > 9. Rename ColumnStatisticsImpl class. > 10. Separate existing classes in org.apache.drill.metastore package into > sub-packages. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
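Item 1 of the DRILL-7271 plan (merging metadataStatistics and statisticsKinds into one holder map; the map's generic type parameters were stripped by the mail formatter) can be sketched as below. The class and field names follow the ticket text, but the concrete types are assumptions, not Drill's final API.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: one map from statistic name to a self-describing holder
// replaces the two parallel maps (tableStatistics + statisticsKinds).
public class StatisticsMergeSketch {

    // Carries both the statistic value and the kind metadata that previously
    // lived in the separate statisticsKinds map. Field types are assumed.
    static class StatisticsHolder {
        final Object value;
        final String kind; // e.g. "rowCount", "nullsCount"

        StatisticsHolder(Object value, String kind) {
            this.value = value;
            this.kind = kind;
        }
    }

    static Map<String, StatisticsHolder> buildStats() {
        Map<String, StatisticsHolder> stats = new HashMap<>();
        stats.put("rowCount", new StatisticsHolder(1000L, "rowCount"));
        stats.put("nullsCount", new StatisticsHolder(0L, "nullsCount"));
        return stats;
    }

    public static void main(String[] args) {
        System.out.println(buildStats().get("rowCount").value);
    }
}
```

A single map keyed by statistic name also gives JSON ser/de (item 5) one structure to serialize instead of two maps that must stay in sync.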
[jira] [Updated] (DRILL-6742) Research and investigate a way for collecting and storing table statistics in the scope of metastore integration
[ https://issues.apache.org/jira/browse/DRILL-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-6742: Fix Version/s: 1.16.0 > Research and investigate a way for collecting and storing table statistics in > the scope of metastore integration > > > Key: DRILL-6742 > URL: https://issues.apache.org/jira/browse/DRILL-6742 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.16.0 > > > In the scope of DRILL-1328, significant work was done on collecting and > processing table statistics. The main showstopper for that Jira is a way of > storing the collected statistics. > The aim of this Jira is to investigate how these statistics may be stored in > the metastore for different table types, and how they may be additionally > collected and retrieved from the metastore. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-6742) Research and investigate a way for collecting and storing table statistics in the scope of metastore integration
[ https://issues.apache.org/jira/browse/DRILL-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva resolved DRILL-6742. - Resolution: Fixed > Research and investigate a way for collecting and storing table statistics in > the scope of metastore integration > > > Key: DRILL-6742 > URL: https://issues.apache.org/jira/browse/DRILL-6742 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > > In the scope of DRILL-1328, significant work was done on collecting and > processing table statistics. The main showstopper for that Jira is a way of > storing the collected statistics. > The aim of this Jira is to investigate how these statistics may be stored in > the metastore for different table types, and how they may be additionally > collected and retrieved from the metastore. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-6552) Drill Metadata management "Drill MetaStore"
[ https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva reassigned DRILL-6552: --- Assignee: Volodymyr Vysotskyi (was: Vitalii Diravka) > Drill Metadata management "Drill MetaStore" > --- > > Key: DRILL-6552 > URL: https://issues.apache.org/jira/browse/DRILL-6552 > Project: Apache Drill > Issue Type: New Feature > Components: Metadata >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 2.0.0 > > > It would be useful for Drill to have some sort of metastore which would > enable Drill to remember previously defined schemata so Drill doesn’t have to > do the same work over and over again. > It allows storing schema and statistics, which will accelerate query > validation, planning and execution. It also increases the stability of Drill > and helps avoid different kinds of issues: "schema change > Exceptions", "limit 0" optimization and so on. > One of the main candidates is the Hive Metastore. > Starting from version 3.0, the Hive Metastore can be a separate service from > the Hive server: > [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration] > An optional enhancement is storing Drill's profiles, UDFs and plugin configs in > some kind of metastore as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6552) Drill Metadata management "Drill MetaStore"
[ https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-6552: Fix Version/s: (was: 2.0.0) 1.17.0 > Drill Metadata management "Drill MetaStore" > --- > > Key: DRILL-6552 > URL: https://issues.apache.org/jira/browse/DRILL-6552 > Project: Apache Drill > Issue Type: New Feature > Components: Metadata >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > It would be useful for Drill to have some sort of metastore which would > enable Drill to remember previously defined schemata so Drill doesn’t have to > do the same work over and over again. > It allows storing schema and statistics, which will accelerate query > validation, planning and execution. It also increases the stability of Drill > and helps avoid different kinds of issues: "schema change > Exceptions", "limit 0" optimization and so on. > One of the main candidates is the Hive Metastore. > Starting from version 3.0, the Hive Metastore can be a separate service from > the Hive server: > [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration] > An optional enhancement is storing Drill's profiles, UDFs and plugin configs in > some kind of metastore as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-5903) Regression: Query encounters "Waited for 15000ms, but tasks for 'Fetch parquet metadata' are not complete."
[ https://issues.apache.org/jira/browse/DRILL-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva resolved DRILL-5903. - Resolution: Invalid > Regression: Query encounters "Waited for 15000ms, but tasks for 'Fetch > parquet metadata' are not complete." > --- > > Key: DRILL-5903 > URL: https://issues.apache.org/jira/browse/DRILL-5903 > Project: Apache Drill > Issue Type: Bug > Components: Metadata, Storage - Parquet >Affects Versions: 1.11.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Critical > Attachments: 26122f83-6956-5aa8-d8de-d4808f572160.sys.drill, > drillbit.log > > > This is a query from the Functional-Baseline-100.171 run. The test is > /root/drillAutomation/mapr/framework/resources/Functional/parquet_storage/parquet_date/mc_parquet_date/generic/mixed1_partitioned5.q. > Query is: > {noformat} > select a.int_col, b.date_col from > dfs.`/drill/testdata/parquet_date/metadata_cache/mixed/fewtypes_null_large` a > inner join ( select date_col, int_col from > dfs.`/drill/testdata/parquet_date/metadata_cache/mixed/fewtypes_null_large` > where dir0 = '1.2' and date_col > '1996-03-07' ) b on cast(a.date_col as > date)= date_add(b.date_col, 5) where a.int_col = 7 and a.dir0='1.9' group by > a.int_col, b.date_col > {noformat} > From drillbit.log: > {noformat} > fc65-d430-ac1103638113: SELECT SUM(col_int) OVER() sum_int FROM > vwOnParq_wCst_35 > 2017-10-23 11:20:50,122 [26122f83-6956-5aa8-d8de-d4808f572160:foreman] ERROR > o.a.d.exec.store.parquet.Metadata - Waited for 15000ms, but tasks for 'Fetch > parquet metadata' are not complete. Total runnable size 3, parallelism 3. > 2017-10-23 11:20:50,127 [26122f83-6956-5aa8-d8de-d4808f572160:foreman] INFO > o.a.d.exec.store.parquet.Metadata - User Error Occurred: Waited for 15000ms, > but tasks for 'Fetch parquet metadata' are not complete. Total runnable size > 3, parallelism 3. 
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: Waited for > 15000ms, but tasks for 'Fetch parquet metadata' are not complete. Total > runnable size 3, parallelism 3. > [Error Id: 7484e127-ea41-4797-83c0-6619ea9b2bcd ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586) > ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:151) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:341) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:318) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:142) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.store.parquet.ParquetGroupScan.init(ParquetGroupScan.java:934) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.store.parquet.ParquetGroupScan.(ParquetGroupScan.java:227) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.store.parquet.ParquetGroupScan.(ParquetGroupScan.java:190) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:170) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:66) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:144) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(AbstractStoragePlugin.java:100) > 
[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:85) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.planner.logical.DrillPushProjIntoScan.onMatch(DrillPushProjIntoScan.java:62) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) > [calcite-core-1.4.0-drill-r22.jar:1.4.0-drill-r22] > at > org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:811) > [calcite-core-1.4.0-drill-r22.jar:1.4.0-drill-r22] > at >
[jira] [Commented] (DRILL-7091) Query with EXISTS and correlated subquery fails with NPE in HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl
[ https://issues.apache.org/jira/browse/DRILL-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844811#comment-16844811 ] Arina Ielchiieva commented on DRILL-7091: - We should definitely fix this in 1.17. > Query with EXISTS and correlated subquery fails with NPE in > HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl > -- > > Key: DRILL-7091 > URL: https://issues.apache.org/jira/browse/DRILL-7091 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Volodymyr Vysotskyi >Assignee: Boaz Ben-Zvi >Priority: Major > Fix For: 1.17.0 > > > Steps to reproduce: > 1. Create view: > {code:sql} > create view dfs.tmp.nation_view as select * from cp.`tpch/nation.parquet`; > {code} > Run the following query: > {code:sql} > SELECT n_nationkey, n_name > FROM dfs.tmp.nation_view a > WHERE EXISTS (SELECT 1 > FROM cp.`tpch/region.parquet` b > WHERE b.r_regionkey = a.n_regionkey) > {code} > This query fails with NPE: > {noformat} > [Error Id: 9a592635-f792-4403-965c-bd2eece7e8fc on cv1:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) > ~[drill-common-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:364) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:219) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:330) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [na:1.8.0_161] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_161] > at 
java.lang.Thread.run(Thread.java:748) [na:1.8.0_161] > Caused by: java.lang.NullPointerException: null > at > org.apache.drill.exec.physical.impl.join.HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl.initialize(HashJoinMemoryCalculatorImpl.java:267) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:959) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:525) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.test.generated.HashAggregatorGen2.doWork(HashAggTemplate.java:642) > ~[na:na] > at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:295) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) >
[jira] [Updated] (DRILL-7091) Query with EXISTS and correlated subquery fails with NPE in HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl
[ https://issues.apache.org/jira/browse/DRILL-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7091: Fix Version/s: 1.17.0 > Query with EXISTS and correlated subquery fails with NPE in > HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl > -- > > Key: DRILL-7091 > URL: https://issues.apache.org/jira/browse/DRILL-7091 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Volodymyr Vysotskyi >Assignee: Boaz Ben-Zvi >Priority: Major > Fix For: 1.17.0 > > > Steps to reproduce: > 1. Create view: > {code:sql} > create view dfs.tmp.nation_view as select * from cp.`tpch/nation.parquet`; > {code} > Run the following query: > {code:sql} > SELECT n_nationkey, n_name > FROM dfs.tmp.nation_view a > WHERE EXISTS (SELECT 1 > FROM cp.`tpch/region.parquet` b > WHERE b.r_regionkey = a.n_regionkey) > {code} > This query fails with NPE: > {noformat} > [Error Id: 9a592635-f792-4403-965c-bd2eece7e8fc on cv1:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) > ~[drill-common-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:364) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:219) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:330) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [na:1.8.0_161] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_161] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161] 
> Caused by: java.lang.NullPointerException: null > at > org.apache.drill.exec.physical.impl.join.HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl.initialize(HashJoinMemoryCalculatorImpl.java:267) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:959) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:525) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.test.generated.HashAggregatorGen2.doWork(HashAggTemplate.java:642) > ~[na:na] > at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:295) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) > 
~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) > ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at >
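The NPE above fires inside {{BuildSidePartitioningImpl.initialize()}} when the hash join's build side never produced the statistics the calculator dereferences (the EXISTS correlated subquery can leave them unset). The sketch below illustrates the defensive pattern that avoids this class of crash; the class and field names are hypothetical, not Drill's actual code.

```java
// Hypothetical sketch (illustrative names, not Drill's real API): an
// initializer that dereferences build-side statistics unconditionally
// throws NPE when no build-side batch ever arrived; a null guard degrades
// to "empty build side" instead of crashing the fragment.
public class BuildSidePartitioningSketch {
    // Stand-in for per-batch size statistics that may never be populated.
    static final class BatchStats {
        final int recordCount;
        BatchStats(int recordCount) { this.recordCount = recordCount; }
    }

    private final BatchStats buildStats; // null if the build side was empty

    BuildSidePartitioningSketch(BatchStats buildStats) {
        this.buildStats = buildStats;
    }

    // Returns the number of build-side records to partition; never NPEs.
    public int initialize() {
        if (buildStats == null) {
            return 0; // empty build side: nothing to partition
        }
        return buildStats.recordCount;
    }

    public static void main(String[] args) {
        System.out.println(new BuildSidePartitioningSketch(null).initialize());
        System.out.println(
            new BuildSidePartitioningSketch(new BatchStats(25)).initialize());
    }
}
```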
[jira] [Updated] (DRILL-7270) Fix non-https dependency urls and add checksum checks
[ https://issues.apache.org/jira/browse/DRILL-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7270: Component/s: Security > Fix non-https dependency urls and add checksum checks > - > > Key: DRILL-7270 > URL: https://issues.apache.org/jira/browse/DRILL-7270 > Project: Apache Drill > Issue Type: Task > Components: Security >Affects Versions: 1.16.0 >Reporter: Arina Ielchiieva >Assignee: Dmytriy Grinchenko >Priority: Major > Fix For: 1.17.0 > > > Review any build scripts and configurations for insecure urls and make > appropriate fixes to use secure urls. > Projects like Lucene do checksum whitelists of all their build dependencies, > and you may wish to consider that as a > protection against threats beyond just MITM. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7270) Fix non-https dependency urls
[ https://issues.apache.org/jira/browse/DRILL-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7270: Description: Review any build scripts and configurations for insecure urls and make appropriate fixes to use secure urls. Projects like Lucene do checksum whitelists of all their build dependencies, and you may wish to consider that as a protection against threats beyond just MITM. was:Review any build scripts and configurations for insecure urls and make appropriate fixes to use secure urls. > Fix non-https dependency urls > - > > Key: DRILL-7270 > URL: https://issues.apache.org/jira/browse/DRILL-7270 > Project: Apache Drill > Issue Type: Task >Affects Versions: 1.16.0 >Reporter: Arina Ielchiieva >Assignee: Dmytriy Grinchenko >Priority: Major > Fix For: 1.17.0 > > > Review any build scripts and configurations for insecure urls and make > appropriate fixes to use secure urls. > Projects like Lucene do checksum whitelists of all their build dependencies, > and you may wish to consider that as a > protection against threats beyond just MITM. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7270) Fix non-https dependency urls and add checksum checks
[ https://issues.apache.org/jira/browse/DRILL-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7270: Summary: Fix non-https dependency urls and add checksum checks (was: Fix non-https dependency urls) > Fix non-https dependency urls and add checksum checks > - > > Key: DRILL-7270 > URL: https://issues.apache.org/jira/browse/DRILL-7270 > Project: Apache Drill > Issue Type: Task >Affects Versions: 1.16.0 >Reporter: Arina Ielchiieva >Assignee: Dmytriy Grinchenko >Priority: Major > Fix For: 1.17.0 > > > Review any build scripts and configurations for insecure urls and make > appropriate fixes to use secure urls. > Projects like Lucene do checksum whitelists of all their build dependencies, > and you may wish to consider that as a > protection against threats beyond just MITM. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7270) Fix non-https dependency urls
[ https://issues.apache.org/jira/browse/DRILL-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7270: Reviewer: Volodymyr Vysotskyi > Fix non-https dependency urls > - > > Key: DRILL-7270 > URL: https://issues.apache.org/jira/browse/DRILL-7270 > Project: Apache Drill > Issue Type: Task >Affects Versions: 1.16.0 >Reporter: Arina Ielchiieva >Assignee: Dmytriy Grinchenko >Priority: Major > Fix For: 1.17.0 > > > Review any build scripts and configurations for insecure urls and make > appropriate fixes to use secure urls. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7270) Fix non-https dependency urls
Arina Ielchiieva created DRILL-7270: --- Summary: Fix non-https dependency urls Key: DRILL-7270 URL: https://issues.apache.org/jira/browse/DRILL-7270 Project: Apache Drill Issue Type: Task Affects Versions: 1.16.0 Reporter: Arina Ielchiieva Assignee: Dmytriy Grinchenko Fix For: 1.17.0 Review any build scripts and configurations for insecure urls and make appropriate fixes to use secure urls. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode
[ https://issues.apache.org/jira/browse/DRILL-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmytriy Grinchenko updated DRILL-7269: -- Priority: Minor (was: Major) > Mongo Unit-tests not able to import properly the test data when running in > sharded mode > --- > > Key: DRILL-7269 > URL: https://issues.apache.org/jira/browse/DRILL-7269 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.17.0 >Reporter: Dmytriy Grinchenko >Assignee: Dmytriy Grinchenko >Priority: Minor > > The problem hides in {{MongoTestSuit}} and the way the data are > bootstrapped on distributed cluster start-up. It looks like not all shards > come online before we start uploading the test data sets to the DB. > Below is a comparison of the data between sharded-cluster and single mode: > {code:title=sharded} > #: full_name > 0: "Mary Pierson" > 1: "John Reed" > 2: "Lynn Kwiatkowski" > 3: "Donald Vann" > 4: "Judy Owens" > 5: "Lori Lightfoot" > {code} > {code:title=single} > #: full_name > 0: "Steve Eurich" > 1: "Mary Pierson" > 2: "John Reed" > 3: "Lynn Kwiatkowski" > 4: "Donald Vann" > 5: "Judy Owens" > 6: "Lori Lightfoot" > 7: "Kumar" > 8: "Kamesh" > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode
[ https://issues.apache.org/jira/browse/DRILL-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmytriy Grinchenko updated DRILL-7269: -- Attachment: (was: sharded_mongo.log) > Mongo Unit-tests not able to import properly the test data when running in > sharded mode > --- > > Key: DRILL-7269 > URL: https://issues.apache.org/jira/browse/DRILL-7269 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.17.0 >Reporter: Dmytriy Grinchenko >Assignee: Dmytriy Grinchenko >Priority: Major > > The problem hides in {{MongoTestSuit}} and the way the data are > bootstrapped on distributed cluster start-up. It looks like not all shards > come online before we start uploading the test data sets to the DB. > Below is a comparison of the data between sharded-cluster and single mode: > {code:title=sharded} > #: full_name > 0: "Mary Pierson" > 1: "John Reed" > 2: "Lynn Kwiatkowski" > 3: "Donald Vann" > 4: "Judy Owens" > 5: "Lori Lightfoot" > {code} > {code:title=single} > #: full_name > 0: "Steve Eurich" > 1: "Mary Pierson" > 2: "John Reed" > 3: "Lynn Kwiatkowski" > 4: "Donald Vann" > 5: "Judy Owens" > 6: "Lori Lightfoot" > 7: "Kumar" > 8: "Kamesh" > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode
[ https://issues.apache.org/jira/browse/DRILL-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmytriy Grinchenko updated DRILL-7269: -- Attachment: (was: single_mongo.log) > Mongo Unit-tests not able to import properly the test data when running in > sharded mode > --- > > Key: DRILL-7269 > URL: https://issues.apache.org/jira/browse/DRILL-7269 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.17.0 >Reporter: Dmytriy Grinchenko >Assignee: Dmytriy Grinchenko >Priority: Major > > The problem hides in {{MongoTestSuit}} and the way the data are > bootstrapped on distributed cluster start-up. It looks like not all shards > come online before we start uploading the test data sets to the DB. > Below is a comparison of the data between sharded-cluster and single mode: > {code:title=sharded} > #: full_name > 0: "Mary Pierson" > 1: "John Reed" > 2: "Lynn Kwiatkowski" > 3: "Donald Vann" > 4: "Judy Owens" > 5: "Lori Lightfoot" > {code} > {code:title=single} > #: full_name > 0: "Steve Eurich" > 1: "Mary Pierson" > 2: "John Reed" > 3: "Lynn Kwiatkowski" > 4: "Donald Vann" > 5: "Judy Owens" > 6: "Lori Lightfoot" > 7: "Kumar" > 8: "Kamesh" > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode
[ https://issues.apache.org/jira/browse/DRILL-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmytriy Grinchenko updated DRILL-7269: -- Description: The problem hides in {{MongoTestSuit}} and the way the data are bootstrapped on distributed cluster start-up. It looks like not all shards come online before we start uploading the test data sets to the DB. Below is a comparison of the data between sharded-cluster and single mode: {code:title=sharded} #: full_name 0: "Mary Pierson" 1: "John Reed" 2: "Lynn Kwiatkowski" 3: "Donald Vann" 4: "Judy Owens" 5: "Lori Lightfoot" {code} {code:title=single} #: full_name 0: "Steve Eurich" 1: "Mary Pierson" 2: "John Reed" 3: "Lynn Kwiatkowski" 4: "Donald Vann" 5: "Judy Owens" 6: "Lori Lightfoot" 7: "Kumar" 8: "Kamesh" {code} was: The problem hides in {{MongoTestSuit}} and the way the data are bootstrapped on distributed cluster start-up. It looks like not all shards come online before we start uploading the test data sets to the DB. The bug started appearing after the fixes in DRILL-7196, where we deployed a sharded cluster but used it as a single node. Below is a comparison of the data between sharded-cluster and single mode: {code:title=sharded} #: full_name 0: "Mary Pierson" 1: "John Reed" 2: "Lynn Kwiatkowski" 3: "Donald Vann" 4: "Judy Owens" 5: "Lori Lightfoot" {code} {code:title=single} #: full_name 0: "Steve Eurich" 1: "Mary Pierson" 2: "John Reed" 3: "Lynn Kwiatkowski" 4: "Donald Vann" 5: "Judy Owens" 6: "Lori Lightfoot" 7: "Kumar" 8: "Kamesh" {code} The Mongo server startup logs for each mode are attached.
> Mongo Unit-tests not able to import properly the test data when running in > sharded mode > --- > > Key: DRILL-7269 > URL: https://issues.apache.org/jira/browse/DRILL-7269 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.17.0 >Reporter: Dmytriy Grinchenko >Assignee: Dmytriy Grinchenko >Priority: Major > > The problem hides in {{MongoTestSuit}} and the way the data are > bootstrapped on distributed cluster start-up. It looks like not all shards > come online before we start uploading the test data sets to the DB. > Below is a comparison of the data between sharded-cluster and single mode: > {code:title=sharded} > #: full_name > 0: "Mary Pierson" > 1: "John Reed" > 2: "Lynn Kwiatkowski" > 3: "Donald Vann" > 4: "Judy Owens" > 5: "Lori Lightfoot" > {code} > {code:title=single} > #: full_name > 0: "Steve Eurich" > 1: "Mary Pierson" > 2: "John Reed" > 3: "Lynn Kwiatkowski" > 4: "Donald Vann" > 5: "Judy Owens" > 6: "Lori Lightfoot" > 7: "Kumar" > 8: "Kamesh" > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode
[ https://issues.apache.org/jira/browse/DRILL-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmytriy Grinchenko updated DRILL-7269: -- Component/s: (was: Tools, Build & Test) > Mongo Unit-tests not able to import properly the test data when running in > sharded mode > --- > > Key: DRILL-7269 > URL: https://issues.apache.org/jira/browse/DRILL-7269 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.17.0 >Reporter: Dmytriy Grinchenko >Assignee: Dmytriy Grinchenko >Priority: Major > Attachments: sharded_mongo.log, single_mongo.log > > > The problem hides in {{MongoTestSuit}} and the way the data are > bootstrapped on distributed cluster start-up. It looks like not all shards > come online before we start uploading the test data sets to the DB. > The bug started appearing after the fixes in DRILL-7196, where we deployed a > sharded cluster but used it as a single node. > Below is a comparison of the data between sharded-cluster and single mode: > {code:title=sharded} > #: full_name > 0: "Mary Pierson" > 1: "John Reed" > 2: "Lynn Kwiatkowski" > 3: "Donald Vann" > 4: "Judy Owens" > 5: "Lori Lightfoot" > {code} > {code:title=single} > #: full_name > 0: "Steve Eurich" > 1: "Mary Pierson" > 2: "John Reed" > 3: "Lynn Kwiatkowski" > 4: "Donald Vann" > 5: "Judy Owens" > 6: "Lori Lightfoot" > 7: "Kumar" > 8: "Kamesh" > {code} > The Mongo server startup logs for each mode are attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode
[ https://issues.apache.org/jira/browse/DRILL-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmytriy Grinchenko updated DRILL-7269: -- Fix Version/s: (was: 1.17.0) > Mongo Unit-tests not able to import properly the test data when running in > sharded mode > --- > > Key: DRILL-7269 > URL: https://issues.apache.org/jira/browse/DRILL-7269 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.17.0 >Reporter: Dmytriy Grinchenko >Assignee: Dmytriy Grinchenko >Priority: Major > Attachments: sharded_mongo.log, single_mongo.log > > > The problem hides in {{MongoTestSuit}} and the way the data are > bootstrapped on distributed cluster start-up. It looks like not all shards > come online before we start uploading the test data sets to the DB. > The bug started appearing after the fixes in DRILL-7196, where we deployed a > sharded cluster but used it as a single node. > Below is a comparison of the data between sharded-cluster and single mode: > {code:title=sharded} > #: full_name > 0: "Mary Pierson" > 1: "John Reed" > 2: "Lynn Kwiatkowski" > 3: "Donald Vann" > 4: "Judy Owens" > 5: "Lori Lightfoot" > {code} > {code:title=single} > #: full_name > 0: "Steve Eurich" > 1: "Mary Pierson" > 2: "John Reed" > 3: "Lynn Kwiatkowski" > 4: "Donald Vann" > 5: "Judy Owens" > 6: "Lori Lightfoot" > 7: "Kumar" > 8: "Kamesh" > {code} > The Mongo server startup logs for each mode are attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-5983) Unsupported nullable converted type INT_8 for primitive type INT32 error
[ https://issues.apache.org/jira/browse/DRILL-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844669#comment-16844669 ] Volodymyr Vysotskyi commented on DRILL-5983: [~vezir], [~davlee1...@yahoo.com], could you please share the parquet files for which Drill fails? I tried the files from DRILL-4764, and on Drill 1.16.0 everything works fine. > Unsupported nullable converted type INT_8 for primitive type INT32 error > > > Key: DRILL-5983 > URL: https://issues.apache.org/jira/browse/DRILL-5983 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.10.0, 1.11.0 > Environment: NAME="Ubuntu" > VERSION="16.04.2 LTS (Xenial Xerus)" >Reporter: Hakan Sarıbıyık >Priority: Major > Labels: parquet, read, types > > When I query a table with a byte column in it, it gives an error: > _Query Failed: An Error Occurred > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > ExecutionSetupException: Unsupported nullable converted type INT_8 for > primitive type INT32 Fragment 1:6 [Error Id: > 46636b05-cff5-455b-ba25-527217346b3e on bigdata7:31010]_ > Actually, this was supposed to be solved by > [DRILL-4764] - Parquet file with INT_16, etc. logical types not supported by > simple SELECT > according to https://drill.apache.org/docs/apache-drill-1-10-0-release-notes/ > But I tried it even with 1.11.0 and it didn't work. > I am querying a parquet-formatted file with pySpark: > tablo1 > sourceid: byte (nullable = true) > select sourceid from tablo1 > works as expected with pySpark, but not with Drill v1.11.0. > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
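The error text in DRILL-5983 suggests a converted-type dispatch in the Parquet reader that recognizes INT_16 (added in DRILL-4764) but not INT_8. The following is a hypothetical illustration of that failure shape, not Drill's actual reader code: an incomplete switch over converted types produces exactly this kind of message, and adding the missing case is the shape of the fix.

```java
// Hypothetical illustration, not Drill's reader: mapping Parquet converted
// types to a SQL type name. Omitting INT_8 from the switch reproduces the
// "Unsupported nullable converted type" error; including it resolves it.
public class ConvertedTypeSketch {
    enum ConvertedType { INT_8, INT_16, INT_32, UINT_8 }

    static String drillTypeFor(ConvertedType t) {
        switch (t) {
            case INT_8:   // small ints are widened to 32-bit INT on read
            case INT_16:
            case INT_32:
                return "INT";
            default:
                throw new UnsupportedOperationException(
                    "Unsupported nullable converted type " + t
                        + " for primitive type INT32");
        }
    }

    public static void main(String[] args) {
        System.out.println(drillTypeFor(ConvertedType.INT_8)); // INT
    }
}
```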
[jira] [Updated] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode
[ https://issues.apache.org/jira/browse/DRILL-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmytriy Grinchenko updated DRILL-7269: -- Attachment: single_mongo.log sharded_mongo.log > Mongo Unit-tests not able to import properly the test data when running in > sharded mode > --- > > Key: DRILL-7269 > URL: https://issues.apache.org/jira/browse/DRILL-7269 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.17.0 >Reporter: Dmytriy Grinchenko >Assignee: Dmytriy Grinchenko >Priority: Major > Fix For: 1.17.0 > > Attachments: sharded_mongo.log, single_mongo.log > > > The problem hides in {{MongoTestSuit}} and the way the data are > bootstrapped on distributed cluster start-up. It looks like not all shards > come online before we start uploading the test data sets to the DB. > The bug started appearing after the fixes in DRILL-7196, where we deployed a > sharded cluster but used it as a single node. > Below is a comparison of the data between sharded-cluster and single mode: > {code:title=sharded} > #: full_name > 0: "Mary Pierson" > 1: "John Reed" > 2: "Lynn Kwiatkowski" > 3: "Donald Vann" > 4: "Judy Owens" > 5: "Lori Lightfoot" > {code} > {code:title=single} > #: full_name > 0: "Steve Eurich" > 1: "Mary Pierson" > 2: "John Reed" > 3: "Lynn Kwiatkowski" > 4: "Donald Vann" > 5: "Judy Owens" > 6: "Lori Lightfoot" > 7: "Kumar" > 8: "Kamesh" > {code} > The Mongo server startup logs for each mode are attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7269) Mongo Unit-tests not able to import properly the test data when running in sharded mode
Dmytriy Grinchenko created DRILL-7269: - Summary: Mongo Unit-tests not able to import properly the test data when running in sharded mode Key: DRILL-7269 URL: https://issues.apache.org/jira/browse/DRILL-7269 Project: Apache Drill Issue Type: Bug Components: Tools, Build & Test Affects Versions: 1.17.0 Reporter: Dmytriy Grinchenko Assignee: Dmytriy Grinchenko Fix For: 1.17.0 The problem hides in {{MongoTestSuit}} and the way the data are bootstrapped on distributed cluster start-up. It looks like not all shards come online before we start uploading the test data sets to the DB. The bug started appearing after the fixes in DRILL-7196, where we deployed a sharded cluster but used it as a single node. Below is a comparison of the data between sharded-cluster and single mode: {code:title=sharded} #: full_name 0: "Mary Pierson" 1: "John Reed" 2: "Lynn Kwiatkowski" 3: "Donald Vann" 4: "Judy Owens" 5: "Lori Lightfoot" {code} {code:title=single} #: full_name 0: "Steve Eurich" 1: "Mary Pierson" 2: "John Reed" 3: "Lynn Kwiatkowski" 4: "Donald Vann" 5: "Judy Owens" 6: "Lori Lightfoot" 7: "Kumar" 8: "Kamesh" {code} The Mongo server startup logs for each mode are attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
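One way to address the race described in DRILL-7269 is to block the test-data import until every shard reports online. The sketch below is a generic readiness-polling helper under assumed names (`waitForShards` and the `isOnline` predicate are hypothetical; MongoTestSuit's actual API differs):

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not MongoTestSuit's actual code: poll until every
// shard satisfies a readiness predicate before importing test data, instead
// of racing the cluster start-up.
public class ShardReadiness {
    public static <T> boolean waitForShards(List<T> shards,
                                            Predicate<T> isOnline,
                                            int maxAttempts,
                                            long delayMillis)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (shards.stream().allMatch(isOnline)) {
                return true; // all shards ready: safe to import data
            }
            Thread.sleep(delayMillis);
        }
        return false; // timed out: fail the suite rather than import partially
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated cluster: shard1 only comes online after ~50 ms.
        long readyAt = System.currentTimeMillis() + 50;
        Predicate<String> isOnline =
            s -> s.equals("shard0") || System.currentTimeMillis() >= readyAt;
        System.out.println(
            waitForShards(List.of("shard0", "shard1"), isOnline, 20, 10));
    }
}
```

With a 20-attempt, 10 ms poll the simulated run prints `true`; a suite using this pattern would throw instead of proceeding when `waitForShards` returns `false`.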