[GitHub] [drill] kkhatua commented on issue #1779: DRILL-7222: Visualize estimated and actual row counts for a query
kkhatua commented on issue #1779: DRILL-7222: Visualize estimated and actual row counts for a query URL: https://github.com/apache/drill/pull/1779#issuecomment-521408383 @agozhiy I've done the changes requested. Please review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (DRILL-7348) Aggregate on Subquery with Select Distinct or UNION fails to Group By
Keith G Yu created DRILL-7348: - Summary: Aggregate on Subquery with Select Distinct or UNION fails to Group By Key: DRILL-7348 URL: https://issues.apache.org/jira/browse/DRILL-7348 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Affects Versions: 1.15.0 Reporter: Keith G Yu The following query fails to group properly. {code:java} SELECT date, COUNT(1) FROM ( SELECT DISTINCT id, date, status FROM table(dfs.`path`(type => 'text', fieldDelimiter => ',', extractHeader => TRUE)) ) GROUP BY 1{code} This also fails to group properly. {code:java} SELECT date, COUNT(1) FROM ( SELECT id, date, status FROM table(dfs.`path1`(type => 'text', fieldDelimiter => ',', extractHeader => TRUE)) UNION SELECT id, date, status FROM table(dfs.`path2`(type => 'text', fieldDelimiter => ',', extractHeader => TRUE)) ) GROUP BY 1 {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (DRILL-7347) Upgrade Apache Iceberg to released version
Arina Ielchiieva created DRILL-7347: --- Summary: Upgrade Apache Iceberg to released version Key: DRILL-7347 URL: https://issues.apache.org/jira/browse/DRILL-7347 Project: Apache Drill Issue Type: Task Reporter: Arina Ielchiieva Currently Drill uses an Apache Iceberg build pinned to a specific commit via JitPack, since there is no officially released version. Once the first Iceberg version is released, we need to switch to the officially released version instead of the commit.
[GitHub] [drill] arina-ielchiieva commented on issue #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on issue #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#issuecomment-521263751 @oleg-zinovev thanks for making the changes, a couple of minor comments are left...
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313897281 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ## @@ -18,72 +18,142 @@ package org.apache.drill.exec.physical.impl.writer; import org.apache.commons.io.FileUtils; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.BatchSchemaBuilder; +import org.apache.drill.exec.record.metadata.SchemaBuilder; import org.apache.drill.test.BaseTestQuery; import org.apache.drill.categories.ParquetTest; import org.apache.drill.categories.UnlikelyTest; import org.apache.drill.exec.ExecConstants; -import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Test; import org.junit.experimental.categories.Category; import java.io.File; +import java.nio.file.Paths; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; @Category({ParquetTest.class, UnlikelyTest.class}) public class TestParquetWriterEmptyFiles extends BaseTestQuery { @BeforeClass public static void initFs() throws Exception { updateTestCluster(3, null); +dirTestWatcher.copyResourceToRoot(Paths.get("schemachange")); +dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty")); } - @Test // see DRILL-2408 + @Test public void testWriteEmptyFile() throws Exception { final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfile"; final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName); -Assert.assertFalse(outputFile.exists()); +assertTrue(outputFile.exists()); } @Test - public void testMultipleWriters() throws Exception { -final String outputFile = 
"testparquetwriteremptyfiles_testmultiplewriters"; + public void testWriteEmptyFileWithEmptySchema() throws Exception { +final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema"; +final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); -runSQL("alter session set `planner.slice_target` = 1"); +test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName); +assertFalse(outputFile.exists()); + } -try { - final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id"; + @Test + public void testWriteEmptySchemaChange() throws Exception { +final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange"; +final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); - test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query); +test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName); - // this query will fail if an "empty" file was created - testBuilder() -.unOrdered() -.sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile) -.sqlBaselineQuery(query) -.go(); -} finally { - runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT); -} +// Only the last scan scheme is written +SchemaBuilder schemaBuilder = new SchemaBuilder() + .addNullable("id", TypeProtos.MinorType.BIGINT) + .addNullable("a", TypeProtos.MinorType.BIGINT) + .addNullable("b", TypeProtos.MinorType.BIT); +BatchSchema expectedSchema = new BatchSchemaBuilder() + .withSchemaBuilder(schemaBuilder) + .build(); + +testBuilder() + .unOrdered() + .sqlQuery("select * from dfs.tmp.%s", outputFileName) + .schemaBaseLine(expectedSchema) + .go(); + +// Make sure that only 1 parquet file was created +assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length); } - @Test // see DRILL-2408 + @Test + public void 
testSimpleEmptyFileSchema() throws Exception { Review comment: Also we need to add a test where we select from a non-empty Parquet file but the filter condition eliminates all rows, similar to what you have for JSON.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313896738 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ## @@ -18,72 +18,142 @@ package org.apache.drill.exec.physical.impl.writer; import org.apache.commons.io.FileUtils; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.BatchSchemaBuilder; +import org.apache.drill.exec.record.metadata.SchemaBuilder; import org.apache.drill.test.BaseTestQuery; import org.apache.drill.categories.ParquetTest; import org.apache.drill.categories.UnlikelyTest; import org.apache.drill.exec.ExecConstants; -import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Test; import org.junit.experimental.categories.Category; import java.io.File; +import java.nio.file.Paths; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; @Category({ParquetTest.class, UnlikelyTest.class}) public class TestParquetWriterEmptyFiles extends BaseTestQuery { @BeforeClass public static void initFs() throws Exception { updateTestCluster(3, null); +dirTestWatcher.copyResourceToRoot(Paths.get("schemachange")); +dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty")); } - @Test // see DRILL-2408 + @Test public void testWriteEmptyFile() throws Exception { final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfile"; final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName); -Assert.assertFalse(outputFile.exists()); +assertTrue(outputFile.exists()); } @Test - public void testMultipleWriters() throws Exception { -final String outputFile = 
"testparquetwriteremptyfiles_testmultiplewriters"; + public void testWriteEmptyFileWithEmptySchema() throws Exception { +final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema"; +final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); -runSQL("alter session set `planner.slice_target` = 1"); +test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName); +assertFalse(outputFile.exists()); + } -try { - final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id"; + @Test + public void testWriteEmptySchemaChange() throws Exception { +final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange"; +final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); - test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query); +test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName); - // this query will fail if an "empty" file was created - testBuilder() -.unOrdered() -.sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile) -.sqlBaselineQuery(query) -.go(); -} finally { - runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT); -} +// Only the last scan scheme is written +SchemaBuilder schemaBuilder = new SchemaBuilder() + .addNullable("id", TypeProtos.MinorType.BIGINT) + .addNullable("a", TypeProtos.MinorType.BIGINT) + .addNullable("b", TypeProtos.MinorType.BIT); +BatchSchema expectedSchema = new BatchSchemaBuilder() + .withSchemaBuilder(schemaBuilder) + .build(); + +testBuilder() + .unOrdered() + .sqlQuery("select * from dfs.tmp.%s", outputFileName) + .schemaBaseLine(expectedSchema) + .go(); + +// Make sure that only 1 parquet file was created +assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length); } - @Test // see DRILL-2408 + @Test + public void 
testSimpleEmptyFileSchema() throws Exception { Review comment: This test is redundant since `testComplexEmptyFileSchema` covers both cases.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313895815 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java ## @@ -122,6 +122,9 @@ private PrimitiveTypeName logicalTypeForDecimals; private boolean usePrimitiveTypesForDecimals; + /** Whether no rows was written. */ Review comment: It is used to ensure that an empty Parquet file will be written if no rows were provided.
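The `empty` flag under discussion can be sketched in isolation. The following is a minimal, self-contained illustration of the pattern being reviewed, not Drill's actual ParquetRecordWriter; the class name `SketchWriter`, its methods, and the file-naming scheme are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of a writer that tracks whether any row was ever written,
// so the final (cleanup) flush can still emit one schema-only output file.
public class SketchWriter {
  private boolean empty = true;                 // set to false once any row arrives
  private int bufferedRows = 0;
  private final List<String> flushedFiles = new ArrayList<>();

  public void write(String row) {
    bufferedRows++;
    empty = false;
  }

  // Ordinary flushes skip an empty buffer; the cleanup flush additionally
  // writes one schema-only file when no row was ever provided.
  public void flush(boolean cleanUp) {
    if (bufferedRows > 0 || (cleanUp && empty)) {
      flushedFiles.add("part-" + flushedFiles.size() + " (" + bufferedRows + " rows)");
      bufferedRows = 0;
    }
  }

  public List<String> files() { return flushedFiles; }

  public static void main(String[] args) {
    SketchWriter w = new SketchWriter();
    w.flush(false);   // mid-query flush with no rows: writes nothing
    w.flush(true);    // cleanup flush: still produces one empty file
    System.out.println(w.files());
  }
}
```

The design choice mirrors the PR: without the flag, a query returning zero rows would leave no output file at all, and a later `SELECT` against the created table would fail instead of returning an empty result with the correct schema.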
[GitHub] [drill] oleg-zinovev commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
oleg-zinovev commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313831543 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ## @@ -18,23 +18,33 @@ package org.apache.drill.exec.physical.impl.writer; import org.apache.commons.io.FileUtils; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.BatchSchemaBuilder; +import org.apache.drill.exec.record.metadata.SchemaBuilder; import org.apache.drill.test.BaseTestQuery; import org.apache.drill.categories.ParquetTest; import org.apache.drill.categories.UnlikelyTest; import org.apache.drill.exec.ExecConstants; -import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Test; import org.junit.experimental.categories.Category; import java.io.File; +import java.nio.file.Paths; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; @Category({ParquetTest.class, UnlikelyTest.class}) public class TestParquetWriterEmptyFiles extends BaseTestQuery { @BeforeClass public static void initFs() throws Exception { updateTestCluster(3, null); +dirTestWatcher.copyResourceToRoot(Paths.get("schemachange")); +dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty")); } @Test // see DRILL-2408 Review comment: My bad. Done
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313829664 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ## @@ -18,23 +18,33 @@ package org.apache.drill.exec.physical.impl.writer; import org.apache.commons.io.FileUtils; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.BatchSchemaBuilder; +import org.apache.drill.exec.record.metadata.SchemaBuilder; import org.apache.drill.test.BaseTestQuery; import org.apache.drill.categories.ParquetTest; import org.apache.drill.categories.UnlikelyTest; import org.apache.drill.exec.ExecConstants; -import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Test; import org.junit.experimental.categories.Category; import java.io.File; +import java.nio.file.Paths; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; @Category({ParquetTest.class, UnlikelyTest.class}) public class TestParquetWriterEmptyFiles extends BaseTestQuery { @BeforeClass public static void initFs() throws Exception { updateTestCluster(3, null); +dirTestWatcher.copyResourceToRoot(Paths.get("schemachange")); +dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty")); } @Test // see DRILL-2408 Review comment: Please remove `see DRILL-2408`
[GitHub] [drill] oleg-zinovev commented on issue #1836: DRILL-7156: Support empty Parquet files creation
oleg-zinovev commented on issue #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#issuecomment-521208818 @arina-ielchiieva , thanks for review. Fixed.
[GitHub] [drill] oleg-zinovev commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
oleg-zinovev commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313815186 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ## @@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception { final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName); +Assert.assertTrue(outputFile.exists()); + } + + @Test + public void testWriteEmptyFileWithEmptySchema() throws Exception { +final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema"; +final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); + +test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName); Assert.assertFalse(outputFile.exists()); } @Test - public void testMultipleWriters() throws Exception { -final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters"; + public void testWriteEmptySchemaChange() throws Exception { +final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange"; +final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); -runSQL("alter session set `planner.slice_target` = 1"); +test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName); -try { - final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id"; +// Only the last scan scheme is written +SchemaBuilder schemaBuilder = new SchemaBuilder() + .addNullable("id", TypeProtos.MinorType.BIGINT) + .addNullable("a", TypeProtos.MinorType.BIGINT) + .addNullable("b", TypeProtos.MinorType.BIT); +BatchSchema expectedSchema = new BatchSchemaBuilder() + 
.withSchemaBuilder(schemaBuilder) + .build(); - test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query); +testBuilder() + .unOrdered() + .sqlQuery("select * from dfs.tmp.%s", outputFileName) + .schemaBaseLine(expectedSchema) + .go(); - // this query will fail if an "empty" file was created - testBuilder() -.unOrdered() -.sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile) -.sqlBaselineQuery(query) -.go(); -} finally { - runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT); -} +// Make sure that only 1 parquet file was created +Assert.assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length); + } + + @Test + public void testEmptyFileSchema() throws Exception { +final String outputFileName = "testparquetwriteremptyfiles_testemptyfileschema"; + +test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName); Review comment: If needed, I can also rewrite tests using RowSet.
[GitHub] [drill] arina-ielchiieva commented on issue #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on issue #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#issuecomment-521191646 > @arina-ielchiieva > Some time spent debugging the test showed that the last schema contains all fields. The field is added in ProjectRecordBatch#setupNewSchemaFromInput. > In the original version of the test, field A was not added due to plan optimization - condition `1=0` was replaced by `limit 0` > > I can still provide a solution with combining schema if required. In this case we don't need a schema merge.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313800825 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java ## @@ -310,27 +312,26 @@ public void checkForNewPartition(int index) { try { boolean newPartition = newPartition(index); if (newPartition) { -flush(); +flush(false); newSchema(); } } catch (Exception e) { throw new DrillRuntimeException(e); } } - private void flush() throws IOException { + private void flush(final boolean cleanUp) throws IOException { Review comment: ```suggestion private void flush(boolean cleanUp) throws IOException { ```
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313806015 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java ## @@ -122,6 +122,8 @@ private PrimitiveTypeName logicalTypeForDecimals; private boolean usePrimitiveTypesForDecimals; + private boolean empty = true; Review comment: Please add a comment describing the purpose of this flag.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313803449
## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ##
@@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception {
     final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
     test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
+    Assert.assertTrue(outputFile.exists());
+  }
+
+  @Test
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName);
     Assert.assertFalse(outputFile.exists());
   }

   @Test
-  public void testMultipleWriters() throws Exception {
-    final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptySchemaChange() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);

-    runSQL("alter session set `planner.slice_target` = 1");
+    test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);

-    try {
-      final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id";
+    // Only the last scan schema is written
+    SchemaBuilder schemaBuilder = new SchemaBuilder()
+      .addNullable("id", TypeProtos.MinorType.BIGINT)
+      .addNullable("a", TypeProtos.MinorType.BIGINT)
+      .addNullable("b", TypeProtos.MinorType.BIT);
+    BatchSchema expectedSchema = new BatchSchemaBuilder()
+      .withSchemaBuilder(schemaBuilder)
+      .build();

-      test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+    testBuilder()
+      .unOrdered()
+      .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+      .schemaBaseLine(expectedSchema)
+      .go();

-      // this query will fail if an "empty" file was created
-      testBuilder()
-        .unOrdered()
-        .sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-        .sqlBaselineQuery(query)
-        .go();
-    } finally {
-      runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT);
-    }
+    // Make sure that only 1 parquet file was created
+    Assert.assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
+  }
+
+  @Test
+  public void testEmptyFileSchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testemptyfileschema";
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
+
+    // end_date column is null, so it is missing in the result schema.
+    SchemaBuilder schemaBuilder = new SchemaBuilder()
+      .addNullable("employee_id", TypeProtos.MinorType.BIGINT)
+      .addNullable("full_name", TypeProtos.MinorType.VARCHAR)
+      .addNullable("first_name", TypeProtos.MinorType.VARCHAR)
+      .addNullable("last_name", TypeProtos.MinorType.VARCHAR)
+      .addNullable("position_id", TypeProtos.MinorType.BIGINT)
+      .addNullable("position_title", TypeProtos.MinorType.VARCHAR)
+      .addNullable("store_id", TypeProtos.MinorType.BIGINT)
+      .addNullable("department_id", TypeProtos.MinorType.BIGINT)
+      .addNullable("birth_date", TypeProtos.MinorType.VARCHAR)
+      .addNullable("hire_date", TypeProtos.MinorType.VARCHAR)
+      .addNullable("salary", TypeProtos.MinorType.FLOAT8)
+      .addNullable("supervisor_id", TypeProtos.MinorType.BIGINT)
+      .addNullable("education_level", TypeProtos.MinorType.VARCHAR)
+      .addNullable("marital_status", TypeProtos.MinorType.VARCHAR)
+      .addNullable("gender", TypeProtos.MinorType.VARCHAR)
+      .addNullable("management_role", TypeProtos.MinorType.VARCHAR);
+    BatchSchema expectedSchema = new BatchSchemaBuilder()
+      .withSchemaBuilder(schemaBuilder)
+      .build();
+
+    testBuilder()
+      .unOrdered()
+      .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+      .schemaBaseLine(expectedSchema)
+      .go();
   }

   @Test // see DRILL-2408
Review comment: Please remove the `see DRILL-2408` references in the class; they no longer make sense.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313805760
## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ##
@@ -43,47 +49,99 @@ [diff context identical to the first comment on this file above; the comment is on this added line in testEmptyFileSchema:]
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
Review comment: Please replace the select from a JSON file with a select from Parquet; we need to test that the schema is created correctly from an already known schema.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313800742
## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java ##
@@ -486,11 +467,54 @@ public void abort() throws IOException {

   @Override
   public void cleanup() throws IOException {
-    flush();
+    flush(true);
     codecFactory.release();
   }

+  private void createParquetFileWriter() throws IOException {
+    assert parquetFileWriter == null;
Review comment: Please remove, there is no need for this check.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313800525
## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java ##
@@ -486,11 +467,54 @@ public void abort() throws IOException {

   @Override
   public void cleanup() throws IOException {
-    flush();
+    flush(true);
     codecFactory.release();
   }

+  private void createParquetFileWriter() throws IOException {
+    assert parquetFileWriter == null;
+
+    Path path = new Path(location, prefix + "_" + index + ".parquet");
+    // to ensure that our writer was the first to create the output file, we create an empty file first and fail if the file exists
+    Path firstCreatedPath = storageStrategy.createFileAndApply(fs, path);
+
+    // since the parquet reader supports partitions, several output files may be created
+    // if this writer was the one to create the table folder, we store only the folder and delete it with its content in case of abort
+    // if the table location was created before, we store only the files created by this writer and delete them in case of abort
+    addCleanUpLocation(fs, firstCreatedPath);
+
+    // since ParquetFileWriter will overwrite the empty output file (append is not supported)
+    // we need to re-apply the file permission
+    if (useSingleFSBlock) {
+      // Passing blockSize creates files with this blockSize instead of the filesystem default blockSize.
+      // Currently, this is supported only by filesystems included in
+      // BLOCK_FS_SCHEMES (ParquetFileWriter.java in parquet-mr), which includes HDFS.
+      // For other filesystems, it uses the default blockSize configured for the file system.
+      parquetFileWriter = new ParquetFileWriter(conf, schema, path, ParquetFileWriter.Mode.OVERWRITE, blockSize, 0);
+    } else {
+      parquetFileWriter = new ParquetFileWriter(conf, schema, path, ParquetFileWriter.Mode.OVERWRITE);
+    }
+    storageStrategy.applyToFile(fs, path);
+    parquetFileWriter.start();
+  }
+
+  private void flushParquetFileWriter() throws IOException {
+    assert parquetFileWriter != null;
Review comment: Same here.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313803238
## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ##
@@ -43,47 +49,99 @@ [diff context identical to the first comment on this file above, continuing into:]
   @Test // see DRILL-2408
   public void testWriteEmptyFileAfterFlush() throws Exception {
-    final String outputFile = "testparquetwriteremptyfiles_test_write_empty_file_after_flush";
+    final String outputFileName = "testparquetwriteremptyfiles_test_write_empty_file_af
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313804193
## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ##
@@ -43,47 +49,99 @@ [diff context identical to the first comment on this file above; the comment is on these added lines:]
+    // Make sure that only 1 parquet file was created
+    Assert.assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
Review comment: Static import.
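The "Static import" note refers to JUnit's `org.junit.Assert.assertEquals`. A minimal sketch of the mechanism being suggested, shown with a JDK method so the example compiles without JUnit on the classpath:

```java
// Sketch of the "static import" suggestion: importing a static member lets
// the call site drop the class qualifier, e.g. assertEquals(...) instead of
// Assert.assertEquals(...). Demonstrated here with java.util.Arrays.asList.
import static java.util.Arrays.asList;

import java.util.List;

public class StaticImportSketch {
  public static void main(String[] args) {
    // With the static import, asList(...) replaces java.util.Arrays.asList(...)
    // just as assertEquals(...) would replace Assert.assertEquals(...) in the test.
    List<String> parquetFiles = asList("0_0_0.parquet");
    System.out.println(parquetFiles.size());
  }
}
```

In the test class this would mean adding `import static org.junit.Assert.assertEquals;` and calling `assertEquals(1, ...)` directly.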
[GitHub] [drill] oleg-zinovev commented on issue #1836: DRILL-7156: Support empty Parquet files creation
oleg-zinovev commented on issue #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#issuecomment-521174404 @arina-ielchiieva Some time spent debugging the test showed that the last schema contains all fields. The field is added in ProjectRecordBatch#setupNewSchemaFromInput. In the original version of the test, field A was not added because of a plan optimization: the condition `1=0` was replaced by `limit 0`. I can still provide a solution that combines schemas, if required.
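The planner rewrite oleg-zinovev describes can be sketched with the test's own query; the `LIMIT 0` form below is an illustration of what the optimized plan is effectively equivalent to, not output taken from Drill:

```sql
-- Original test predicate: always false, so no rows can match.
SELECT id, a, b FROM dfs.`schemachange/multi/*.json` WHERE 1 = 0;

-- The planner may simplify the always-false filter into the equivalent of a
-- LIMIT 0, so the projected schema (id, a, b) is still set up by the Project
-- operator even though no rows are produced.
SELECT id, a, b FROM dfs.`schemachange/multi/*.json` LIMIT 0;
```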