[GitHub] [drill] kkhatua commented on issue #1779: DRILL-7222: Visualize estimated and actual row counts for a query
kkhatua commented on issue #1779: DRILL-7222: Visualize estimated and actual row counts for a query URL: https://github.com/apache/drill/pull/1779#issuecomment-521408383 @agozhiy I've done the changes requested. Please review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (DRILL-7348) Aggregate on Subquery with Select Distinct or UNION fails to Group By
Keith G Yu created DRILL-7348: - Summary: Aggregate on Subquery with Select Distinct or UNION fails to Group By Key: DRILL-7348 URL: https://issues.apache.org/jira/browse/DRILL-7348 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Affects Versions: 1.15.0 Reporter: Keith G Yu The following query fails to group properly. {code:java} SELECT date, COUNT(1) FROM ( SELECT DISTINCT id, date, status FROM table(dfs.`path`(type => 'text', fieldDelimiter => ',', extractHeader => TRUE)) ) GROUP BY 1{code} This also fails to group properly. {code:java} SELECT date, COUNT(1) FROM ( SELECT id, date, status FROM table(dfs.`path1`(type => 'text', fieldDelimiter => ',', extractHeader => TRUE)) UNION SELECT id, date, status FROM table(dfs.`path2`(type => 'text', fieldDelimiter => ',', extractHeader => TRUE)) ) GROUP BY 1 {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (DRILL-7347) Upgrade Apache Iceberg to released version
Arina Ielchiieva created DRILL-7347: --- Summary: Upgrade Apache Iceberg to released version Key: DRILL-7347 URL: https://issues.apache.org/jira/browse/DRILL-7347 Project: Apache Drill Issue Type: Task Reporter: Arina Ielchiieva Currently Drill uses an Apache Iceberg build pinned to a specific commit via JitPack, since there is no officially released version. Once the first Iceberg version is released, we need to switch to the officially released version instead of the commit.
[GitHub] [drill] arina-ielchiieva commented on issue #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on issue #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#issuecomment-521263751 @oleg-zinovev thanks for making the changes, a couple of minor comments are left...
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313897281 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ## @@ -18,72 +18,142 @@ package org.apache.drill.exec.physical.impl.writer; import org.apache.commons.io.FileUtils; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.BatchSchemaBuilder; +import org.apache.drill.exec.record.metadata.SchemaBuilder; import org.apache.drill.test.BaseTestQuery; import org.apache.drill.categories.ParquetTest; import org.apache.drill.categories.UnlikelyTest; import org.apache.drill.exec.ExecConstants; -import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Test; import org.junit.experimental.categories.Category; import java.io.File; +import java.nio.file.Paths; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; @Category({ParquetTest.class, UnlikelyTest.class}) public class TestParquetWriterEmptyFiles extends BaseTestQuery { @BeforeClass public static void initFs() throws Exception { updateTestCluster(3, null); +dirTestWatcher.copyResourceToRoot(Paths.get("schemachange")); +dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty")); } - @Test // see DRILL-2408 + @Test public void testWriteEmptyFile() throws Exception { final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfile"; final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName); -Assert.assertFalse(outputFile.exists()); +assertTrue(outputFile.exists()); } @Test - public void testMultipleWriters() throws Exception { -final String outputFile = 
"testparquetwriteremptyfiles_testmultiplewriters"; + public void testWriteEmptyFileWithEmptySchema() throws Exception { +final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema"; +final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); -runSQL("alter session set `planner.slice_target` = 1"); +test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName); +assertFalse(outputFile.exists()); + } -try { - final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id"; + @Test + public void testWriteEmptySchemaChange() throws Exception { +final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange"; +final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); - test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query); +test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName); - // this query will fail if an "empty" file was created - testBuilder() -.unOrdered() -.sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile) -.sqlBaselineQuery(query) -.go(); -} finally { - runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT); -} +// Only the last scan scheme is written +SchemaBuilder schemaBuilder = new SchemaBuilder() + .addNullable("id", TypeProtos.MinorType.BIGINT) + .addNullable("a", TypeProtos.MinorType.BIGINT) + .addNullable("b", TypeProtos.MinorType.BIT); +BatchSchema expectedSchema = new BatchSchemaBuilder() + .withSchemaBuilder(schemaBuilder) + .build(); + +testBuilder() + .unOrdered() + .sqlQuery("select * from dfs.tmp.%s", outputFileName) + .schemaBaseLine(expectedSchema) + .go(); + +// Make sure that only 1 parquet file was created +assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length); } - @Test // see DRILL-2408 + @Test + public void 
testSimpleEmptyFileSchema() throws Exception { Review comment: Also we need to add a test where we select from a non-empty Parquet file but the filter condition eliminates all rows, similar to what you have for JSON.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313896738 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ## @@ -18,72 +18,142 @@ package org.apache.drill.exec.physical.impl.writer; import org.apache.commons.io.FileUtils; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.BatchSchemaBuilder; +import org.apache.drill.exec.record.metadata.SchemaBuilder; import org.apache.drill.test.BaseTestQuery; import org.apache.drill.categories.ParquetTest; import org.apache.drill.categories.UnlikelyTest; import org.apache.drill.exec.ExecConstants; -import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Test; import org.junit.experimental.categories.Category; import java.io.File; +import java.nio.file.Paths; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; @Category({ParquetTest.class, UnlikelyTest.class}) public class TestParquetWriterEmptyFiles extends BaseTestQuery { @BeforeClass public static void initFs() throws Exception { updateTestCluster(3, null); +dirTestWatcher.copyResourceToRoot(Paths.get("schemachange")); +dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty")); } - @Test // see DRILL-2408 + @Test public void testWriteEmptyFile() throws Exception { final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfile"; final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName); -Assert.assertFalse(outputFile.exists()); +assertTrue(outputFile.exists()); } @Test - public void testMultipleWriters() throws Exception { -final String outputFile = 
"testparquetwriteremptyfiles_testmultiplewriters"; + public void testWriteEmptyFileWithEmptySchema() throws Exception { +final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema"; +final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); -runSQL("alter session set `planner.slice_target` = 1"); +test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName); +assertFalse(outputFile.exists()); + } -try { - final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id"; + @Test + public void testWriteEmptySchemaChange() throws Exception { +final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange"; +final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); - test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query); +test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName); - // this query will fail if an "empty" file was created - testBuilder() -.unOrdered() -.sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile) -.sqlBaselineQuery(query) -.go(); -} finally { - runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT); -} +// Only the last scan scheme is written +SchemaBuilder schemaBuilder = new SchemaBuilder() + .addNullable("id", TypeProtos.MinorType.BIGINT) + .addNullable("a", TypeProtos.MinorType.BIGINT) + .addNullable("b", TypeProtos.MinorType.BIT); +BatchSchema expectedSchema = new BatchSchemaBuilder() + .withSchemaBuilder(schemaBuilder) + .build(); + +testBuilder() + .unOrdered() + .sqlQuery("select * from dfs.tmp.%s", outputFileName) + .schemaBaseLine(expectedSchema) + .go(); + +// Make sure that only 1 parquet file was created +assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length); } - @Test // see DRILL-2408 + @Test + public void 
testSimpleEmptyFileSchema() throws Exception { Review comment: This test is redundant since `testComplexEmptyFileSchema` covers both cases.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313895815 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java ## @@ -122,6 +122,9 @@ private PrimitiveTypeName logicalTypeForDecimals; private boolean usePrimitiveTypesForDecimals; + /** Whether no rows was written. */ Review comment: It is used to ensure that an empty Parquet file will be written if no rows were provided.
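The `empty` flag under discussion can be sketched in isolation. The following is a minimal, self-contained illustration of the pattern being reviewed, not Drill's actual ParquetRecordWriter; the class name `SketchWriter`, its methods, and the file-naming scheme are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of a writer that tracks whether any row was ever written,
// so the final (cleanup) flush can still emit one schema-only output file.
public class SketchWriter {
  private boolean empty = true;                 // set to false once any row arrives
  private int bufferedRows = 0;
  private final List<String> flushedFiles = new ArrayList<>();

  public void write(String row) {
    bufferedRows++;
    empty = false;
  }

  // Ordinary flushes skip an empty buffer; the cleanup flush additionally
  // writes one schema-only file when no row was ever provided.
  public void flush(boolean cleanUp) {
    if (bufferedRows > 0 || (cleanUp && empty)) {
      flushedFiles.add("part-" + flushedFiles.size() + " (" + bufferedRows + " rows)");
      bufferedRows = 0;
    }
  }

  public List<String> files() { return flushedFiles; }

  public static void main(String[] args) {
    SketchWriter w = new SketchWriter();
    w.flush(false);   // mid-query flush with no rows: writes nothing
    w.flush(true);    // cleanup flush: still produces one empty file
    System.out.println(w.files());
  }
}
```

The design choice mirrors the PR: without the flag, a query returning zero rows would leave no output file at all, and a later `SELECT` against the created table would fail instead of returning an empty result with the correct schema.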
[GitHub] [drill] oleg-zinovev commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
oleg-zinovev commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313831543 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ## @@ -18,23 +18,33 @@ package org.apache.drill.exec.physical.impl.writer; import org.apache.commons.io.FileUtils; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.BatchSchemaBuilder; +import org.apache.drill.exec.record.metadata.SchemaBuilder; import org.apache.drill.test.BaseTestQuery; import org.apache.drill.categories.ParquetTest; import org.apache.drill.categories.UnlikelyTest; import org.apache.drill.exec.ExecConstants; -import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Test; import org.junit.experimental.categories.Category; import java.io.File; +import java.nio.file.Paths; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; @Category({ParquetTest.class, UnlikelyTest.class}) public class TestParquetWriterEmptyFiles extends BaseTestQuery { @BeforeClass public static void initFs() throws Exception { updateTestCluster(3, null); +dirTestWatcher.copyResourceToRoot(Paths.get("schemachange")); +dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty")); } @Test // see DRILL-2408 Review comment: My bad. Done
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313829664 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ## @@ -18,23 +18,33 @@ package org.apache.drill.exec.physical.impl.writer; import org.apache.commons.io.FileUtils; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.BatchSchemaBuilder; +import org.apache.drill.exec.record.metadata.SchemaBuilder; import org.apache.drill.test.BaseTestQuery; import org.apache.drill.categories.ParquetTest; import org.apache.drill.categories.UnlikelyTest; import org.apache.drill.exec.ExecConstants; -import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Test; import org.junit.experimental.categories.Category; import java.io.File; +import java.nio.file.Paths; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; @Category({ParquetTest.class, UnlikelyTest.class}) public class TestParquetWriterEmptyFiles extends BaseTestQuery { @BeforeClass public static void initFs() throws Exception { updateTestCluster(3, null); +dirTestWatcher.copyResourceToRoot(Paths.get("schemachange")); +dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty")); } @Test // see DRILL-2408 Review comment: Please remove `see DRILL-2408`
[GitHub] [drill] oleg-zinovev commented on issue #1836: DRILL-7156: Support empty Parquet files creation
oleg-zinovev commented on issue #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#issuecomment-521208818 @arina-ielchiieva , thanks for review. Fixed.
[GitHub] [drill] oleg-zinovev commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
oleg-zinovev commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313815186 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ## @@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception { final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName); +Assert.assertTrue(outputFile.exists()); + } + + @Test + public void testWriteEmptyFileWithEmptySchema() throws Exception { +final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema"; +final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); + +test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName); Assert.assertFalse(outputFile.exists()); } @Test - public void testMultipleWriters() throws Exception { -final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters"; + public void testWriteEmptySchemaChange() throws Exception { +final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange"; +final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName); -runSQL("alter session set `planner.slice_target` = 1"); +test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName); -try { - final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id"; +// Only the last scan scheme is written +SchemaBuilder schemaBuilder = new SchemaBuilder() + .addNullable("id", TypeProtos.MinorType.BIGINT) + .addNullable("a", TypeProtos.MinorType.BIGINT) + .addNullable("b", TypeProtos.MinorType.BIT); +BatchSchema expectedSchema = new BatchSchemaBuilder() + 
.withSchemaBuilder(schemaBuilder) + .build(); - test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query); +testBuilder() + .unOrdered() + .sqlQuery("select * from dfs.tmp.%s", outputFileName) + .schemaBaseLine(expectedSchema) + .go(); - // this query will fail if an "empty" file was created - testBuilder() -.unOrdered() -.sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile) -.sqlBaselineQuery(query) -.go(); -} finally { - runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT); -} +// Make sure that only 1 parquet file was created +Assert.assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length); + } + + @Test + public void testEmptyFileSchema() throws Exception { +final String outputFileName = "testparquetwriteremptyfiles_testemptyfileschema"; + +test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName); Review comment: If needed, I can also rewrite tests using RowSet.
[GitHub] [drill] arina-ielchiieva commented on issue #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on issue #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#issuecomment-521191646 > @arina-ielchiieva > Some time spent debugging the test showed that the last schema contains all fields. The field is added in ProjectRecordBatch#setupNewSchemaFromInput. > In the original version of the test, field A was not added due to plan optimization - condition `1=0` was replaced by `limit 0` > > I can still provide a solution with combining schema if required. In this case we don't need a schema merge.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313800825 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java ## @@ -310,27 +312,26 @@ public void checkForNewPartition(int index) { try { boolean newPartition = newPartition(index); if (newPartition) { -flush(); +flush(false); newSchema(); } } catch (Exception e) { throw new DrillRuntimeException(e); } } - private void flush() throws IOException { + private void flush(final boolean cleanUp) throws IOException { Review comment: ```suggestion private void flush(boolean cleanUp) throws IOException { ```
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313806015 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java ## @@ -122,6 +122,8 @@ private PrimitiveTypeName logicalTypeForDecimals; private boolean usePrimitiveTypesForDecimals; + private boolean empty = true; Review comment: Please add a comment describing the purpose of this flag.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313803449
## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ##
@@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception {
     final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
     test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
+    Assert.assertTrue(outputFile.exists());
+  }
+
+  @Test
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName);
     Assert.assertFalse(outputFile.exists());
   }

   @Test
-  public void testMultipleWriters() throws Exception {
-    final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptySchemaChange() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);

-    runSQL("alter session set `planner.slice_target` = 1");
+    test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);

-    try {
-      final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id";
+    // Only the last scan schema is written
+    SchemaBuilder schemaBuilder = new SchemaBuilder()
+      .addNullable("id", TypeProtos.MinorType.BIGINT)
+      .addNullable("a", TypeProtos.MinorType.BIGINT)
+      .addNullable("b", TypeProtos.MinorType.BIT);
+    BatchSchema expectedSchema = new BatchSchemaBuilder()
+      .withSchemaBuilder(schemaBuilder)
+      .build();

-      test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+    testBuilder()
+      .unOrdered()
+      .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+      .schemaBaseLine(expectedSchema)
+      .go();

-      // this query will fail if an "empty" file was created
-      testBuilder()
-        .unOrdered()
-        .sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-        .sqlBaselineQuery(query)
-        .go();
-    } finally {
-      runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT);
-    }
+    // Make sure that only 1 parquet file was created
+    Assert.assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
+  }
+
+  @Test
+  public void testEmptyFileSchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testemptyfileschema";
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
+
+    // end_date column is null, so it is missing in the result schema.
+    SchemaBuilder schemaBuilder = new SchemaBuilder()
+      .addNullable("employee_id", TypeProtos.MinorType.BIGINT)
+      .addNullable("full_name", TypeProtos.MinorType.VARCHAR)
+      .addNullable("first_name", TypeProtos.MinorType.VARCHAR)
+      .addNullable("last_name", TypeProtos.MinorType.VARCHAR)
+      .addNullable("position_id", TypeProtos.MinorType.BIGINT)
+      .addNullable("position_title", TypeProtos.MinorType.VARCHAR)
+      .addNullable("store_id", TypeProtos.MinorType.BIGINT)
+      .addNullable("department_id", TypeProtos.MinorType.BIGINT)
+      .addNullable("birth_date", TypeProtos.MinorType.VARCHAR)
+      .addNullable("hire_date", TypeProtos.MinorType.VARCHAR)
+      .addNullable("salary", TypeProtos.MinorType.FLOAT8)
+      .addNullable("supervisor_id", TypeProtos.MinorType.BIGINT)
+      .addNullable("education_level", TypeProtos.MinorType.VARCHAR)
+      .addNullable("marital_status", TypeProtos.MinorType.VARCHAR)
+      .addNullable("gender", TypeProtos.MinorType.VARCHAR)
+      .addNullable("management_role", TypeProtos.MinorType.VARCHAR);
+    BatchSchema expectedSchema = new BatchSchemaBuilder()
+      .withSchemaBuilder(schemaBuilder)
+      .build();
+
+    testBuilder()
+      .unOrdered()
+      .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+      .schemaBaseLine(expectedSchema)
+      .go();
   }

   @Test // see DRILL-2408
Review comment: Please remove the `see DRILL-2408` references in the class; they no longer make sense.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313805760
## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ##
@@ -43,47 +49,99 @@ [diff context identical to the first comment on this file above; the comment is on this added line in testEmptyFileSchema:]
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
Review comment: Please replace the select from a JSON file with a select from Parquet; we need to test that the schema is created correctly from an already known schema.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313800742
## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java ##
@@ -486,11 +467,54 @@ public void abort() throws IOException {

   @Override
   public void cleanup() throws IOException {
-    flush();
+    flush(true);
     codecFactory.release();
   }

+  private void createParquetFileWriter() throws IOException {
+    assert parquetFileWriter == null;
Review comment: Please remove, there is no need for this check.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313800525
## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java ##
@@ -486,11 +467,54 @@ public void abort() throws IOException {

   @Override
   public void cleanup() throws IOException {
-    flush();
+    flush(true);
     codecFactory.release();
   }

+  private void createParquetFileWriter() throws IOException {
+    assert parquetFileWriter == null;
+
+    Path path = new Path(location, prefix + "_" + index + ".parquet");
+    // to ensure that our writer was the first to create the output file, we create an empty file first and fail if the file exists
+    Path firstCreatedPath = storageStrategy.createFileAndApply(fs, path);
+
+    // since the parquet reader supports partitions, several output files may be created
+    // if this writer was the one to create the table folder, we store only the folder and delete it with its content in case of abort
+    // if the table location was created before, we store only the files created by this writer and delete them in case of abort
+    addCleanUpLocation(fs, firstCreatedPath);
+
+    // since ParquetFileWriter will overwrite the empty output file (append is not supported)
+    // we need to re-apply the file permission
+    if (useSingleFSBlock) {
+      // Passing blockSize creates files with this blockSize instead of the filesystem default blockSize.
+      // Currently, this is supported only by filesystems included in
+      // BLOCK_FS_SCHEMES (ParquetFileWriter.java in parquet-mr), which includes HDFS.
+      // For other filesystems, it uses the default blockSize configured for the file system.
+      parquetFileWriter = new ParquetFileWriter(conf, schema, path, ParquetFileWriter.Mode.OVERWRITE, blockSize, 0);
+    } else {
+      parquetFileWriter = new ParquetFileWriter(conf, schema, path, ParquetFileWriter.Mode.OVERWRITE);
+    }
+    storageStrategy.applyToFile(fs, path);
+    parquetFileWriter.start();
+  }
+
+  private void flushParquetFileWriter() throws IOException {
+    assert parquetFileWriter != null;
Review comment: Same here.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313803238
## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ##
@@ -43,47 +49,99 @@ [diff context identical to the first comment on this file above, continuing into:]
   @Test // see DRILL-2408
   public void testWriteEmptyFileAfterFlush() throws Exception {
-    final String outputFile = "testparquetwriteremptyfiles_test_write_empty_file_after_flush";
+    final String outputFileName = "testparquetwriteremptyfiles_test_write_empty_file_af
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation
arina-ielchiieva commented on a change in pull request #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#discussion_r313804193
## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java ##
@@ -43,47 +49,99 @@ [diff context identical to the first comment on this file above; the comment is on these added lines:]
+    // Make sure that only 1 parquet file was created
+    Assert.assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
Review comment: Static import.
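The "Static import" note refers to JUnit's `org.junit.Assert.assertEquals`. A minimal sketch of the mechanism being suggested, shown with a JDK method so the example compiles without JUnit on the classpath:

```java
// Sketch of the "static import" suggestion: importing a static member lets
// the call site drop the class qualifier, e.g. assertEquals(...) instead of
// Assert.assertEquals(...). Demonstrated here with java.util.Arrays.asList.
import static java.util.Arrays.asList;

import java.util.List;

public class StaticImportSketch {
  public static void main(String[] args) {
    // With the static import, asList(...) replaces java.util.Arrays.asList(...)
    // just as assertEquals(...) would replace Assert.assertEquals(...) in the test.
    List<String> parquetFiles = asList("0_0_0.parquet");
    System.out.println(parquetFiles.size());
  }
}
```

In the test class this would mean adding `import static org.junit.Assert.assertEquals;` and calling `assertEquals(1, ...)` directly.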
[GitHub] [drill] oleg-zinovev commented on issue #1836: DRILL-7156: Support empty Parquet files creation
oleg-zinovev commented on issue #1836: DRILL-7156: Support empty Parquet files creation URL: https://github.com/apache/drill/pull/1836#issuecomment-521174404 @arina-ielchiieva Some time spent debugging the test showed that the last schema contains all fields. The field is added in ProjectRecordBatch#setupNewSchemaFromInput. In the original version of the test, field A was not added because of a plan optimization: the condition `1=0` was replaced by `limit 0`. I can still provide a solution that combines schemas, if required.
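The planner rewrite oleg-zinovev describes can be sketched with the test's own query; the `LIMIT 0` form below is an illustration of what the optimized plan is effectively equivalent to, not output taken from Drill:

```sql
-- Original test predicate: always false, so no rows can match.
SELECT id, a, b FROM dfs.`schemachange/multi/*.json` WHERE 1 = 0;

-- The planner may simplify the always-false filter into the equivalent of a
-- LIMIT 0, so the projected schema (id, a, b) is still set up by the Project
-- operator even though no rows are produced.
SELECT id, a, b FROM dfs.`schemachange/multi/*.json` LIMIT 0;
```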