[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916583#comment-16916583
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

asfgit commented on pull request #1836: DRILL-7156: Support empty Parquet files 
creation
URL: https://github.com/apache/drill/pull/1836
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Empty Parquet is not getting created if 0 records in result
> ---
>
> Key: DRILL-7156
> URL: https://issues.apache.org/jira/browse/DRILL-7156
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.16.0
>Reporter: Sayalee Bhanavase
>Assignee: Oleg Zinoviev
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> I am creating Parquet tables out of joins. If the join returns no records, 
> Drill does not create an empty table, and when I reuse the table my 
> subsequent script fails. 
> Has anyone faced this issue? Any suggestion or workaround?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908018#comment-16908018
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on issue #1836: DRILL-7156: Support empty Parquet 
files creation
URL: https://github.com/apache/drill/pull/1836#issuecomment-521618404
 
 
   Ran all tests on the test cluster, all passed. LGTM, +1
   @oleg-zinovev thanks for making the changes.
 



> Empty Parquet is not getting created if 0 records in result
> ---
>
> Key: DRILL-7156
> URL: https://issues.apache.org/jira/browse/DRILL-7156
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.16.0
>Reporter: Sayalee Bhanavase
>Assignee: Oleg Zinoviev
>Priority: Major
> Fix For: 1.17.0
>
>
> I am creating Parquet tables out of joins. If the join returns no records, 
> Drill does not create an empty table, and when I reuse the table my 
> subsequent script fails. 
> Has anyone faced this issue? Any suggestion or workaround?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907960#comment-16907960
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on issue #1836: DRILL-7156: Support empty Parquet files 
creation
URL: https://github.com/apache/drill/pull/1836#issuecomment-521581932
 
 
   @arina-ielchiieva, thanks for the review. Done.
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907298#comment-16907298
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on issue #1836: DRILL-7156: Support empty Parquet 
files creation
URL: https://github.com/apache/drill/pull/1836#issuecomment-521263751
 
 
   @oleg-zinovev thanks for making the changes. A couple of minor comments are 
left.
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907297#comment-16907297
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313897281
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -18,72 +18,142 @@
 package org.apache.drill.exec.physical.impl.writer;
 
 import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchemaBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
 import org.apache.drill.test.BaseTestQuery;
 import org.apache.drill.categories.ParquetTest;
 import org.apache.drill.categories.UnlikelyTest;
 import org.apache.drill.exec.ExecConstants;
-import org.junit.Assert;
 import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 import java.io.File;
+import java.nio.file.Paths;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
 
 @Category({ParquetTest.class, UnlikelyTest.class})
 public class TestParquetWriterEmptyFiles extends BaseTestQuery {
 
   @BeforeClass
   public static void initFs() throws Exception {
     updateTestCluster(3, null);
+    dirTestWatcher.copyResourceToRoot(Paths.get("schemachange"));
+    dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty"));
   }
 
-  @Test // see DRILL-2408
+  @Test
   public void testWriteEmptyFile() throws Exception {
     final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfile";
     final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
     test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
-    Assert.assertFalse(outputFile.exists());
+    assertTrue(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-    final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-    runSQL("alter session set `planner.slice_target` = 1");
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName);
+    assertFalse(outputFile.exists());
+  }
 
-    try {
-      final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id";
+  @Test
+  public void testWriteEmptySchemaChange() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-      test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+    test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-      // this query will fail if an "empty" file was created
-      testBuilder()
-        .unOrdered()
-        .sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-        .sqlBaselineQuery(query)
-        .go();
-    } finally {
-      runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT);
-    }
+    // Only the last scan scheme is written
+    SchemaBuilder schemaBuilder = new SchemaBuilder()
+      .addNullable("id", TypeProtos.MinorType.BIGINT)
+      .addNullable("a", TypeProtos.MinorType.BIGINT)
+      .addNullable("b", TypeProtos.MinorType.BIT);
+    BatchSchema expectedSchema = new BatchSchemaBuilder()
+      .withSchemaBuilder(schemaBuilder)
+      .build();
+
+    testBuilder()
+      .unOrdered()
+      .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+      .schemaBaseLine(expectedSchema)
+      .go();
+
+    // Make sure that only 1 parquet file was created
+    assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
   }
 
-  @Test // see DRILL-2408
+  @Test
+  public void testSimpleEmptyFileSchema() throws Exception {
 
 Review comment:
   We also need to add a test where we select from a non-empty Parquet file but 
the filter condition eliminates all rows, similar to what you have for JSON.
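
The file-count assertion in the diff above relies on `File.list` with a filename filter. A minimal sketch of that check using only the JDK might look like the following. The class name `ParquetFileCounter`, the temporary directory, and the file names are illustrative assumptions and are not part of the Drill test suite. Note that `File.list` returns `null` when the path is not a directory, so the sketch guards against that case.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Drill: counts Parquet files in a table directory.
class ParquetFileCounter {

  // Returns the number of entries in dir whose names end with ".parquet".
  // File.list returns null if dir is not a directory, so treat that as zero.
  static int countParquetFiles(File dir) {
    String[] names = dir.list((d, name) -> name.endsWith(".parquet"));
    return names == null ? 0 : names.length;
  }

  public static void main(String[] args) throws IOException {
    // Arbitrary file names, chosen only for the demonstration.
    Path tableDir = Files.createTempDirectory("table");
    Files.createFile(tableDir.resolve("part-0.parquet"));
    Files.createFile(tableDir.resolve("notes.txt"));
    System.out.println(countParquetFiles(tableDir.toFile()));
  }
}
```

The null guard matters when the output directory is never created: calling `.length` directly on the result of `File.list` would then throw a `NullPointerException` rather than produce a clear assertion failure.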
   
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907295#comment-16907295
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313896738
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -18,72 +18,142 @@
 package org.apache.drill.exec.physical.impl.writer;
 
 import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchemaBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
 import org.apache.drill.test.BaseTestQuery;
 import org.apache.drill.categories.ParquetTest;
 import org.apache.drill.categories.UnlikelyTest;
 import org.apache.drill.exec.ExecConstants;
-import org.junit.Assert;
 import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 import java.io.File;
+import java.nio.file.Paths;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
 
 @Category({ParquetTest.class, UnlikelyTest.class})
 public class TestParquetWriterEmptyFiles extends BaseTestQuery {
 
   @BeforeClass
   public static void initFs() throws Exception {
     updateTestCluster(3, null);
+    dirTestWatcher.copyResourceToRoot(Paths.get("schemachange"));
+    dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty"));
   }
 
-  @Test // see DRILL-2408
+  @Test
   public void testWriteEmptyFile() throws Exception {
     final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfile";
     final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
     test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
-    Assert.assertFalse(outputFile.exists());
+    assertTrue(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-    final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-    runSQL("alter session set `planner.slice_target` = 1");
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName);
+    assertFalse(outputFile.exists());
+  }
 
-    try {
-      final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id";
+  @Test
+  public void testWriteEmptySchemaChange() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-      test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+    test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-      // this query will fail if an "empty" file was created
-      testBuilder()
-        .unOrdered()
-        .sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-        .sqlBaselineQuery(query)
-        .go();
-    } finally {
-      runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT);
-    }
+    // Only the last scan scheme is written
+    SchemaBuilder schemaBuilder = new SchemaBuilder()
+      .addNullable("id", TypeProtos.MinorType.BIGINT)
+      .addNullable("a", TypeProtos.MinorType.BIGINT)
+      .addNullable("b", TypeProtos.MinorType.BIT);
+    BatchSchema expectedSchema = new BatchSchemaBuilder()
+      .withSchemaBuilder(schemaBuilder)
+      .build();
+
+    testBuilder()
+      .unOrdered()
+      .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+      .schemaBaseLine(expectedSchema)
+      .go();
+
+    // Make sure that only 1 parquet file was created
+    assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
   }
 
-  @Test // see DRILL-2408
+  @Test
+  public void testSimpleEmptyFileSchema() throws Exception {
 
 Review comment:
   This test is redundant since `testComplexEmptyFileSchema` covers both cases.
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907293#comment-16907293
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313895815
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java
 ##
 @@ -122,6 +122,9 @@
   private PrimitiveTypeName logicalTypeForDecimals;
   private boolean usePrimitiveTypesForDecimals;
 
+  /** Whether no rows was written. */
 
 Review comment:
   Is used to ensure that an empty Parquet file will be written if no rows were 
provided.
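
The flag discussed in this review comment can be illustrated with a minimal, self-contained sketch. This is not Drill's `ParquetRecordWriter`: the class `EmptyAwareWriter` and all of its methods are hypothetical names, and a list of strings stands in for files on disk. Assuming the writer learns the schema before any rows arrive, the idea is to track whether any row was written and, at cleanup, still emit a schema-only file when none was.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a record writer; this is NOT Drill's ParquetRecordWriter.
class EmptyAwareWriter {
  private final List<String> files = new ArrayList<>(); // stands in for files on disk
  private String schema;        // null until a schema has been seen
  private boolean empty = true; // stays true while no row has been written
  private int bufferedRows;

  // A schema change closes the current file (if it has rows) and starts a new one.
  void updateSchema(String newSchema) {
    flush();
    schema = newSchema;
  }

  void writeRow() {
    if (schema == null) {
      throw new IllegalStateException("schema must be set before writing rows");
    }
    empty = false;
    bufferedRows++;
  }

  // Emits buffered rows as one "file"; does nothing when the buffer is empty.
  private void flush() {
    if (bufferedRows > 0) {
      files.add(schema + ": " + bufferedRows + " rows");
      bufferedRows = 0;
    }
  }

  // On cleanup, still emit a schema-only file if no row was ever written.
  void cleanup() {
    flush();
    if (empty && schema != null) {
      files.add(schema + ": 0 rows");
    }
  }

  List<String> files() {
    return files;
  }

  public static void main(String[] args) {
    EmptyAwareWriter writer = new EmptyAwareWriter();
    writer.updateSchema("id,a");
    writer.updateSchema("id,a,b"); // schema changed before any row arrived
    writer.cleanup();
    System.out.println(writer.files());
  }
}
```

In this sketch a writer that never sees a schema writes nothing at cleanup, which mirrors the `testWriteEmptyFileWithEmptySchema` expectation in this pull request's tests. When the schema changes before any row arrives, only the last schema ends up in the single empty file.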
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907195#comment-16907195
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on pull request #1836: DRILL-7156: Support empty Parquet 
files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313831543
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -18,23 +18,33 @@
 package org.apache.drill.exec.physical.impl.writer;
 
 import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchemaBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
 import org.apache.drill.test.BaseTestQuery;
 import org.apache.drill.categories.ParquetTest;
 import org.apache.drill.categories.UnlikelyTest;
 import org.apache.drill.exec.ExecConstants;
-import org.junit.Assert;
 import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 import java.io.File;
+import java.nio.file.Paths;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
 
 @Category({ParquetTest.class, UnlikelyTest.class})
 public class TestParquetWriterEmptyFiles extends BaseTestQuery {
 
   @BeforeClass
   public static void initFs() throws Exception {
     updateTestCluster(3, null);
+    dirTestWatcher.copyResourceToRoot(Paths.get("schemachange"));
+    dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty"));
   }
 
   @Test // see DRILL-2408
 
 Review comment:
   My bad. Done
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907190#comment-16907190
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313829664
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -18,23 +18,33 @@
 package org.apache.drill.exec.physical.impl.writer;
 
 import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchemaBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
 import org.apache.drill.test.BaseTestQuery;
 import org.apache.drill.categories.ParquetTest;
 import org.apache.drill.categories.UnlikelyTest;
 import org.apache.drill.exec.ExecConstants;
-import org.junit.Assert;
 import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 import java.io.File;
+import java.nio.file.Paths;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
 
 @Category({ParquetTest.class, UnlikelyTest.class})
 public class TestParquetWriterEmptyFiles extends BaseTestQuery {
 
   @BeforeClass
   public static void initFs() throws Exception {
     updateTestCluster(3, null);
+    dirTestWatcher.copyResourceToRoot(Paths.get("schemachange"));
+    dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty"));
   }
 
   @Test // see DRILL-2408
 
 Review comment:
   Please remove `see DRILL-2408`
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907188#comment-16907188
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on issue #1836: DRILL-7156: Support empty Parquet files 
creation
URL: https://github.com/apache/drill/pull/1836#issuecomment-521208818
 
 
   @arina-ielchiieva, thanks for the review. Fixed.
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907184#comment-16907184
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on pull request #1836: DRILL-7156: Support empty Parquet 
files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313815186
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception {
     final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
     test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
+    Assert.assertTrue(outputFile.exists());
+  }
+
+  @Test
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName);
     Assert.assertFalse(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-    final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptySchemaChange() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-    runSQL("alter session set `planner.slice_target` = 1");
+    test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-    try {
-      final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id";
+    // Only the last scan scheme is written
+    SchemaBuilder schemaBuilder = new SchemaBuilder()
+      .addNullable("id", TypeProtos.MinorType.BIGINT)
+      .addNullable("a", TypeProtos.MinorType.BIGINT)
+      .addNullable("b", TypeProtos.MinorType.BIT);
+    BatchSchema expectedSchema = new BatchSchemaBuilder()
+      .withSchemaBuilder(schemaBuilder)
+      .build();
 
-      test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+    testBuilder()
+      .unOrdered()
+      .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+      .schemaBaseLine(expectedSchema)
+      .go();
 
-      // this query will fail if an "empty" file was created
-      testBuilder()
-        .unOrdered()
-        .sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-        .sqlBaselineQuery(query)
-        .go();
-    } finally {
-      runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT);
-    }
+    // Make sure that only 1 parquet file was created
+    Assert.assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
+  }
+
+  @Test
+  public void testEmptyFileSchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testemptyfileschema";
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
 
 Review comment:
   If needed, I can also rewrite tests using RowSet. 
 




[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907163#comment-16907163
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on pull request #1836: DRILL-7156: Support empty Parquet 
files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313815186
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception {
 final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
 test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 
1=0", outputFileName);
+Assert.assertTrue(outputFile.exists());
+  }
+
+  @Test
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+final String outputFileName = 
"testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
+
+test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", 
outputFileName);
 Assert.assertFalse(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-final String outputFile = 
"testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptySchemaChange() throws Exception {
+final String outputFileName = 
"testparquetwriteremptyfiles_testwriteemptyschemachange";
+final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-runSQL("alter session set `planner.slice_target` = 1");
+test("CREATE TABLE dfs.tmp.%s AS select id, a, b from 
dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-try {
-  final String query = "SELECT position_id FROM cp.`employee.json` WHERE 
position_id IN (15, 16) GROUP BY position_id";
+// Only the last scan scheme is written
+SchemaBuilder schemaBuilder = new SchemaBuilder()
+  .addNullable("id", TypeProtos.MinorType.BIGINT)
+  .addNullable("a", TypeProtos.MinorType.BIGINT)
+  .addNullable("b", TypeProtos.MinorType.BIT);
+BatchSchema expectedSchema = new BatchSchemaBuilder()
+  .withSchemaBuilder(schemaBuilder)
+  .build();
 
-  test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+testBuilder()
+  .unOrdered()
+  .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+  .schemaBaseLine(expectedSchema)
+  .go();
 
-  // this query will fail if an "empty" file was created
-  testBuilder()
-.unOrdered()
-.sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-.sqlBaselineQuery(query)
-.go();
-} finally {
-  runSQL("alter session set `planner.slice_target` = " + 
ExecConstants.SLICE_TARGET_DEFAULT);
-}
+// Make sure that only 1 parquet file was created
+Assert.assertEquals(1, outputFile.list((dir, name) -> 
name.endsWith("parquet")).length);
+  }
+
+  @Test
+  public void testEmptyFileSchema() throws Exception {
+final String outputFileName = 
"testparquetwriteremptyfiles_testemptyfileschema";
+
+test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 
1=0", outputFileName);
 
 Review comment:
   If needed, I can also rewrite tests using RowSet.
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907153#comment-16907153
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on issue #1836: DRILL-7156: Support empty Parquet 
files creation
URL: https://github.com/apache/drill/pull/1836#issuecomment-521191646
 
 
   > @arina-ielchiieva
   > Some time spent debugging the test showed that the last schema contains 
all fields. The field is added in ProjectRecordBatch#setupNewSchemaFromInput.
   > In the original version of the test, field A was not added due to plan 
optimization: the condition `1=0` was replaced by `limit 0`.
   > 
   > I can still provide a solution with combining schema if required.
   
   In this case we don't need a schema merge.
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907149#comment-16907149
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313806015
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java
 ##
 @@ -122,6 +122,8 @@
   private PrimitiveTypeName logicalTypeForDecimals;
   private boolean usePrimitiveTypesForDecimals;
 
+  private boolean empty = true;
 
 Review comment:
   Please add a comment describing the purpose of this flag.
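
A minimal sketch of what such a comment could look like. It assumes the flag records whether any record has been written, so that the final flush can still emit a schema-only file. The class `EmptyFlagSketch` and its `write`, `flush` and `filesWritten` members are hypothetical stand-ins for illustration and are not the actual `ParquetRecordWriter` code.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyFlagSketch {
  /**
   * True until the first record is written. Assumed purpose: lets the final
   * flush emit a schema-only file when the query returned no rows.
   */
  private boolean empty = true;

  // Records buffered since the last flush.
  private int pendingRecords = 0;

  // Stand-in for real output files, so the behaviour can be observed.
  private final List<String> filesWritten = new ArrayList<>();

  public void write(String record) {
    empty = false;
    pendingRecords++;
  }

  // cleanUp == true marks the final flush, as issued from cleanup().
  public void flush(boolean cleanUp) {
    if (pendingRecords > 0) {
      filesWritten.add("data file with " + pendingRecords + " record(s)");
      pendingRecords = 0;
    } else if (cleanUp && empty) {
      // Nothing was ever written: still produce a file carrying the schema.
      filesWritten.add("schema-only file");
    }
  }

  public List<String> filesWritten() {
    return filesWritten;
  }

  public static void main(String[] args) {
    EmptyFlagSketch writer = new EmptyFlagSketch();
    writer.flush(false); // partition switch with no data: no file
    writer.flush(true);  // final flush: schema-only file
    System.out.println(writer.filesWritten());
  }
}
```

In this sketch a mid-stream `flush(false)` never creates a file by itself, which is one way to avoid stray empty files when partitions change.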
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907150#comment-16907150
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313803238
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception {
 final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
 test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 
1=0", outputFileName);
+Assert.assertTrue(outputFile.exists());
+  }
+
+  @Test
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+final String outputFileName = 
"testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
+
+test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", 
outputFileName);
 Assert.assertFalse(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-final String outputFile = 
"testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptySchemaChange() throws Exception {
+final String outputFileName = 
"testparquetwriteremptyfiles_testwriteemptyschemachange";
+final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-runSQL("alter session set `planner.slice_target` = 1");
+test("CREATE TABLE dfs.tmp.%s AS select id, a, b from 
dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-try {
-  final String query = "SELECT position_id FROM cp.`employee.json` WHERE 
position_id IN (15, 16) GROUP BY position_id";
+// Only the last scan scheme is written
+SchemaBuilder schemaBuilder = new SchemaBuilder()
+  .addNullable("id", TypeProtos.MinorType.BIGINT)
+  .addNullable("a", TypeProtos.MinorType.BIGINT)
+  .addNullable("b", TypeProtos.MinorType.BIT);
+BatchSchema expectedSchema = new BatchSchemaBuilder()
+  .withSchemaBuilder(schemaBuilder)
+  .build();
 
-  test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+testBuilder()
+  .unOrdered()
+  .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+  .schemaBaseLine(expectedSchema)
+  .go();
 
-  // this query will fail if an "empty" file was created
-  testBuilder()
-.unOrdered()
-.sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-.sqlBaselineQuery(query)
-.go();
-} finally {
-  runSQL("alter session set `planner.slice_target` = " + 
ExecConstants.SLICE_TARGET_DEFAULT);
-}
+// Make sure that only 1 parquet file was created
+Assert.assertEquals(1, outputFile.list((dir, name) -> 
name.endsWith("parquet")).length);
+  }
+
+  @Test
+  public void testEmptyFileSchema() throws Exception {
+final String outputFileName = 
"testparquetwriteremptyfiles_testemptyfileschema";
+
+test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 
1=0", outputFileName);
+
+// end_date column is null, so it missing in result schema.
+SchemaBuilder schemaBuilder = new SchemaBuilder()
+.addNullable("employee_id", TypeProtos.MinorType.BIGINT)
+.addNullable("full_name", TypeProtos.MinorType.VARCHAR)
+.addNullable("first_name", TypeProtos.MinorType.VARCHAR)
+.addNullable("last_name", TypeProtos.MinorType.VARCHAR)
+.addNullable("position_id", TypeProtos.MinorType.BIGINT)
+.addNullable("position_title", TypeProtos.MinorType.VARCHAR)
+.addNullable("store_id", TypeProtos.MinorType.BIGINT)
+.addNullable("department_id", TypeProtos.MinorType.BIGINT)
+.addNullable("birth_date", TypeProtos.MinorType.VARCHAR)
+.addNullable("hire_date", TypeProtos.MinorType.VARCHAR)
+.addNullable("salary", TypeProtos.MinorType.FLOAT8)
+.addNullable("supervisor_id", TypeProtos.MinorType.BIGINT)
+.addNullable("education_level", TypeProtos.MinorType.VARCHAR)
+.addNullable("marital_status", TypeProtos.MinorType.VARCHAR)
+.addNullable("gender", TypeProtos.MinorType.VARCHAR)
+.addNullable("management_role", TypeProtos.MinorType.VARCHAR);
+BatchSchema expectedSchema = new BatchSchemaBuilder()
+.withSchemaBuilder(schemaBuilder)
+.build();
+
+testBuilder()
+.unOrdered()
+.sqlQuery("select * from dfs.tmp.%s", outputFileName)
+.schemaBaseLine(expectedSchema)
+.go();
   }
 
   @Test // see DRILL-2408
   public void 

[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907148#comment-16907148
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313803449
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception {
 final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
 test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 
1=0", outputFileName);
+Assert.assertTrue(outputFile.exists());
+  }
+
+  @Test
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+final String outputFileName = 
"testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
+
+test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", 
outputFileName);
 Assert.assertFalse(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-final String outputFile = 
"testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptySchemaChange() throws Exception {
+final String outputFileName = 
"testparquetwriteremptyfiles_testwriteemptyschemachange";
+final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-runSQL("alter session set `planner.slice_target` = 1");
+test("CREATE TABLE dfs.tmp.%s AS select id, a, b from 
dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-try {
-  final String query = "SELECT position_id FROM cp.`employee.json` WHERE 
position_id IN (15, 16) GROUP BY position_id";
+// Only the last scan scheme is written
+SchemaBuilder schemaBuilder = new SchemaBuilder()
+  .addNullable("id", TypeProtos.MinorType.BIGINT)
+  .addNullable("a", TypeProtos.MinorType.BIGINT)
+  .addNullable("b", TypeProtos.MinorType.BIT);
+BatchSchema expectedSchema = new BatchSchemaBuilder()
+  .withSchemaBuilder(schemaBuilder)
+  .build();
 
-  test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+testBuilder()
+  .unOrdered()
+  .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+  .schemaBaseLine(expectedSchema)
+  .go();
 
-  // this query will fail if an "empty" file was created
-  testBuilder()
-.unOrdered()
-.sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-.sqlBaselineQuery(query)
-.go();
-} finally {
-  runSQL("alter session set `planner.slice_target` = " + 
ExecConstants.SLICE_TARGET_DEFAULT);
-}
+// Make sure that only 1 parquet file was created
+Assert.assertEquals(1, outputFile.list((dir, name) -> 
name.endsWith("parquet")).length);
+  }
+
+  @Test
+  public void testEmptyFileSchema() throws Exception {
+final String outputFileName = 
"testparquetwriteremptyfiles_testemptyfileschema";
+
+test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 
1=0", outputFileName);
+
+// end_date column is null, so it missing in result schema.
+SchemaBuilder schemaBuilder = new SchemaBuilder()
+.addNullable("employee_id", TypeProtos.MinorType.BIGINT)
+.addNullable("full_name", TypeProtos.MinorType.VARCHAR)
+.addNullable("first_name", TypeProtos.MinorType.VARCHAR)
+.addNullable("last_name", TypeProtos.MinorType.VARCHAR)
+.addNullable("position_id", TypeProtos.MinorType.BIGINT)
+.addNullable("position_title", TypeProtos.MinorType.VARCHAR)
+.addNullable("store_id", TypeProtos.MinorType.BIGINT)
+.addNullable("department_id", TypeProtos.MinorType.BIGINT)
+.addNullable("birth_date", TypeProtos.MinorType.VARCHAR)
+.addNullable("hire_date", TypeProtos.MinorType.VARCHAR)
+.addNullable("salary", TypeProtos.MinorType.FLOAT8)
+.addNullable("supervisor_id", TypeProtos.MinorType.BIGINT)
+.addNullable("education_level", TypeProtos.MinorType.VARCHAR)
+.addNullable("marital_status", TypeProtos.MinorType.VARCHAR)
+.addNullable("gender", TypeProtos.MinorType.VARCHAR)
+.addNullable("management_role", TypeProtos.MinorType.VARCHAR);
+BatchSchema expectedSchema = new BatchSchemaBuilder()
+.withSchemaBuilder(schemaBuilder)
+.build();
+
+testBuilder()
+.unOrdered()
+.sqlQuery("select * from dfs.tmp.%s", outputFileName)
+.schemaBaseLine(expectedSchema)
+.go();
   }
 
   @Test // see DRILL-2408
 
 Review comment:
   

[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907147#comment-16907147
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313800525
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java
 ##
 @@ -486,11 +467,54 @@ public void abort() throws IOException {
 
   @Override
   public void cleanup() throws IOException {
-flush();
+flush(true);
 
 codecFactory.release();
   }
 
+  private void createParquetFileWriter() throws IOException {
+assert parquetFileWriter == null;
+
+Path path = new Path(location, prefix + "_" + index + ".parquet");
+// to ensure that our writer was the first to create output file, we 
create empty file first and fail if file exists
+Path firstCreatedPath = storageStrategy.createFileAndApply(fs, path);
+
+// since parquet reader supports partitions, it means that several output 
files may be created
+// if this writer was the one to create table folder, we store only folder 
and delete it with its content in case of abort
+// if table location was created before, we store only files created by 
this writer and delete them in case of abort
+addCleanUpLocation(fs, firstCreatedPath);
+
+// since ParquetFileWriter will overwrite empty output file (append is not 
supported)
+// we need to re-apply file permission
+if (useSingleFSBlock) {
+  // Passing blockSize creates files with this blockSize instead of 
filesystem default blockSize.
+  // Currently, this is supported only by filesystems included in
+  // BLOCK_FS_SCHEMES (ParquetFileWriter.java in parquet-mr), which 
includes HDFS.
+  // For other filesystems, it uses default blockSize configured for the 
file system.
+  parquetFileWriter = new ParquetFileWriter(conf, schema, path, 
ParquetFileWriter.Mode.OVERWRITE, blockSize, 0);
+} else {
+  parquetFileWriter = new ParquetFileWriter(conf, schema, path, 
ParquetFileWriter.Mode.OVERWRITE);
+}
+storageStrategy.applyToFile(fs, path);
+parquetFileWriter.start();
+  }
+
+  private void flushParquetFileWriter() throws IOException {
+assert parquetFileWriter != null;
 
 Review comment:
   Same here.
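
For readers following the hunk above, the "create an empty file first and fail if the file exists" step can be illustrated with plain `java.nio`. This is only a sketch on a local filesystem, not Drill's `StorageStrategy` or Hadoop's `FileSystem`; the helper `claim` and the file name `part_0.parquet` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClaimOutputFile {
  // Hypothetical helper: returns true if this caller created the file,
  // false if some other writer had already created it.
  static boolean claim(Path path) throws IOException {
    try {
      // Files.createFile checks for existence and creates in one atomic step.
      Files.createFile(path);
      return true;
    } catch (FileAlreadyExistsException e) {
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("claim_sketch");
    Path out = dir.resolve("part_0.parquet");
    System.out.println(claim(out)); // first writer wins
    System.out.println(claim(out)); // second attempt sees the existing file
  }
}
```

Once the path is claimed, the real writer can safely reopen it in overwrite mode, which is why the hunk re-applies file permissions afterwards.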
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907145#comment-16907145
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313800825
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java
 ##
 @@ -310,27 +312,26 @@ public void checkForNewPartition(int index) {
 try {
   boolean newPartition = newPartition(index);
   if (newPartition) {
-flush();
+flush(false);
 newSchema();
   }
 } catch (Exception e) {
   throw new DrillRuntimeException(e);
 }
   }
 
-  private void flush() throws IOException {
+  private void flush(final boolean cleanUp) throws IOException {
 
 Review comment:
   ```suggestion
 private void flush(boolean cleanUp) throws IOException {
   ```
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907151#comment-16907151
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313805760
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception {
 final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
 test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 
1=0", outputFileName);
+Assert.assertTrue(outputFile.exists());
+  }
+
+  @Test
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+final String outputFileName = 
"testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
+
+test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", 
outputFileName);
 Assert.assertFalse(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-final String outputFile = 
"testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptySchemaChange() throws Exception {
+final String outputFileName = 
"testparquetwriteremptyfiles_testwriteemptyschemachange";
+final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-runSQL("alter session set `planner.slice_target` = 1");
+test("CREATE TABLE dfs.tmp.%s AS select id, a, b from 
dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-try {
-  final String query = "SELECT position_id FROM cp.`employee.json` WHERE 
position_id IN (15, 16) GROUP BY position_id";
+// Only the last scan scheme is written
+SchemaBuilder schemaBuilder = new SchemaBuilder()
+  .addNullable("id", TypeProtos.MinorType.BIGINT)
+  .addNullable("a", TypeProtos.MinorType.BIGINT)
+  .addNullable("b", TypeProtos.MinorType.BIT);
+BatchSchema expectedSchema = new BatchSchemaBuilder()
+  .withSchemaBuilder(schemaBuilder)
+  .build();
 
-  test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+testBuilder()
+  .unOrdered()
+  .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+  .schemaBaseLine(expectedSchema)
+  .go();
 
-  // this query will fail if an "empty" file was created
-  testBuilder()
-.unOrdered()
-.sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-.sqlBaselineQuery(query)
-.go();
-} finally {
-  runSQL("alter session set `planner.slice_target` = " + 
ExecConstants.SLICE_TARGET_DEFAULT);
-}
+// Make sure that only 1 parquet file was created
+Assert.assertEquals(1, outputFile.list((dir, name) -> 
name.endsWith("parquet")).length);
+  }
+
+  @Test
+  public void testEmptyFileSchema() throws Exception {
+final String outputFileName = 
"testparquetwriteremptyfiles_testemptyfileschema";
+
+test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 
1=0", outputFileName);
 
 Review comment:
   Please replace the select from a JSON file with a select from Parquet; we 
need to test that the schema is created correctly from an already known schema.
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907144#comment-16907144
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313800742
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java
 ##
 @@ -486,11 +467,54 @@ public void abort() throws IOException {
 
   @Override
   public void cleanup() throws IOException {
-flush();
+flush(true);
 
 codecFactory.release();
   }
 
+  private void createParquetFileWriter() throws IOException {
+assert parquetFileWriter == null;
 
 Review comment:
   Please remove, there is no need for this check.
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907146#comment-16907146
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313804193
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception {
 final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
 test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 
1=0", outputFileName);
+Assert.assertTrue(outputFile.exists());
+  }
+
+  @Test
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+final String outputFileName = 
"testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
+
+test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", 
outputFileName);
 Assert.assertFalse(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-final String outputFile = 
"testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptySchemaChange() throws Exception {
+final String outputFileName = 
"testparquetwriteremptyfiles_testwriteemptyschemachange";
+final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-runSQL("alter session set `planner.slice_target` = 1");
+test("CREATE TABLE dfs.tmp.%s AS select id, a, b from 
dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-try {
-  final String query = "SELECT position_id FROM cp.`employee.json` WHERE 
position_id IN (15, 16) GROUP BY position_id";
+// Only the last scan scheme is written
+SchemaBuilder schemaBuilder = new SchemaBuilder()
+  .addNullable("id", TypeProtos.MinorType.BIGINT)
+  .addNullable("a", TypeProtos.MinorType.BIGINT)
+  .addNullable("b", TypeProtos.MinorType.BIT);
+BatchSchema expectedSchema = new BatchSchemaBuilder()
+  .withSchemaBuilder(schemaBuilder)
+  .build();
 
-  test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+testBuilder()
+  .unOrdered()
+  .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+  .schemaBaseLine(expectedSchema)
+  .go();
 
-  // this query will fail if an "empty" file was created
-  testBuilder()
-.unOrdered()
-.sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-.sqlBaselineQuery(query)
-.go();
-} finally {
-  runSQL("alter session set `planner.slice_target` = " + 
ExecConstants.SLICE_TARGET_DEFAULT);
-}
+// Make sure that only 1 parquet file was created
+Assert.assertEquals(1, outputFile.list((dir, name) -> 
name.endsWith("parquet")).length);
 
 Review comment:
   Static import.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Empty Parquet is not getting created if 0 records in result
> ---
>
> Key: DRILL-7156
> URL: https://issues.apache.org/jira/browse/DRILL-7156
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.16.0
>Reporter: Sayalee Bhanavase
>Assignee: Oleg Zinoviev
>Priority: Major
> Fix For: 1.17.0
>
>
> I am creating parquet tables out of joins. If there is no record in join, it 
> does not create empty. table and when I reused the table my further script 
> fails. 
> Has anyone faced this issue? Any suggestion or workaround?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907085#comment-16907085
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on issue #1836: DRILL-7156: Support empty Parquet files 
creation
URL: https://github.com/apache/drill/pull/1836#issuecomment-521174404
 
 
   @arina-ielchiieva 
   Debugging the test showed that the last schema does contain all fields. 
The field is added in ProjectRecordBatch#setupNewSchemaFromInput. 
   In the original version of the test, field A was not added because of a plan 
optimization: the condition `1=0` was replaced by `limit 0`.
   
   I can still provide a solution that combines the schemas if required.
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906235#comment-16906235
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on issue #1836: DRILL-7156: empty parquet files 
support
URL: https://github.com/apache/drill/pull/1836#issuecomment-520813625
 
 
   Good questions. You can investigate how union types are currently handled 
when there is data. Regarding who wins, maybe you can look into 
`org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch` to see how it 
creates a combined schema using precedence rules.
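   
   A minimal sketch of what "precedence wins" could mean for a field whose type 
differs between two schemas. The class name, the type names as strings, and the 
ordering in `PRECEDENCE` are all assumptions made for illustration; this is not 
Drill's actual precedence table and not the logic of `UnionAllRecordBatch`.

```java
import java.util.Arrays;
import java.util.List;

public class TypePrecedenceSketch {

  // Hypothetical ordering from lowest to highest precedence.
  // This is an illustrative assumption, NOT Drill's real precedence rules.
  private static final List<String> PRECEDENCE =
      Arrays.asList("bit", "int", "bigint", "float8", "varchar");

  /** Returns whichever of the two types ranks higher in PRECEDENCE. */
  public static String resolve(String left, String right) {
    int l = PRECEDENCE.indexOf(left);
    int r = PRECEDENCE.indexOf(right);
    if (l < 0 || r < 0) {
      throw new IllegalArgumentException("Unknown type: " + (l < 0 ? left : right));
    }
    return l >= r ? left : right;
  }

  public static void main(String[] args) {
    // Field "A" is bigint in one schema and varchar in another.
    System.out.println(resolve("bigint", "varchar"));
  }
}
```

   With a rule like this the result does not depend on the order in which the 
schemas arrive, unlike a "last one wins" approach.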
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906160#comment-16906160
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on issue #1836: DRILL-7156: empty parquet files support
URL: https://github.com/apache/drill/pull/1836#issuecomment-520819862
 
 
   > Good questions, you can investigate how union types are handled now. 
Regarding who wins, maybe you can look into 
`org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch` to see how it 
creates combined schema using precedence rules.
   
   @arina-ielchiieva, thanks for your advice. 
   I will try to make a combined schema by the end of the week.
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906140#comment-16906140
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on issue #1836: DRILL-7156: empty parquet files 
support
URL: https://github.com/apache/drill/pull/1836#issuecomment-520813625
 
 
   Good questions. You can investigate how union types are currently handled 
when there is data. Regarding who wins, maybe you can look into 
`org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch` to see how it 
creates a combined schema using precedence rules.
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906139#comment-16906139
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on issue #1836: DRILL-7156: empty parquet files 
support
URL: https://github.com/apache/drill/pull/1836#issuecomment-520813625
 
 
   Good questions. You can investigate how union types are currently handled. 
Regarding who wins, maybe you can look into 
`org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch` to see how it 
creates a combined schema using precedence rules.
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906134#comment-16906134
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on issue #1836: DRILL-7156: empty parquet files support
URL: https://github.com/apache/drill/pull/1836#issuecomment-520812579
 
 
   > Is there a way to write combined schema in this case?
   
   @arina-ielchiieva , thank you for your comment. 
   
   I can try to build a combined schema, but:
   - Which type should be written if the first BatchSchema contains field "A" 
with type "bigint" and the second contains field "A" with type "varchar"? Does 
the last one win?
   - What about union vectors?
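   
   To make the "last one wins" option concrete, here is a minimal sketch that 
merges schemas in arrival order. Schemas are modelled as plain maps from field 
name to type name; `CombinedSchemaSketch` and `mergeLastWins` are hypothetical 
names used only for this illustration and are not part of Drill's `BatchSchema` 
API. Union vectors are not covered.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinedSchemaSketch {

  /**
   * Merges schemas in arrival order. If a field name appears more than once,
   * the type from the later schema replaces the earlier one ("last one wins").
   * The field keeps the position at which it was first seen.
   */
  public static Map<String, String> mergeLastWins(List<Map<String, String>> schemas) {
    Map<String, String> combined = new LinkedHashMap<>();
    for (Map<String, String> schema : schemas) {
      combined.putAll(schema);
    }
    return combined;
  }

  public static void main(String[] args) {
    Map<String, String> first = new LinkedHashMap<>();
    first.put("id", "bigint");
    first.put("A", "bigint");

    Map<String, String> second = new LinkedHashMap<>();
    second.put("A", "varchar");
    second.put("b", "bit");

    // Field "A" ends up as varchar because the second schema arrived last.
    System.out.println(mergeLastWins(Arrays.asList(first, second)));
  }
}
```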
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906133#comment-16906133
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on issue #1836: DRILL-7156: empty parquet files support
URL: https://github.com/apache/drill/pull/1836#issuecomment-520812579
 
 
   > Please provide example...
   
   
   
   > Is there a way to write combined schema in this case?
   
   @arina-ielchiieva , thank you for your comment. 
   
   I can try to build a combined schema, but:
   - Which type should be written if the first BatchSchema contains field "A" 
with type "bigint" and the second contains field "A" with type "varchar"? Does 
the last one win?
   - What about union vectors?
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906127#comment-16906127
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on issue #1836: DRILL-7156: empty parquet files 
support
URL: https://github.com/apache/drill/pull/1836#issuecomment-520807379
 
 
   Is there a way to write combined schema in this case?
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906126#comment-16906126
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on issue #1836: DRILL-7156: empty parquet files support
URL: https://github.com/apache/drill/pull/1836#issuecomment-520807005
 
 
   > Please provide example...
   
   I added a test, TestParquetWriterEmptyFiles#testWriteEmptySchemaChange. As 
you can see, there is no "a" field in the written schema. 
   
   It would probably be correct to write the schema for every empty scan, but 
that would produce "garbage" empty parquet files whenever the scan with data 
comes at the end of the batch.
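   
   One possible way to avoid such "garbage" files is to remember only the most 
recent schema and decide at close time whether an empty file is needed. The 
sketch below illustrates that idea under stated assumptions: 
`EmptyFileWriterSketch`, its methods, and the string-based schema and file 
labels are hypothetical and do not reflect Drill's Parquet writer or the code 
in the PR.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyFileWriterSketch {

  private final List<String> writtenFiles = new ArrayList<>();
  private String lastSchema;      // most recent schema seen, from an empty or non-empty scan
  private boolean wroteRecords;   // true once any batch with records has been written

  /** Called whenever the incoming schema changes. */
  public void onNewSchema(String schema) {
    lastSchema = schema;
  }

  /** Called per batch; in this toy model every non-empty batch becomes one data file. */
  public void onBatch(int recordCount) {
    if (recordCount > 0) {
      wroteRecords = true;
      writtenFiles.add("data:" + lastSchema);
    }
  }

  /** Writes a single empty file with the last schema only if no data was ever written. */
  public List<String> close() {
    if (!wroteRecords && lastSchema != null) {
      writtenFiles.add("empty:" + lastSchema);
    }
    return writtenFiles;
  }

  public static void main(String[] args) {
    EmptyFileWriterSketch writer = new EmptyFileWriterSketch();
    writer.onNewSchema("id:bigint");
    writer.onBatch(0);                     // empty scan, nothing written yet
    writer.onNewSchema("id:bigint,a:bigint");
    writer.onBatch(3);                     // scan with data arrives last
    System.out.println(writer.close());    // only the data file is listed
  }
}
```

   Deferring the decision means empty scans that are followed by a scan with 
data leave no file behind, at the cost of keeping only the last schema.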
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906093#comment-16906093
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on issue #1836: DRILL-7156: empty parquet files support
URL: https://github.com/apache/drill/pull/1836#issuecomment-520795098
 
 
   > What behavior will be in this case? Failure? No-op?
   
   Remained unchanged (No-op)
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906091#comment-16906091
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on issue #1836: DRILL-7156: empty parquet files support
URL: https://github.com/apache/drill/pull/1836#issuecomment-520795098
 
 
   > What behavior will be in this case? Failure? No-op?
   
   Remained unchanged (No-op)
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906088#comment-16906088
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on issue #1836: DRILL-7156: empty parquet files 
support
URL: https://github.com/apache/drill/pull/1836#issuecomment-520794418
 
 
   > Ignores all schemas except last while writing empty parquet file
   
   Please provide example...
   
   > Not support empty schemas (e.g. create table .. as select * from 
empty.json, e.g. {})
   
   What behavior will be in this case? Failure? No-op?
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906089#comment-16906089
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on issue #1836: DRILL-7156: empty parquet files 
support
URL: https://github.com/apache/drill/pull/1836#issuecomment-520794418
 
 
   > Ignores all schemas except last while writing empty parquet file
   
   Please provide example...
   > Not support empty schemas (e.g. create table .. as select * from 
empty.json, e.g. {})
   
   What behavior will be in this case? Failure? No-op?
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906087#comment-16906087
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on issue #1836: DRILL-7156: empty parquet files 
support
URL: https://github.com/apache/drill/pull/1836#issuecomment-520794009
 
 
   Regarding this comment:
   
   > Questions:
   > TestParquetWriterEmptyFiles#testMultipleWriters now creates several empty 
files but does not fail, since reading empty parquet files is supported. Should 
I rewrite the comment or remove the test?
   
   I guess you can remove these tests and add new tests to 
`org.apache.drill.exec.store.parquet.TestEmptyParquet`.
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906044#comment-16906044
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on issue #1836: DRILL-7156: empty parquet files support
URL: https://github.com/apache/drill/pull/1836#issuecomment-520776279
 
 
   > @oleg-zinovev support empty parquet files reading is already merged into 
master 
([4f4e1af](https://github.com/apache/drill/commit/4f4e1af53c9abccd1996f3b6841731e68768b48e)).
 Do you plan on working on adding support for writing empty parquet files? We 
plan to include it in next Drill release (end of August / beginning of 
September). If yes, please factor out writing empty parquet and update the PR.
   
   Yes, I think I’ll do it today.
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906037#comment-16906037
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on issue #1836: DRILL-7156: empty parquet files 
support
URL: https://github.com/apache/drill/pull/1836#issuecomment-520774175
 
 
   @oleg-zinovev support for reading empty parquet files is already merged into 
master 
(https://github.com/apache/drill/commit/4f4e1af53c9abccd1996f3b6841731e68768b48e).
 Do you plan to work on adding support for writing empty parquet files? We 
plan to include it in the next Drill release (end of August / beginning of 
September). If yes, please factor out writing empty parquet files and update 
the PR.
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899989#comment-16899989
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on issue #1836: DRILL-7156: empty parquet files 
support
URL: https://github.com/apache/drill/pull/1836#issuecomment-518187400
 
 
   @oleg-zinovev thanks, I have assigned 
https://issues.apache.org/jira/browse/DRILL-7156 to you.
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899985#comment-16899985
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on issue #1836: DRILL-7156: empty parquet files support
URL: https://github.com/apache/drill/pull/1836#issuecomment-518186190
 
 
   Ok, I'll try to rewrite the commit within a week.
 



> Empty Parquet is not getting created if 0 records in result
> ---
>
> Key: DRILL-7156
> URL: https://issues.apache.org/jira/browse/DRILL-7156
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: Sayalee Bhanavase
>Priority: Blocker
>
> I am creating parquet tables out of joins. If there is no record in join, it 
> does not create empty. table and when I reused the table my further script 
> fails. 
> Has anyone faced this issue? Any suggestion or workaround?





[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899984#comment-16899984
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on issue #1836: DRILL-7156: empty parquet files 
support
URL: https://github.com/apache/drill/pull/1836#issuecomment-518183790
 
 
   @oleg-zinovev thanks for making the changes, though the situation is a little 
bit awkward: I was working on similar changes and did not know you intended 
to do them as well (https://issues.apache.org/jira/browse/DRILL-4517). However, 
I was working on reading empty parquet files, not on writing them. I suggest 
you separate out writing empty parquet files into a separate PR. For reading, 
it might be better to use my changes instead. First, you change the metadata 
cache files, which would affect backward compatibility and would also store 
more information than needed. Second, your changes do not seem to optimize 
reading complex types. What do you think?
   
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899949#comment-16899949
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on issue #1836: DRILL-7156: empty parquet files support
URL: https://github.com/apache/drill/pull/1836#issuecomment-518161785
 
 
   @arina-ielchiieva, could you please review?
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899942#comment-16899942
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on pull request #1836: DRILL-7156: empty parquet files 
support
URL: https://github.com/apache/drill/pull/1836
 
 
   PR for Drill empty Parquet file read and write support.
   
   Known limitations:
   1) Does not work for Hive Parquet for now
   2) Ignores all schemas except the last while writing an empty Parquet file
   3) Does not support empty schemas (e.g. create table `..` as select * from 
`empty.json, e.g. {}`)
   
   Short changes description:
   1) Parquet footer metadata added
   2) Parquet writer checks that at least 1 row has been written. If not, it 
creates an empty Parquet file with a footer.
   3) EmptyParquetRowGroupScan and EmptyParquetScanBatchCreator added
   
   Questions:
   1) TestParquetWriterEmptyFiles#testMultipleWriters now creates several empty 
files but does not fail, since reading of empty Parquet files is supported. 
Should I rewrite the comment or remove the test?
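
The writer change in item 2 of the short changes description can be pictured 
with a minimal sketch. This is not the code from the PR: the class names, the 
in-memory "file", and the writeEmptyFiles switch are all hypothetical and only 
illustrate the idea that a writer which received zero rows still emits a file 
whose footer carries the schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these names come from the Drill code base.
final class EmptyFileWriterSketch {

    /** Stand-in for a file on disk: the schema kept in the footer plus the rows. */
    static final class WrittenFile {
        final List<String> footerSchema;
        final List<String> rows;

        WrittenFile(List<String> footerSchema, List<String> rows) {
            this.footerSchema = Collections.unmodifiableList(new ArrayList<>(footerSchema));
            this.rows = Collections.unmodifiableList(new ArrayList<>(rows));
        }
    }

    /** Buffers rows and decides on close() whether a file is produced. */
    static final class SketchWriter {
        private final List<String> schema;
        private final boolean writeEmptyFiles; // assumed switch, for comparison only
        private final List<String> bufferedRows = new ArrayList<>();

        SketchWriter(List<String> schema, boolean writeEmptyFiles) {
            this.schema = schema;
            this.writeEmptyFiles = writeEmptyFiles;
        }

        void write(String row) {
            bufferedRows.add(row);
        }

        /** Returns the file that would be written, or null when none is created. */
        WrittenFile close() {
            if (bufferedRows.isEmpty() && !writeEmptyFiles) {
                // Behaviour reported in the issue: zero rows, no file at all.
                return null;
            }
            // With at least one row, or with empty files enabled, the schema
            // still ends up in the footer, so a later read can resolve columns.
            return new WrittenFile(schema, bufferedRows);
        }
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name");

        WrittenFile before = new SketchWriter(schema, false).close();
        WrittenFile after = new SketchWriter(schema, true).close();

        System.out.println("file created without the check: " + (before != null));
        System.out.println("file created with the check: " + (after != null));
        System.out.println("columns in empty file footer: " + after.footerSchema);
    }
}
```

In this sketch, a script that reuses the table sees a schema-only file instead 
of a missing one, which is the failure described in the original report.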
   
 



[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-06-05 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856605#comment-16856605
 ] 

Arina Ielchiieva commented on DRILL-7156:
-

[~le.louch] there are similar issues in Drill:
https://issues.apache.org/jira/browse/DRILL-4517
https://issues.apache.org/jira/browse/DRILL-6885

If you can contribute the patch, it would be great.





[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-05-15 Thread Oleg Zinoviev (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840295#comment-16840295
 ] 

Oleg Zinoviev commented on DRILL-7156:
--

I had to deal with this problem. As a result, I made my own Apache Drill build, 
which supports the creation and reading of empty parquet files. I can try to 
prepare a patch for the main repository, if this is really a Drill problem.
