date:20190814

[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907603#comment-16907603
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

kkhatua commented on issue #1779: DRILL-7222: Visualize estimated and actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#issuecomment-521408383
 
 
   @agozhiy I've done the changes requested. Please review.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Visualize estimated and actual row counts for a query
> -
>
> Key: DRILL-7222
> URL: https://issues.apache.org/jira/browse/DRILL-7222
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: doc-impacting, user-experience
> Fix For: 1.17.0
>
>
> With statistics in place, it would be useful to have the *estimated* rowcount
> alongside the *actual* rowcount in the query profile's operator overview.
> We can extract this from the Physical Plan section of the profile.
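A minimal sketch of the extraction step. It assumes that the Physical Plan text lists one operator per line, that each line starts with a `MM-OO` operator ID (for example `00-01`), and that each line carries a `rowcount = N` attribute. The regex, the class name `PlanRowCounts`, the sample plan lines and the hard-coded "actual" counts are illustrative assumptions, not Drill's profile code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: maps operator IDs in a text plan to the planner's estimated rowcount. */
public class PlanRowCounts {

  // Assumed line shape: "<major>-<operator>  <Operator>(...) : rowType = ...: rowcount = <number>, ..."
  private static final Pattern PLAN_LINE =
      Pattern.compile("^\\s*(\\d+-\\d+)\\s.*?rowcount = ([0-9.Ee+-]+)");

  static Map<String, Double> estimatedRowCounts(String planText) {
    Map<String, Double> estimates = new LinkedHashMap<>(); // keeps plan order
    for (String line : planText.split("\\R")) {
      Matcher m = PLAN_LINE.matcher(line);
      if (m.find()) {
        estimates.put(m.group(1), Double.parseDouble(m.group(2)));
      }
    }
    return estimates;
  }

  public static void main(String[] args) {
    String plan = String.join("\n",
        "00-00    Screen : rowType = RecordType(ANY a): rowcount = 10.0, cumulative cost = {10.0 rows}, id = 5",
        "00-01      Project(a=[$0]) : rowType = RecordType(ANY a): rowcount = 10.0, cumulative cost = {20.0 rows}, id = 4");
    // Actual counts would come from the profile's operator metrics; hard-coded here for illustration.
    Map<String, Long> actual = new LinkedHashMap<>();
    actual.put("00-00", 7L);
    actual.put("00-01", 7L);
    estimatedRowCounts(plan).forEach((op, est) ->
        System.out.printf("%s estimated=%.0f actual=%s%n", op, est, actual.get(op)));
  }
}
```

The sketch keys on the operator ID because the operator overview also identifies operators by major fragment and operator number. This makes joining estimates with actual counts a simple map lookup.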



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Created] (DRILL-7348) Aggregate on Subquery with Select Distinct or UNION fails to Group By

2019-08-14 Thread Keith G Yu (JIRA)

Keith G Yu created DRILL-7348:
-

 Summary: Aggregate on Subquery with Select Distinct or UNION fails to Group By
 Key: DRILL-7348
 URL: https://issues.apache.org/jira/browse/DRILL-7348
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.15.0
Reporter: Keith G Yu


The following query fails to group properly.
{code:java}
SELECT date, COUNT(1)
FROM (
SELECT DISTINCT
id,
date,
status
FROM table(dfs.`path`(type => 'text', fieldDelimiter => ',', extractHeader => TRUE))
)
GROUP BY 1
{code}
This also fails to group properly.
{code:java}
SELECT date, COUNT(1)
FROM (
SELECT
id,
date,
status
FROM table(dfs.`path1`(type => 'text', fieldDelimiter => ',', extractHeader => TRUE))
UNION
SELECT
id,
date,
status
FROM table(dfs.`path2`(type => 'text', fieldDelimiter => ',', extractHeader => TRUE))
)
GROUP BY 1
{code}
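A minimal in-memory reference pins down what "group properly" should mean. It computes the result both queries are expected to return: first deduplicate whole `(id, date, status)` rows (for `UNION`, across both inputs), then count the remaining rows per `date`. The class name `GroupAfterDistinct` and the sample rows are hypothetical. This is one way to build a baseline for comparing Drill's output; it does not describe Drill's planner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical reference for SELECT date, COUNT(1) FROM (SELECT DISTINCT id, date, status ...) GROUP BY 1. */
public class GroupAfterDistinct {

  // Each row is [id, date, status]; List has value-based equals/hashCode, so distinct() removes whole-row duplicates.
  static Map<String, Long> countByDateAfterDistinct(List<List<String>> rows) {
    return rows.stream()
        .distinct()
        .collect(Collectors.groupingBy(r -> r.get(1), TreeMap::new, Collectors.counting()));
  }

  // UNION (unlike UNION ALL) removes duplicates across both inputs before the outer GROUP BY.
  static Map<String, Long> countByDateAfterUnion(List<List<String>> left, List<List<String>> right) {
    List<List<String>> all = new ArrayList<>(left);
    all.addAll(right);
    return countByDateAfterDistinct(all);
  }

  public static void main(String[] args) {
    List<List<String>> rows = Arrays.asList(
        Arrays.asList("1", "2019-08-01", "A"),
        Arrays.asList("1", "2019-08-01", "A"), // exact duplicate, removed by DISTINCT
        Arrays.asList("2", "2019-08-01", "B"),
        Arrays.asList("3", "2019-08-02", "A"));
    System.out.println(countByDateAfterDistinct(rows)); // {2019-08-01=2, 2019-08-02=1}
  }
}
```

A correctly grouped Drill result should contain exactly one row per distinct `date`. Repeated `date` values in the output are the symptom described above.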




[jira] [Created] (DRILL-7347) Upgrade Apache Iceberg to released version

2019-08-14 Thread Arina Ielchiieva (JIRA)

Arina Ielchiieva created DRILL-7347:
---

 Summary: Upgrade Apache Iceberg to released version
 Key: DRILL-7347
 URL: https://issues.apache.org/jira/browse/DRILL-7347
 Project: Apache Drill
  Issue Type: Task
Reporter: Arina Ielchiieva


Currently Drill uses an Apache Iceberg build from a specific commit, pulled via JitPack, because there is
no officially released version yet. Once the first Iceberg version is released, we need to switch to the
officially released version instead of the commit build.




[jira] [Closed] (DRILL-7214) Error While Starting Drill in distributed mode.

2019-08-14 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva closed DRILL-7214.
---
Resolution: Invalid

> Error While Starting Drill in distributed mode.
> ---
>
> Key: DRILL-7214
> URL: https://issues.apache.org/jira/browse/DRILL-7214
> Project: Apache Drill
>  Issue Type: Task
>  Components: Client - Java
>Affects Versions: 1.15.0
> Environment: centos7
>Reporter: Abhay Kumar Singh
>Priority: Blocker
>  Labels: beginner
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Exception in thread "main" 
> org.apache.drill.exec.exception.DrillbitStartupException: JDK Java compiler 
> not available. Ensure Drill is running with the java executable from a JDK 
> and not a JRE
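The exception indicates that the JVM running the Drillbit has no system Java compiler. You can check a given `java` executable with the standard `javax.tools.ToolProvider.getSystemJavaCompiler()` API, which returns `null` when the running platform provides no compiler (as with a bare JRE). The sketch below does this. The class name `JdkCheck` is hypothetical, and this is not Drill's own startup check.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Hypothetical diagnostic, not Drill's startup code: does this JVM ship a Java compiler? */
public class JdkCheck {

  static boolean jdkCompilerAvailable() {
    // Returns null when the running Java platform provides no system compiler (e.g. a bare JRE).
    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
    return compiler != null;
  }

  static String diagnose() {
    String javaHome = System.getProperty("java.home");
    return jdkCompilerAvailable()
        ? "JDK compiler found; java.home=" + javaHome
        : "No system Java compiler; java.home=" + javaHome + " looks like a JRE, point Drill at a JDK";
  }

  public static void main(String[] args) {
    System.out.println(diagnose());
  }
}
```

Run it with the same `java` executable that Drill's launch scripts pick up (typically the one under `JAVA_HOME`). If it reports no compiler, that installation is a JRE.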




[jira] [Commented] (DRILL-7214) Error While Starting Drill in distributed mode.

2019-08-14 Thread Arina Ielchiieva (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907383#comment-16907383
 ] 

Arina Ielchiieva commented on DRILL-7214:
-

Closing Jira as invalid. Please re-open if issue persists.


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907298#comment-16907298
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on issue #1836: DRILL-7156: Support empty Parquet files creation
URL: https://github.com/apache/drill/pull/1836#issuecomment-521263751
 
 
   @oleg-zinovev thanks for making the changes; a couple of minor comments are left...
 



> Empty Parquet is not getting created if 0 records in result
> ---
>
> Key: DRILL-7156
> URL: https://issues.apache.org/jira/browse/DRILL-7156
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.16.0
>Reporter: Sayalee Bhanavase
>Assignee: Oleg Zinoviev
>Priority: Major
> Fix For: 1.17.0
>
>
> I am creating Parquet tables out of joins. If a join returns no records, Drill does
> not create an empty table, and when I reuse the table, my subsequent script
> fails.
> Has anyone faced this issue? Any suggestion or workaround?




[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907297#comment-16907297
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313897281
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -18,72 +18,142 @@
 package org.apache.drill.exec.physical.impl.writer;
 
 import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchemaBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
 import org.apache.drill.test.BaseTestQuery;
 import org.apache.drill.categories.ParquetTest;
 import org.apache.drill.categories.UnlikelyTest;
 import org.apache.drill.exec.ExecConstants;
-import org.junit.Assert;
 import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 import java.io.File;
+import java.nio.file.Paths;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
 
 @Category({ParquetTest.class, UnlikelyTest.class})
 public class TestParquetWriterEmptyFiles extends BaseTestQuery {
 
   @BeforeClass
   public static void initFs() throws Exception {
 updateTestCluster(3, null);
+dirTestWatcher.copyResourceToRoot(Paths.get("schemachange"));
+dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty"));
   }
 
-  @Test // see DRILL-2408
+  @Test
   public void testWriteEmptyFile() throws Exception {
 final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfile";
 final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
 test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
-Assert.assertFalse(outputFile.exists());
+assertTrue(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-runSQL("alter session set `planner.slice_target` = 1");
+test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName);
+assertFalse(outputFile.exists());
+  }
 
-try {
-  final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id";
+  @Test
+  public void testWriteEmptySchemaChange() throws Exception {
+final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange";
+final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-  test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-  // this query will fail if an "empty" file was created
-  testBuilder()
-.unOrdered()
-.sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-.sqlBaselineQuery(query)
-.go();
-} finally {
-  runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT);
-}
+// Only the last scan scheme is written
+SchemaBuilder schemaBuilder = new SchemaBuilder()
+  .addNullable("id", TypeProtos.MinorType.BIGINT)
+  .addNullable("a", TypeProtos.MinorType.BIGINT)
+  .addNullable("b", TypeProtos.MinorType.BIT);
+BatchSchema expectedSchema = new BatchSchemaBuilder()
+  .withSchemaBuilder(schemaBuilder)
+  .build();
+
+testBuilder()
+  .unOrdered()
+  .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+  .schemaBaseLine(expectedSchema)
+  .go();
+
+// Make sure that only 1 parquet file was created
+assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
   }
 
-  @Test // see DRILL-2408
+  @Test
+  public void testSimpleEmptyFileSchema() throws Exception {
 
 Review comment:
   We also need to add a test that selects from a non-empty Parquet file where the filter condition eliminates all rows, similar to the one you have for JSON.
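The tests in this diff check the writer output by listing Parquet files in the table directory. Below is a standalone sketch of that check using only `java.io.File`. Note that `File.list` returns `null` rather than an empty array when the directory does not exist, which is the expected outcome in `testWriteEmptyFileWithEmptySchema`. The sketch maps that case to zero instead of throwing a `NullPointerException`. The class and method names are hypothetical.

```java
import java.io.File;

/** Hypothetical helper: counts Parquet files written into a CTAS output directory. */
public class ParquetFileCount {

  /** Counts entries in {@code dir} whose names end in ".parquet"; 0 if {@code dir} is missing or not a directory. */
  static int countParquetFiles(File dir) {
    // File.list returns null (not an empty array) when dir is missing or not a directory.
    String[] names = dir.list((parent, name) -> name.endsWith(".parquet"));
    return names == null ? 0 : names.length;
  }

  public static void main(String[] args) {
    File dir = new File(args.length > 0 ? args[0] : ".");
    System.out.println(countParquetFiles(dir) + " parquet file(s) in " + dir.getAbsolutePath());
  }
}
```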
   
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907295#comment-16907295
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313896738
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -18,72 +18,142 @@
 package org.apache.drill.exec.physical.impl.writer;
 
 import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchemaBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
 import org.apache.drill.test.BaseTestQuery;
 import org.apache.drill.categories.ParquetTest;
 import org.apache.drill.categories.UnlikelyTest;
 import org.apache.drill.exec.ExecConstants;
-import org.junit.Assert;
 import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 import java.io.File;
+import java.nio.file.Paths;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
 
 @Category({ParquetTest.class, UnlikelyTest.class})
 public class TestParquetWriterEmptyFiles extends BaseTestQuery {
 
   @BeforeClass
   public static void initFs() throws Exception {
 updateTestCluster(3, null);
+dirTestWatcher.copyResourceToRoot(Paths.get("schemachange"));
+dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty"));
   }
 
-  @Test // see DRILL-2408
+  @Test
   public void testWriteEmptyFile() throws Exception {
 final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfile";
 final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
 test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
-Assert.assertFalse(outputFile.exists());
+assertTrue(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-runSQL("alter session set `planner.slice_target` = 1");
+test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName);
+assertFalse(outputFile.exists());
+  }
 
-try {
-  final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id";
+  @Test
+  public void testWriteEmptySchemaChange() throws Exception {
+final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange";
+final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-  test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-  // this query will fail if an "empty" file was created
-  testBuilder()
-.unOrdered()
-.sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-.sqlBaselineQuery(query)
-.go();
-} finally {
-  runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT);
-}
+// Only the last scan scheme is written
+SchemaBuilder schemaBuilder = new SchemaBuilder()
+  .addNullable("id", TypeProtos.MinorType.BIGINT)
+  .addNullable("a", TypeProtos.MinorType.BIGINT)
+  .addNullable("b", TypeProtos.MinorType.BIT);
+BatchSchema expectedSchema = new BatchSchemaBuilder()
+  .withSchemaBuilder(schemaBuilder)
+  .build();
+
+testBuilder()
+  .unOrdered()
+  .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+  .schemaBaseLine(expectedSchema)
+  .go();
+
+// Make sure that only 1 parquet file was created
+assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
   }
 
-  @Test // see DRILL-2408
+  @Test
+  public void testSimpleEmptyFileSchema() throws Exception {
 
 Review comment:
   This test is redundant since `testComplexEmptyFileSchema` covers both cases.
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907293#comment-16907293
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313895815
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java
 ##
 @@ -122,6 +122,9 @@
   private PrimitiveTypeName logicalTypeForDecimals;
   private boolean usePrimitiveTypesForDecimals;
 
+  /** Whether no rows was written. */
 
 Review comment:
   Is used to ensure that an empty Parquet file will be written if no rows were provided.
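The flag described in this review comment can be sketched in isolation. The writer tracks whether any row reached it, and on close it emits a zero-row file carrying the schema if none did. Following the expectations in `TestParquetWriterEmptyFiles`, the sketch skips the file when the schema itself has no columns. `FileSink`, `EmptyAwareWriter` and the list-of-column-names schema are hypothetical stand-ins, not Drill's `ParquetRecordWriter` API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical writer illustrating the "no rows written" flag; not Drill's ParquetRecordWriter. */
public class EmptyAwareWriter {

  /** Stand-in for the real file output: receives the schema and the rows of one file. */
  public interface FileSink {
    void writeFile(List<String> columns, List<Object[]> rows);
  }

  private final List<String> columns;
  private final FileSink sink;
  private final List<Object[]> pending = new ArrayList<>();
  /** Whether no rows have been written; used to emit an empty file on close. */
  private boolean noRowsWritten = true;

  public EmptyAwareWriter(List<String> columns, FileSink sink) {
    this.columns = columns;
    this.sink = sink;
  }

  public void write(Object[] row) {
    pending.add(row);
    noRowsWritten = false;
  }

  public void flush() {
    if (!pending.isEmpty()) {
      sink.writeFile(columns, new ArrayList<>(pending));
      pending.clear();
    }
  }

  public void close() {
    flush();
    // Zero rows but a known, non-empty schema: still produce a schema-only file.
    if (noRowsWritten && !columns.isEmpty()) {
      sink.writeFile(columns, new ArrayList<>());
    }
  }
}
```

The flag is checked only in `close()`, so a result that does contain rows never gets an extra empty file. This mirrors the "only 1 parquet file was created" assertion in the tests.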
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907195#comment-16907195
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on pull request #1836: DRILL-7156: Support empty Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313831543
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -18,23 +18,33 @@
 package org.apache.drill.exec.physical.impl.writer;
 
 import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchemaBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
 import org.apache.drill.test.BaseTestQuery;
 import org.apache.drill.categories.ParquetTest;
 import org.apache.drill.categories.UnlikelyTest;
 import org.apache.drill.exec.ExecConstants;
-import org.junit.Assert;
 import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 import java.io.File;
+import java.nio.file.Paths;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
 
 @Category({ParquetTest.class, UnlikelyTest.class})
 public class TestParquetWriterEmptyFiles extends BaseTestQuery {
 
   @BeforeClass
   public static void initFs() throws Exception {
 updateTestCluster(3, null);
+dirTestWatcher.copyResourceToRoot(Paths.get("schemachange"));
+dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty"));
   }
 
   @Test // see DRILL-2408
 
 Review comment:
   My bad. Done
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907190#comment-16907190
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313829664
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -18,23 +18,33 @@
 package org.apache.drill.exec.physical.impl.writer;
 
 import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchemaBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
 import org.apache.drill.test.BaseTestQuery;
 import org.apache.drill.categories.ParquetTest;
 import org.apache.drill.categories.UnlikelyTest;
 import org.apache.drill.exec.ExecConstants;
-import org.junit.Assert;
 import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 import java.io.File;
+import java.nio.file.Paths;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
 
 @Category({ParquetTest.class, UnlikelyTest.class})
 public class TestParquetWriterEmptyFiles extends BaseTestQuery {
 
   @BeforeClass
   public static void initFs() throws Exception {
 updateTestCluster(3, null);
+dirTestWatcher.copyResourceToRoot(Paths.get("schemachange"));
+dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty"));
   }
 
   @Test // see DRILL-2408
 
 Review comment:
   Please remove `see DRILL-2408`
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907188#comment-16907188
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on issue #1836: DRILL-7156: Support empty Parquet files creation
URL: https://github.com/apache/drill/pull/1836#issuecomment-521208818
 
 
   @arina-ielchiieva, thanks for the review. Fixed.
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907184#comment-16907184
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on pull request #1836: DRILL-7156: Support empty Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313815186
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception {
 final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
 test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
+Assert.assertTrue(outputFile.exists());
+  }
+
+  @Test
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
+
+test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName);
 Assert.assertFalse(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptySchemaChange() throws Exception {
+final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange";
+final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-runSQL("alter session set `planner.slice_target` = 1");
+test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-try {
-  final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id";
+// Only the last scan scheme is written
+SchemaBuilder schemaBuilder = new SchemaBuilder()
+  .addNullable("id", TypeProtos.MinorType.BIGINT)
+  .addNullable("a", TypeProtos.MinorType.BIGINT)
+  .addNullable("b", TypeProtos.MinorType.BIT);
+BatchSchema expectedSchema = new BatchSchemaBuilder()
+  .withSchemaBuilder(schemaBuilder)
+  .build();
 
-  test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+testBuilder()
+  .unOrdered()
+  .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+  .schemaBaseLine(expectedSchema)
+  .go();
 
-  // this query will fail if an "empty" file was created
-  testBuilder()
-.unOrdered()
-.sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-.sqlBaselineQuery(query)
-.go();
-} finally {
-  runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT);
-}
+// Make sure that only 1 parquet file was created
+Assert.assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
+  }
+
+  @Test
+  public void testEmptyFileSchema() throws Exception {
+final String outputFileName = "testparquetwriteremptyfiles_testemptyfileschema";
+
+test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
 
 Review comment:
   If needed, I can also rewrite tests using RowSet. 
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907164#comment-16907164
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on pull request #1836: DRILL-7156: Support empty Parquet 
files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313815186
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception {
 final File outputFile = 
FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
     test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
+    Assert.assertTrue(outputFile.exists());
+  }
+
+  @Test
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName);
     Assert.assertFalse(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-    final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptySchemaChange() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-    runSQL("alter session set `planner.slice_target` = 1");
+    test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-    try {
-      final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id";
+    // Only the last scan schema is written
+    SchemaBuilder schemaBuilder = new SchemaBuilder()
+      .addNullable("id", TypeProtos.MinorType.BIGINT)
+      .addNullable("a", TypeProtos.MinorType.BIGINT)
+      .addNullable("b", TypeProtos.MinorType.BIT);
+    BatchSchema expectedSchema = new BatchSchemaBuilder()
+      .withSchemaBuilder(schemaBuilder)
+      .build();
 
-      test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+    testBuilder()
+      .unOrdered()
+      .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+      .schemaBaseLine(expectedSchema)
+      .go();
 
-      // this query will fail if an "empty" file was created
-      testBuilder()
-        .unOrdered()
-        .sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-        .sqlBaselineQuery(query)
-        .go();
-    } finally {
-      runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT);
-    }
+    // Make sure that only 1 parquet file was created
+    Assert.assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
+  }
+
+  @Test
+  public void testEmptyFileSchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testemptyfileschema";
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
 
 Review comment:
   If needed, I can also rewrite tests using RowSet. 
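   The file-count assertion in the quoted `testWriteEmptySchemaChange` reads `length` directly from the result of `File.list(FilenameFilter)`. That method returns `null` when the path is not a directory (or on an I/O error), so if the CTAS produced no output directory the test would stop with a `NullPointerException` rather than a readable assertion failure. A minimal null-safe helper, as a sketch only; `FileCounts` and `countWithSuffix` are hypothetical names and not part of Drill's test framework:

```java
import java.io.File;

/** Hypothetical test helper; not part of Drill's test utilities. */
final class FileCounts {
  private FileCounts() {}

  /**
   * Counts the entries in dir whose names end with suffix.
   * File.list(FilenameFilter) returns null when dir is not a directory or an
   * I/O error occurs; that case is reported as zero matches.
   */
  static int countWithSuffix(File dir, String suffix) {
    String[] names = dir.list((parent, name) -> name.endsWith(suffix));
    return names == null ? 0 : names.length;
  }
}
```

   With such a helper, `Assert.assertEquals(1, FileCounts.countWithSuffix(outputFile, "parquet"))` fails with an ordinary "expected 1, got 0" style message when the directory is missing.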
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Empty Parquet is not getting created if 0 records in result
> ---
>
> Key: DRILL-7156
> URL: https://issues.apache.org/jira/browse/DRILL-7156
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.16.0
>Reporter: Sayalee Bhanavase
>Assignee: Oleg Zinoviev
>Priority: Major
> Fix For: 1.17.0
>
>
> I am creating Parquet tables from joins. If a join returns no records, no 
> empty table is created, and when I later reuse the table, my subsequent 
> script fails. 
> Has anyone faced this issue? Any suggestion or workaround?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907163#comment-16907163
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on pull request #1836: DRILL-7156: Support empty Parquet 
files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313815186
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception {
     final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
     test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
+    Assert.assertTrue(outputFile.exists());
+  }
+
+  @Test
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName);
     Assert.assertFalse(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-    final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptySchemaChange() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-    runSQL("alter session set `planner.slice_target` = 1");
+    test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-    try {
-      final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id";
+    // Only the last scan schema is written
+    SchemaBuilder schemaBuilder = new SchemaBuilder()
+      .addNullable("id", TypeProtos.MinorType.BIGINT)
+      .addNullable("a", TypeProtos.MinorType.BIGINT)
+      .addNullable("b", TypeProtos.MinorType.BIT);
+    BatchSchema expectedSchema = new BatchSchemaBuilder()
+      .withSchemaBuilder(schemaBuilder)
+      .build();
 
-      test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+    testBuilder()
+      .unOrdered()
+      .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+      .schemaBaseLine(expectedSchema)
+      .go();
 
-      // this query will fail if an "empty" file was created
-      testBuilder()
-        .unOrdered()
-        .sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-        .sqlBaselineQuery(query)
-        .go();
-    } finally {
-      runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT);
-    }
+    // Make sure that only 1 parquet file was created
+    Assert.assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
+  }
+
+  @Test
+  public void testEmptyFileSchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testemptyfileschema";
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
 
 Review comment:
   If needed, I can also rewrite tests using RowSet. 
 


[jira] [Assigned] (DRILL-7326) "Unsupported Operation Exception" appears on attempting to create table in Drill from json, with double nested array

2019-08-14 Thread Igor Guzenko (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7326:
---

Assignee: Igor Guzenko

> "Unsupported Operation Exception" appears on attempting to create table in 
> Drill from json, with double nested array
> 
>
> Key: DRILL-7326
> URL: https://issues.apache.org/jira/browse/DRILL-7326
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Pavel Semenov
>Assignee: Igor Guzenko
>Priority: Major
>
> *STEPS TO REPRODUCE*
>  # Create a JSON file that has a doubly nested array as a value, e.g.
> {code:java}"stringx2":[["asd","фывфы","asg"],["as","acz","gte"],["as","tdf","dsd"]]
>  {code}
>  # Use CTAS to create a table in Drill from the created JSON file
>  # Observe the result
> *EXPECTED RESULT*
>  Table is created
> *ACTUAL RESULT*
>  UnsupportedOperationException appears on attempting to create the table
> *ADDITIONAL INFO*
>  It is possible to create a table with a *single* nested array
>  Error log
> {code:java}
> Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: c48c6154-30a1-49c8-ac3b-7c2f898a7f4e on node1.cluster.com:31010]
> (java.lang.UnsupportedOperationException) Unsupported type LIST
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():160
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.record.AbstractRecordBatch.next():126
>  org.apache.drill.exec.record.AbstractRecordBatch.next():116
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>  org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1669
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748 (state=,code=0)
> {code}
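The reproduction steps above can be scripted. A minimal sketch, assuming the reported value is wrapped in a one-field JSON object and the file is queried through the `dfs` storage plugin with a writable `dfs.tmp` workspace; the file name `double_nested.json`, the table name `double_nested_ctas`, and the class `NestedArrayRepro` are hypothetical. It only prepares the input and the CTAS text; running that statement in Drill is the step that, per this report, fails on 1.16.0.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that prepares the DRILL-7326 reproduction input. */
final class NestedArrayRepro {
  private NestedArrayRepro() {}

  // The reported value wrapped in an object. The Cyrillic string is written as
  // Unicode escapes so this source compiles under any default encoding.
  static final String RECORD =
      "{\"stringx2\":[[\"asd\",\"\u0444\u044B\u0432\u0444\u044B\",\"asg\"],"
          + "[\"as\",\"acz\",\"gte\"],[\"as\",\"tdf\",\"dsd\"]]}";

  /** Writes the record as UTF-8 and returns the path of the JSON file. */
  static Path writeInput(Path dir) throws IOException {
    Path file = dir.resolve("double_nested.json");
    Files.write(file, RECORD.getBytes(StandardCharsets.UTF_8));
    return file;
  }

  /** CTAS over the written file; workspace and table names are assumptions. */
  static String ctas(Path file) {
    return "CREATE TABLE dfs.tmp.`double_nested_ctas` AS SELECT * FROM dfs.`"
        + file.toAbsolutePath() + "`";
  }
}
```

Writing the file explicitly as UTF-8 keeps the non-ASCII element intact, so a failure can be attributed to the nested LIST type rather than to an encoding problem.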




[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907153#comment-16907153
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on issue #1836: DRILL-7156: Support empty Parquet 
files creation
URL: https://github.com/apache/drill/pull/1836#issuecomment-521191646
 
 
   > @arina-ielchiieva
   > Some time spent debugging the test showed that the last schema contains all fields. The field is added in ProjectRecordBatch#setupNewSchemaFromInput.
   > In the original version of the test, field A was not added due to plan optimization: the condition `1=0` was replaced by `limit 0`.
   > 
   > I can still provide a solution that combines the schemas if required.
   
   In this case we don't need a schema merge.
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907149#comment-16907149
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313806015
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java
 ##
 @@ -122,6 +122,8 @@
   private PrimitiveTypeName logicalTypeForDecimals;
   private boolean usePrimitiveTypesForDecimals;
 
+  private boolean empty = true;
 
 Review comment:
   Please add a comment describing the purpose of this flag.
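   The flag appears to record whether this writer has produced any output yet, so that cleanup can decide whether an empty file carrying only the schema still has to be written. That reading is an assumption inferred from the quoted tests (an empty result with a known schema yields a file, an empty schema yields none, and a schema change without rows yields a single file). A simplified sketch of that lazy-creation pattern with a documented flag, writing plain text instead of Parquet; `LazyOutputWriter` and all of its members are hypothetical and do not mirror `ParquetRecordWriter`'s actual fields:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of lazy output-file creation; writes text, not Parquet. */
final class LazyOutputWriter {
  private final Path dir;
  private final String prefix;
  private List<String> schema = new ArrayList<>();
  private BufferedWriter out; // null until a file has been opened
  private int index;          // suffix of the next file name

  /**
   * True while this writer has not created any output file. On final cleanup a
   * schema-only file is written if this is still true, so readers of the table
   * see its columns even when the query returned no rows.
   */
  private boolean empty = true;

  LazyOutputWriter(Path dir, String prefix) {
    this.dir = dir;
    this.prefix = prefix;
  }

  /** Remembers the latest schema; no file is created until rows or cleanup arrive. */
  void updateSchema(List<String> columns) {
    schema = new ArrayList<>(columns);
  }

  void write(String row) throws IOException {
    if (out == null) {
      open();
    }
    out.write(row);
    out.newLine();
  }

  /**
   * Closes the current file. With cleanUp=true (end of the query) and no file
   * written so far, a schema-only file is created unless the schema is empty.
   */
  void flush(boolean cleanUp) throws IOException {
    if (out == null && cleanUp && empty && !schema.isEmpty()) {
      open();
    }
    if (out != null) {
      out.close();
      out = null;
    }
  }

  private void open() throws IOException {
    Path file = dir.resolve(prefix + "_" + index++ + ".txt");
    out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    out.write(String.join(",", schema)); // header line stands in for the Parquet schema
    out.newLine();
    empty = false;
  }
}
```

   Deferring file creation to the first row (or to cleanup) is what lets an intermediate `flush(false)` on a schema change leave no stray empty files behind, while the final `flush(true)` still guarantees one file when a schema is known.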
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907150#comment-16907150
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313803238
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception {
     final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
     test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
+    Assert.assertTrue(outputFile.exists());
+  }
+
+  @Test
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName);
     Assert.assertFalse(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-    final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptySchemaChange() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-    runSQL("alter session set `planner.slice_target` = 1");
+    test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-    try {
-      final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id";
+    // Only the last scan schema is written
+    SchemaBuilder schemaBuilder = new SchemaBuilder()
+      .addNullable("id", TypeProtos.MinorType.BIGINT)
+      .addNullable("a", TypeProtos.MinorType.BIGINT)
+      .addNullable("b", TypeProtos.MinorType.BIT);
+    BatchSchema expectedSchema = new BatchSchemaBuilder()
+      .withSchemaBuilder(schemaBuilder)
+      .build();
 
-      test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+    testBuilder()
+      .unOrdered()
+      .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+      .schemaBaseLine(expectedSchema)
+      .go();
 
-      // this query will fail if an "empty" file was created
-      testBuilder()
-        .unOrdered()
-        .sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-        .sqlBaselineQuery(query)
-        .go();
-    } finally {
-      runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT);
-    }
+    // Make sure that only 1 parquet file was created
+    Assert.assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
+  }
+
+  @Test
+  public void testEmptyFileSchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testemptyfileschema";
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
+
+    // end_date column is null, so it is missing from the result schema.
+    SchemaBuilder schemaBuilder = new SchemaBuilder()
+        .addNullable("employee_id", TypeProtos.MinorType.BIGINT)
+        .addNullable("full_name", TypeProtos.MinorType.VARCHAR)
+        .addNullable("first_name", TypeProtos.MinorType.VARCHAR)
+        .addNullable("last_name", TypeProtos.MinorType.VARCHAR)
+        .addNullable("position_id", TypeProtos.MinorType.BIGINT)
+        .addNullable("position_title", TypeProtos.MinorType.VARCHAR)
+        .addNullable("store_id", TypeProtos.MinorType.BIGINT)
+        .addNullable("department_id", TypeProtos.MinorType.BIGINT)
+        .addNullable("birth_date", TypeProtos.MinorType.VARCHAR)
+        .addNullable("hire_date", TypeProtos.MinorType.VARCHAR)
+        .addNullable("salary", TypeProtos.MinorType.FLOAT8)
+        .addNullable("supervisor_id", TypeProtos.MinorType.BIGINT)
+        .addNullable("education_level", TypeProtos.MinorType.VARCHAR)
+        .addNullable("marital_status", TypeProtos.MinorType.VARCHAR)
+        .addNullable("gender", TypeProtos.MinorType.VARCHAR)
+        .addNullable("management_role", TypeProtos.MinorType.VARCHAR);
+    BatchSchema expectedSchema = new BatchSchemaBuilder()
+        .withSchemaBuilder(schemaBuilder)
+        .build();
+
+    testBuilder()
+        .unOrdered()
+        .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+        .schemaBaseLine(expectedSchema)
+        .go();
   }
 
   @Test // see DRILL-2408
   public void

[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907148#comment-16907148
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313803449
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception {
     final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
     test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
+    Assert.assertTrue(outputFile.exists());
+  }
+
+  @Test
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName);
     Assert.assertFalse(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-    final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptySchemaChange() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-    runSQL("alter session set `planner.slice_target` = 1");
+    test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-    try {
-      final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id";
+    // Only the last scan schema is written
+    SchemaBuilder schemaBuilder = new SchemaBuilder()
+      .addNullable("id", TypeProtos.MinorType.BIGINT)
+      .addNullable("a", TypeProtos.MinorType.BIGINT)
+      .addNullable("b", TypeProtos.MinorType.BIT);
+    BatchSchema expectedSchema = new BatchSchemaBuilder()
+      .withSchemaBuilder(schemaBuilder)
+      .build();
 
-      test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+    testBuilder()
+      .unOrdered()
+      .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+      .schemaBaseLine(expectedSchema)
+      .go();
 
-      // this query will fail if an "empty" file was created
-      testBuilder()
-        .unOrdered()
-        .sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-        .sqlBaselineQuery(query)
-        .go();
-    } finally {
-      runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT);
-    }
+    // Make sure that only 1 parquet file was created
+    Assert.assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
+  }
+
+  @Test
+  public void testEmptyFileSchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testemptyfileschema";
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
+
+    // end_date column is null, so it is missing from the result schema.
+    SchemaBuilder schemaBuilder = new SchemaBuilder()
+        .addNullable("employee_id", TypeProtos.MinorType.BIGINT)
+        .addNullable("full_name", TypeProtos.MinorType.VARCHAR)
+        .addNullable("first_name", TypeProtos.MinorType.VARCHAR)
+        .addNullable("last_name", TypeProtos.MinorType.VARCHAR)
+        .addNullable("position_id", TypeProtos.MinorType.BIGINT)
+        .addNullable("position_title", TypeProtos.MinorType.VARCHAR)
+        .addNullable("store_id", TypeProtos.MinorType.BIGINT)
+        .addNullable("department_id", TypeProtos.MinorType.BIGINT)
+        .addNullable("birth_date", TypeProtos.MinorType.VARCHAR)
+        .addNullable("hire_date", TypeProtos.MinorType.VARCHAR)
+        .addNullable("salary", TypeProtos.MinorType.FLOAT8)
+        .addNullable("supervisor_id", TypeProtos.MinorType.BIGINT)
+        .addNullable("education_level", TypeProtos.MinorType.VARCHAR)
+        .addNullable("marital_status", TypeProtos.MinorType.VARCHAR)
+        .addNullable("gender", TypeProtos.MinorType.VARCHAR)
+        .addNullable("management_role", TypeProtos.MinorType.VARCHAR);
+    BatchSchema expectedSchema = new BatchSchemaBuilder()
+        .withSchemaBuilder(schemaBuilder)
+        .build();
+
+    testBuilder()
+        .unOrdered()
+        .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+        .schemaBaseLine(expectedSchema)
+        .go();
   }
 
   @Test // see DRILL-2408
 
 Review comment:

[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907147#comment-16907147
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313800525
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java
 ##
 @@ -486,11 +467,54 @@ public void abort() throws IOException {
 
   @Override
   public void cleanup() throws IOException {
-    flush();
+    flush(true);
 
     codecFactory.release();
   }
 
+  private void createParquetFileWriter() throws IOException {
+    assert parquetFileWriter == null;
+
+    Path path = new Path(location, prefix + "_" + index + ".parquet");
+    // to ensure that our writer was the first to create output file, we create empty file first and fail if file exists
+    Path firstCreatedPath = storageStrategy.createFileAndApply(fs, path);
+
+    // since parquet reader supports partitions, it means that several output files may be created
+    // if this writer was the one to create table folder, we store only folder and delete it with its content in case of abort
+    // if table location was created before, we store only files created by this writer and delete them in case of abort
+    addCleanUpLocation(fs, firstCreatedPath);
+
+    // since ParquetFileWriter will overwrite empty output file (append is not supported)
+    // we need to re-apply file permission
+    if (useSingleFSBlock) {
+      // Passing blockSize creates files with this blockSize instead of filesystem default blockSize.
+      // Currently, this is supported only by filesystems included in
+      // BLOCK_FS_SCHEMES (ParquetFileWriter.java in parquet-mr), which includes HDFS.
+      // For other filesystems, it uses default blockSize configured for the file system.
+      parquetFileWriter = new ParquetFileWriter(conf, schema, path, ParquetFileWriter.Mode.OVERWRITE, blockSize, 0);
+    } else {
+      parquetFileWriter = new ParquetFileWriter(conf, schema, path, ParquetFileWriter.Mode.OVERWRITE);
+    }
+    storageStrategy.applyToFile(fs, path);
+    parquetFileWriter.start();
+  }
+
+  private void flushParquetFileWriter() throws IOException {
+    assert parquetFileWriter != null;
 Review comment:
   Same here.
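   The comment in the quoted `createParquetFileWriter` explains that the writer claims its output path by creating an empty file first and failing if that file already exists. Drill does this through `storageStrategy.createFileAndApply(fs, path)` on a Hadoop `FileSystem`; the sketch below shows the same idea on the local filesystem with `java.nio.file.Files.createFile`, whose existence check and creation form a single atomic operation. `OutputClaim.tryClaim` is a hypothetical name, and this is not how Drill's `StorageStrategy` is implemented:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical local-filesystem analog of "create the file first, fail if it exists". */
final class OutputClaim {
  private OutputClaim() {}

  /**
   * Returns true if this call created the file, false if a file with that name
   * already existed. Files.createFile checks and creates atomically, so two
   * writers racing for the same path cannot both get true. (The default
   * filesystem provider reports an existing file as FileAlreadyExistsException.)
   */
  static boolean tryClaim(Path path) throws IOException {
    try {
      Files.createFile(path);
      return true;
    } catch (FileAlreadyExistsException e) {
      return false;
    }
  }
}
```

   After a successful claim the real writer reopens the path in overwrite mode, which is why the quoted code re-applies permissions with `storageStrategy.applyToFile(fs, path)`.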
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907145#comment-16907145
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313800825
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java
 ##
 @@ -310,27 +312,26 @@ public void checkForNewPartition(int index) {
     try {
       boolean newPartition = newPartition(index);
       if (newPartition) {
-        flush();
+        flush(false);
         newSchema();
       }
     } catch (Exception e) {
       throw new DrillRuntimeException(e);
     }
   }
 
-  private void flush() throws IOException {
+  private void flush(final boolean cleanUp) throws IOException {
 
 Review comment:
   ```suggestion
 private void flush(boolean cleanUp) throws IOException {
   ```
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907151#comment-16907151
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313805760
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception {
     final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
     test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
+    Assert.assertTrue(outputFile.exists());
+  }
+
+  @Test
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName);
     Assert.assertFalse(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-    final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptySchemaChange() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-    runSQL("alter session set `planner.slice_target` = 1");
+    test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-    try {
-      final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id";
+    // Only the last scan schema is written
+    SchemaBuilder schemaBuilder = new SchemaBuilder()
+      .addNullable("id", TypeProtos.MinorType.BIGINT)
+      .addNullable("a", TypeProtos.MinorType.BIGINT)
+      .addNullable("b", TypeProtos.MinorType.BIT);
+    BatchSchema expectedSchema = new BatchSchemaBuilder()
+      .withSchemaBuilder(schemaBuilder)
+      .build();
 
-      test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+    testBuilder()
+      .unOrdered()
+      .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+      .schemaBaseLine(expectedSchema)
+      .go();
 
-      // this query will fail if an "empty" file was created
-      testBuilder()
-        .unOrdered()
-        .sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-        .sqlBaselineQuery(query)
-        .go();
-    } finally {
-      runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT);
-    }
+    // Make sure that only 1 parquet file was created
+    Assert.assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
+  }
+
+  @Test
+  public void testEmptyFileSchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testemptyfileschema";
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
 
 Review comment:
   Please replace the select from the JSON file with a select from a Parquet file; we need to test that the schema is created correctly from an already known schema.
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907144#comment-16907144
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313800742
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java
 ##
 @@ -486,11 +467,54 @@ public void abort() throws IOException {
 
   @Override
   public void cleanup() throws IOException {
-    flush();
+    flush(true);
 
     codecFactory.release();
   }
 
+  private void createParquetFileWriter() throws IOException {
+    assert parquetFileWriter == null;
 
 Review comment:
   Please remove it; there is no need for this check.
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907146#comment-16907146
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

arina-ielchiieva commented on pull request #1836: DRILL-7156: Support empty 
Parquet files creation
URL: https://github.com/apache/drill/pull/1836#discussion_r313804193
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriterEmptyFiles.java
 ##
 @@ -43,47 +49,99 @@ public void testWriteEmptyFile() throws Exception {
     final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
     test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0", outputFileName);
+    Assert.assertTrue(outputFile.exists());
+  }
+
+  @Test
+  public void testWriteEmptyFileWithEmptySchema() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyfileemptyschema";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
+
+    test("CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`empty.json`", outputFileName);
     Assert.assertFalse(outputFile.exists());
   }
 
   @Test
-  public void testMultipleWriters() throws Exception {
-    final String outputFile = "testparquetwriteremptyfiles_testmultiplewriters";
+  public void testWriteEmptySchemaChange() throws Exception {
+    final String outputFileName = "testparquetwriteremptyfiles_testwriteemptyschemachange";
+    final File outputFile = FileUtils.getFile(dirTestWatcher.getDfsTestTmpDir(), outputFileName);
 
-    runSQL("alter session set `planner.slice_target` = 1");
+    test("CREATE TABLE dfs.tmp.%s AS select id, a, b from dfs.`schemachange/multi/*.json` WHERE id = 0", outputFileName);
 
-    try {
-      final String query = "SELECT position_id FROM cp.`employee.json` WHERE position_id IN (15, 16) GROUP BY position_id";
+    // Only the last scan schema is written
+    SchemaBuilder schemaBuilder = new SchemaBuilder()
+      .addNullable("id", TypeProtos.MinorType.BIGINT)
+      .addNullable("a", TypeProtos.MinorType.BIGINT)
+      .addNullable("b", TypeProtos.MinorType.BIT);
+    BatchSchema expectedSchema = new BatchSchemaBuilder()
+      .withSchemaBuilder(schemaBuilder)
+      .build();
 
-      test("CREATE TABLE dfs.tmp.%s AS %s", outputFile, query);
+    testBuilder()
+      .unOrdered()
+      .sqlQuery("select * from dfs.tmp.%s", outputFileName)
+      .schemaBaseLine(expectedSchema)
+      .go();
 
-      // this query will fail if an "empty" file was created
-      testBuilder()
-        .unOrdered()
-        .sqlQuery("SELECT * FROM dfs.tmp.%s", outputFile)
-        .sqlBaselineQuery(query)
-        .go();
-    } finally {
-      runSQL("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_DEFAULT);
-    }
+    // Make sure that only 1 parquet file was created
+    Assert.assertEquals(1, outputFile.list((dir, name) -> name.endsWith("parquet")).length);
 
 Review comment:
   Please use a static import.
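The review comment above asks for a static import of the JUnit assertion, i.e. `import static org.junit.Assert.assertEquals;` so that the last line becomes `assertEquals(1, ...)`. A related pitfall on that same line: `File#list(FilenameFilter)` returns `null` when the path is not a directory, so `.length` throws a `NullPointerException` if the CTAS produced no output directory. The standalone sketch below is a hedged illustration only: the class `ParquetFileCount`, the method `countParquetFiles`, and the file name `0_0_0.parquet` are hypothetical, and it uses a JDK static import (`Objects.requireNonNull`) so that it runs without JUnit on the classpath.

```java
import static java.util.Objects.requireNonNull;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper, not part of Drill's test code.
public class ParquetFileCount {

  /**
   * Counts the entries in dir whose names end with "parquet".
   * File#list returns null when dir is not a directory (for example, when
   * nothing was written), so fail with a clear message instead of an NPE
   * on .length.
   */
  public static int countParquetFiles(File dir) {
    String[] names = requireNonNull(
        dir.list((d, name) -> name.endsWith("parquet")),
        () -> dir + " is not a readable directory");
    return names.length;
  }

  public static void main(String[] args) throws IOException {
    File dir = Files.createTempDirectory("parquet-count").toFile();
    dir.deleteOnExit();
    // Hypothetical output file names, used only for this demo.
    File parquet = new File(dir, "0_0_0.parquet");
    File other = new File(dir, "notes.txt");
    for (File f : new File[] {parquet, other}) {
      f.createNewFile();
      f.deleteOnExit(); // deleted before dir, since deleteOnExit runs in reverse order
    }
    System.out.println(countParquetFiles(dir)); // prints 1
  }
}
```

In the JUnit test itself, the equivalent would be `assertEquals(1, countParquetFiles(outputFile));` with the static import at the top of the file.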
 


[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-08-14 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907085#comment-16907085
 ] 

ASF GitHub Bot commented on DRILL-7156:
---

oleg-zinovev commented on issue #1836: DRILL-7156: Support empty Parquet files 
creation
URL: https://github.com/apache/drill/pull/1836#issuecomment-521174404
 
 
   @arina-ielchiieva 
   Debugging the test showed that the last schema contains all fields; the missing field is added in `ProjectRecordBatch#setupNewSchemaFromInput`.
   In the original version of the test, field A was not added because of plan optimization: the condition `1=0` was replaced by `limit 0`.
   
   I can still provide a solution that combines the schemas, if required.
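To make the "combining schemas" option concrete, here is a hedged illustration of what a schema union could look like. It is not the approach taken in PR #1836 and it does not use Drill's `BatchSchema` or `SchemaBuilder` API: each schema is modelled as an ordered map from column name to type name, columns keep their first-seen order, and a column that appears with two different types is rejected. The class name `SchemaUnion` and this conflict policy are assumptions made for the sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Drill's schema classes.
public class SchemaUnion {

  /**
   * Combines several schemas (column name -> type name) into one.
   * Columns keep the order in which they are first seen; a column that
   * appears with two different types is treated as an error.
   */
  public static Map<String, String> combine(List<Map<String, String>> schemas) {
    Map<String, String> combined = new LinkedHashMap<>();
    for (Map<String, String> schema : schemas) {
      for (Map.Entry<String, String> column : schema.entrySet()) {
        // putIfAbsent returns the existing type, or null if the column is new
        String previous = combined.putIfAbsent(column.getKey(), column.getValue());
        if (previous != null && !previous.equals(column.getValue())) {
          throw new IllegalArgumentException("Column " + column.getKey()
              + " has conflicting types: " + previous + " vs " + column.getValue());
        }
      }
    }
    return combined;
  }

  public static void main(String[] args) {
    Map<String, String> first = new LinkedHashMap<>();
    first.put("id", "BIGINT");
    first.put("b", "BIT");

    Map<String, String> second = new LinkedHashMap<>();
    second.put("id", "BIGINT");
    second.put("a", "BIGINT");

    // prints {id=BIGINT, b=BIT, a=BIGINT}
    System.out.println(combine(Arrays.asList(first, second)));
  }
}
```

A real implementation would also have to decide how to resolve type conflicts (for example, by promoting to a common type) instead of failing.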
 


